Synthesis and expressive transformation of singing voice

Luc Ardaillon, PhD thesis
UPMC / IRCAM - UMR STMS (IRCAM - CNRS - Sorbonne Universités), Paris, France
Contact: luc.ardaillon@ircam.fr

This demo page presents sound examples, links to listening tests, and other additional material for my PhD thesis, "Synthesis and expressive transformation of singing voice".


Chapter 3: ISiS: a concatenative singing synthesizer

Example sounds from the databases:


RT - / _ k O e~ k y l p e _ /: sound 3.1
RT - / _ O v n i _ /: sound 3.2
MS - / _ O v n i _ /: sound 3.3
MS - / _ m j O z O t i s _ /: sound 3.4
EL - / _ m j O z O t i s _ /: sound 3.5
EL - / _ p a R k i N l a p e~ _ /: sound 3.6


Synthesis engines:


Example of copy synthesis with the SVP engine and RT database: sound 3.7
Example of copy synthesis with the SVP engine and MS database: sound 3.8
Example of copy synthesis with the SVP engine and EL database: sound 3.9
Example of copy synthesis with the PaN engine and RT database: sound 3.10
Example of copy synthesis with the PaN engine and MS database: sound 3.11
Example of copy synthesis with the PaN engine and EL database: sound 3.12



Chapter 4: Control module: modelization of the synthesis parameters


Example of an upward release f0 segment: sound 4.1


Examples of transitions with various parameters from figure 4.9:

a) sound 4.2
b) sound 4.3
c) sound 4.4
d) sound 4.5
e) sound 4.6
f) sound 4.7


Evaluation of f0 model ([Ardaillon2015]):


Link to original listening test with full instructions for evaluation of the proposed f0 model (test not active anymore): http://recherche.ircam.fr/anasyn/ardaillon/Test2015luc/

Link to demo page with all sounds used in this test: http://recherche.ircam.fr/anasyn/ardaillon/ardaillon2015f0model/



Chapter 5: Modeling singing styles: towards a more expressive synthesis


Example of a timbre-related feature involved in singing style perception:


Copy synthesis for an extract of "Carmen" (opera by Bizet) using the RT database recorded with a "French variety" type of timbre:
sound 5.1

The same extract synthesized with a database from the same singer (RT), but recorded with a lyrical type of timbre:
sound 5.2


Songs from our singing style corpus:


Edith Piaf:
Les feuilles mortes: https://www.youtube.com/watch?v=n2s2tPORlW4
La foule: https://www.youtube.com/watch?v=Fgn8gZHJZzA
Hymne à l'amour: https://www.youtube.com/watch?v=QvHph2zrMrA

Juliette Greco:
Les feuilles mortes: https://www.youtube.com/watch?v=dLevW__7Y8Q
La javanaise: https://www.youtube.com/watch?v=zk26wHJCbP4
Je hais les dimanches: https://www.youtube.com/watch?v=74eQqUhHMJc

François Le Roux:
Les feuilles mortes: https://www.youtube.com/watch?v=WOMjrCW9Lcs
Dernier voeu: https://www.youtube.com/watch?v=ic9hL2czIZg
Sous l'épais sycomore: https://www.youtube.com/watch?v=ZEfYZtzaXcM

Sacha Distel:
Les feuilles mortes: http://www.deezer.com/fr/track/88391885
Parlez-moi d'amour: https://www.youtube.com/watch?v=23153OP5KY8
Que reste-t-il de nos amours?: https://www.youtube.com/watch?v=bq-jj6kaCVI


Harmonic analysis:


Example of resynthesis of the song from the harmonic analysis based on the f0 analysis for the chorus of "Les feuilles mortes" by Edith Piaf:
sound 5.3

This example aims to give an idea of the quality of the f0 estimation obtained for the commercial polyphonic recordings of the corpus, and of the harmonic analysis from which the loudness is estimated.


f0 parameters estimation:


Comparison of the f0 curve analyzed on the original recording with the curve obtained from the proposed f0 model after estimating the model parameters on the original curve, for the chorus of "Les feuilles mortes" by Edith Piaf. The curves are resynthesized with a single sinusoid, allowing the perceptual difference between the original and the synthetic curve to be heard.
- original f0 curve: sound 5.4
- f0 curve generated from model: sound 5.5
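
As an illustration of what "resynthesized with a single sinusoid" can mean in practice, here is a minimal sketch that renders an f0 curve as one sine wave by accumulating phase sample by sample. It is not the code used for sounds 5.4 and 5.5: the function name, the frame rate, the sample rate, the linear interpolation between frames, and the handling of unvoiced frames are all assumptions made for this example.

```python
import math

def synthesize_f0_sinusoid(f0_hz, frame_rate, sample_rate=16000, amplitude=0.5):
    """Render an f0 curve as a single sinusoid (hypothetical helper).

    f0_hz: list of f0 values in Hz, one per analysis frame.
           Values <= 0 are treated as unvoiced and rendered as silence (assumption).
    frame_rate: number of f0 frames per second, assumed constant.
    """
    if not f0_hz:
        return []
    n_samples = int(round(len(f0_hz) / frame_rate * sample_rate))
    last = len(f0_hz) - 1
    samples = []
    phase = 0.0
    for n in range(n_samples):
        # Position of this sample on the frame axis.
        pos = n * frame_rate / sample_rate
        i = min(int(pos), last)
        j = min(i + 1, last)
        if f0_hz[i] <= 0 or f0_hz[j] <= 0:
            # Unvoiced region: output silence and keep the current phase.
            samples.append(0.0)
            continue
        # Linear interpolation of f0 between neighbouring frames.
        frac = pos - i
        f0 = (1.0 - frac) * f0_hz[i] + frac * f0_hz[j]
        samples.append(amplitude * math.sin(phase))
        # Instantaneous frequency is the derivative of phase, so integrate it.
        phase += 2.0 * math.pi * f0 / sample_rate
    return samples
```

Rendering both the analyzed curve and the model-generated curve with the same function would give two signals that differ only in their pitch trajectory, which is the point of this kind of comparison.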


Evaluations:


1st evaluation ([Ardaillon2016]):
Link to original listening test for the 1st evaluation on singing styles modeling, with original instructions (test not active anymore): http://recherche.ircam.fr/anasyn/ardaillon/IS2016/listTest/
Link to demo page with all sounds used in this test: http://recherche.ircam.fr/anasyn/ardaillon/IS2016/listTest/demo.php

2nd evaluation:
Link to original listening test for the 2nd evaluation on singing styles modeling, with original instructions (test not active anymore): http://recherche.ircam.fr/anasyn/ardaillon/singingStyles2017/
Link to demo page with all sounds used in this test: http://recherche.ircam.fr/anasyn/ardaillon/singingStyles2017/demo.php



Chapter 6: Expressive timbre transformations

Morphing-based transformations ([Degottex2016a],[Degottex2016b]):


Sound examples of pitch scaling and intensity timbre transformation based on spectral morphing with MFA envelope analysis on sustained vowels: http://gillesdegottex.eu/Demos/DegottexG2016mfaenvsing/

Example of morphing-based intensity transformation for a singing extract synthesized with our ISiS synthesizer (using SVP engine):
sound 6.1


Glottal source transformation with intensity:


Link to original listening test on glottal source (Rd and Ee LF parameters) modification with intensity (not active anymore): http://recherche.ircam.fr/anasyn/ardaillon/testIntensitySrc2016/testCresc-en.php
Demo page with the sounds used in this test: http://recherche.ircam.fr/anasyn/ardaillon/testIntensitySrc2016/demo.php


Loudness correction:


Synthesized vowels with similar target loudness levels:
- without applying vowel-dependent correction factors (unexpected loudness differences are perceived between vowels): sound 6.2
- with application of vowel-dependent correction factors (perceived loudness is more homogeneous than without correction): sound 6.3


Mouth opening effect ([Ardaillon2017]):


Link to original listening test for evaluation of perception of mouth opening effect applied on sustained vowels (test not active anymore): http://recherche.ircam.fr/anasyn/ardaillon/test_OM_transform_2017/OM_transform_CMOS_test-en.php
Link to original listening test for evaluation of sound quality of mouth opening effect applied on sustained vowels (test not active anymore): http://recherche.ircam.fr/anasyn/ardaillon/test_OM_transform_2017/OM_transform_quality_MOS_test-en.php
Link to demo page with all sounds used in these 2 tests: http://recherche.ircam.fr/anasyn/ardaillon/mouthOpening2017/demo.php


Roughness transformations:


1st approach, based on amplitude modulation:

Sounds from figure 6.19:

- a) Real "clean" voice: sound 6.4
- b) Amplitude-modulated voice: sound 6.5
- c) Sub-harmonics (isolated from signal by subtraction): sound 6.6
- d) High-pass-filtered sub-harmonics: sound 6.7
- e) Final mix (original sound + filtered sub-harmonics): sound 6.8
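
The five steps listed above (clean voice, amplitude modulation, isolation of the sub-harmonics by subtraction, high-pass filtering, final mix) can be sketched as follows. This is only an illustrative outline under stated assumptions, not the processing used for the sounds on this page: the modulation frequency (here f0/2, which places the new components halfway between the harmonics), the modulation depth, the cutoff frequency, and the simple first-order filter are all choices made for this example, and the function names are hypothetical.

```python
import math

def one_pole_highpass(signal, cutoff_hz, sample_rate):
    """First-order (RC-style) high-pass filter, chosen here for simplicity."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in signal:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def add_roughness(clean, f0_hz, sample_rate, depth=0.5, cutoff_hz=1000.0):
    """Return (modulated, sub_harmonics, filtered_sub_harmonics, final_mix).

    clean: list of samples of the "clean" voice, assumed to have a constant f0.
    depth and cutoff_hz are illustrative values, not taken from the thesis.
    """
    mod_freq = f0_hz / 2.0  # assumption: modulate at half the fundamental
    # b) amplitude-modulated voice
    modulated = [
        x * (1.0 + depth * math.cos(2.0 * math.pi * mod_freq * n / sample_rate))
        for n, x in enumerate(clean)
    ]
    # c) sub-harmonics isolated by subtracting the clean signal
    sub_harmonics = [m - x for m, x in zip(modulated, clean)]
    # d) high-pass-filtered sub-harmonics
    filtered = one_pole_highpass(sub_harmonics, cutoff_hz, sample_rate)
    # e) final mix: original sound + filtered sub-harmonics
    final_mix = [x + s for x, s in zip(clean, filtered)]
    return modulated, sub_harmonics, filtered, final_mix
```

With `depth=0.0` the modulated signal equals the input, so the sub-harmonic signal is zero and the final mix is the unmodified clean voice.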


Example from figure 6.20 (with more sub-harmonics):
- a) Real "clean" voice: sound 6.4
- b) Amplitude-modulated voice: sound 6.9
- c) Sub-harmonics (isolated from signal by subtraction): sound 6.10
- d) High-pass-filtered sub-harmonics: sound 6.11
- e) Final mix (original sound + filtered sub-harmonics): sound 6.12


Additional example:
- a) Real "growled" (rough) voice: sound 6.13
- b) Real "clean" voice: sound 6.14
- c) Simulated "growled" (rough) voice using the amplitude-modulation-based approach on the "clean" voice: sound 6.15



2nd approach, based on jitter and shimmer generation in the PaN synthesis engine:

Male voice:

- Real rough voice (from figure 6.17): sound 6.16
- Previous sound resynthesized with PaN without jitter and shimmer (figure 6.26): sound 6.17
- Rough sound resynthesized with PaN with original jitter and shimmer patterns (figure 6.27): sound 6.18


Female voice:
- Real rough voice: sound 6.19
- Previous sound resynthesized with PaN without jitter and shimmer: sound 6.20
- Rough sound resynthesized with PaN with decreased jitter (factor 0.5 on original jitter patterns, no shimmer): sound 6.21
- Rough sound resynthesized with PaN with increased jitter (factor 2 on original jitter patterns, no shimmer): sound 6.22
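
To make the idea of applying a factor to jitter patterns more concrete, here is a minimal sketch, assuming that jitter is represented as the deviation of each glottal period from a local moving-average trend. This representation, the window length, and the function name are assumptions made for illustration; the way the PaN engine actually represents and applies jitter is not described on this page.

```python
def scale_jitter(periods, factor, window=5):
    """Scale period-to-period deviations around a smoothed trend (hypothetical helper).

    periods: list of successive glottal period durations, in seconds.
    factor: 0 removes the jitter, 1 keeps it unchanged, 2 doubles it.
    window: length (in periods) of the moving average used as the trend (assumption).
    """
    n = len(periods)
    half = window // 2
    scaled = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        # Local trend: mean period over the surrounding window.
        trend = sum(periods[lo:hi]) / (hi - lo)
        # Jitter: deviation of this period from the trend, scaled by `factor`.
        scaled.append(trend + factor * (periods[i] - trend))
    return scaled
```

For example, `scale_jitter(periods, 0.5)` halves each deviation from the trend and `scale_jitter(periods, 2.0)` doubles it, mirroring the two factors used in the examples above.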


Transposition of jitter patterns onto another voice:
- Real rough voice: sound 6.19
- Real clean voice: sound 6.23
- Original clean voice resynthesized with addition of jitter and shimmer patterns extracted from original rough voice: sound 6.24




Chapter 7: Conclusion


Artistic applications:


Extract of synthesized voice for the opera I.D. by Arnaud Petit (with his kind authorization):
sound 7.1
More information about the piece


Sounds submitted to the singing synthesis challenge at the Interspeech 2016 conference:


- "Les feuilles d'Interspeech" (autumn leaves) - RT - PaN engine - a cappella version: sound 7.2
- "Les feuilles d'Interspeech" (autumn leaves) - RT - PaN engine - with musical accompaniment: sound 7.3


Other synthesis examples (used in [Feugere2016] for evaluation purposes):


- "Les feuilles d'Interspeech" (autumn leaves) - MS - SVP engine - a cappella version: sound 7.4
- "Au temps d'Interspeech" (summer times) - RT - PaN - a cappella version: sound 7.5
- "Au temps d'Interspeech" (summer times) - RT - SVP - a cappella version: sound 7.6
- "Au temps d'Interspeech" (summer times) - MS - PaN - a cappella version: sound 7.7





Bonus

Research is a long process of trial and error. But when it comes to computer science and sound, errors sometimes produce the most interesting and fun results.
To illustrate this, you will find below a compilation of my "best failures", most of which result from parametrization errors or bugs that occurred in our ISiS synthesis system during its development.
Those bugs have hopefully all been fixed by now, so I would unfortunately not be able to reproduce such sounds again (which makes them even more valuable!).
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

All the sounds on this page are under Copyright © 2017 IRCAM, Institut de recherche et coordination acoustique/musique.