VaSAB: THE VARIABLE SIZE ADAPTIVE INFORMATION BOTTLENECK FOR DISENTANGLEMENT ON SPEECH AND SINGING VOICE

Frederik Bous, Axel Roebel
UMR9912 STMS | IRCAM - CNRS - Sorbonne Université - Ministère de la Culture | Paris, France

11-213-1

Columns (model): Input, WORLD, Nosp, Rasp, Hivo, Ravo
Rows: 1600, 800, 0, -800, -1600

Legend

Input: Input file.
WORLD: Transposition using the WORLD vocoder.
Nosp: Baseline model using a classical bottleneck based on dimensionality reduction, here with n_b = 8. Trained only on speech.
Rasp: VaSAB bottleneck using random (classical) dropout. Trained only on speech.
Hivo: VaSAB bottleneck using hierarchical dropout. Trained on both speech and singing.
Ravo: VaSAB bottleneck using random (classical) dropout. Trained on both speech and singing.

The sounds used on this page are under Copyright © 2021 Ircam, Institut de recherche et coordination acoustique/musique.