One-year postdoc position at LORIA (Nancy, France), starting between October and December 2011.
The objective is to synthesize speech from dynamic and static articulatory data of the vocal tract.
Several kinds of data are available: X-ray films recorded in the eighties, static MRI data of the vocal tract, EMA data (electromagnetic articulography, i.e. positions of sensors glued onto the speech articulators, acquired at a frame rate of 200 Hz), and ultrasound images of the tongue registered with respect to the speaker’s head.
We have already built an adaptable articulatory model that describes the 2D geometry of the vocal tract from the larynx to the lips; the area function of the vocal tract can be derived easily from this model. We therefore plan to animate the model from EMA or ultrasound data.
The second part of the postdoc deals with the connection between the dynamics of the vocal tract shape provided by the articulatory model and the acoustic simulation. The synthesizer developed by S. Maeda can generate any speech sound, i.e. consonants as well as vowels, but the difficulty is to control the transitions between vowels and consonants appropriately so as to avoid spurious artifacts.
Keywords: medical imaging, signal processing, acoustics, speech processing.