Articulatory talking head driven by automatic speech recognition

Leader: Jacques Feldmar, INRIA PAROLE 

  • KTH
  • INRIA-Parole

Objective of the showcase project: To recreate in real time the articulatory movements of a speaker with an articulatory talking head, using only the speech signal. The articulatory talking head aims to give a realistic display of the movements of speech articulators that are normally hidden, such as the tongue.

Two important applications are targeted:

  • computer-assisted second language learning, and
  • pronunciation training for hearing-impaired children.

Two engineers were hired full time for a five-month period, one at INRIA and one at KTH. INRIA and KTH held a two-day meeting in Stockholm at the beginning of February to define the communication protocol between the INRIA speech recognition and the KTH talking head. Conference calls are held weekly or bi-monthly, depending on need.

A version of the communication protocol has been implemented and tested. It is TCP/IP-based and guarantees synchronization between the two processes.
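The message layout of the protocol is not described here, so the following is only a minimal sketch of how a recognizer process might stream time-stamped articulatory parameters to a talking-head process over TCP. The header fields (frame index, timestamp in milliseconds, parameter count) and the function names `encode_frame`, `recv_exact` and `recv_frame` are assumptions made for illustration, not the project's actual format. Because TCP delivers a byte stream rather than discrete messages, each message carries an explicit parameter count so that the receiver can find message boundaries, and a frame index and timestamp that the receiver could use to keep the display aligned with the audio.

```python
import socket
import struct

# Hypothetical fixed-size header, network byte order:
# frame index (uint32), timestamp in ms (uint32), number of parameters (uint16).
HEADER = struct.Struct("!IIH")


def encode_frame(frame_index, timestamp_ms, params):
    """Serialize one frame of articulatory parameters as header + float32 values."""
    header = HEADER.pack(frame_index, timestamp_ms, len(params))
    body = struct.pack(f"!{len(params)}f", *params)
    return header + body


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() call may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock):
    """Read one complete frame and return (frame_index, timestamp_ms, params)."""
    frame_index, timestamp_ms, count = HEADER.unpack(recv_exact(sock, HEADER.size))
    params = struct.unpack(f"!{count}f", recv_exact(sock, 4 * count))
    return frame_index, timestamp_ms, list(params)
```

A sender would call `sock.sendall(encode_frame(...))` once per analysis frame, and the receiver would call `recv_frame(sock)` in a loop, using the frame index to detect dropped or out-of-order frames at the application level.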

INRIA has integrated Gaussian Mixture Models (GMM) into its Voice Activity Detection (VAD). The quality of speech/non-speech discrimination has improved but still needs further work. New measurements, such as autocorrelation and zero-crossing rate, will be introduced.
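As an illustration of the two planned measurements, the sketch below computes a zero-crossing rate and a normalized autocorrelation for one frame of audio samples. It assumes the frame is a plain list of floats; the function names are hypothetical, and how these features would be combined with the GMM is not specified here. Voiced speech typically shows a low zero-crossing rate and a high autocorrelation at the pitch lag, while broadband noise tends to show the opposite, which is why such features are commonly used for speech/non-speech decisions.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def normalized_autocorrelation(frame, lag):
    """Autocorrelation at `lag`, divided by the frame energy (the lag-0 value)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag >= len(frame):
        return 0.0  # silent frame or lag out of range: no usable estimate
    acc = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    return acc / energy
```

In practice these values would be computed on short overlapping frames and appended to the feature vector seen by the classifier.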

KTH has integrated some articulators, such as the tongue, into its talking head. Different display schemes are being tested. The graphical user interface still has to be developed.

A first version of the articulatory talking head driven by speech recognition is planned to be shown in Budapest on April 23. It will have no GUI, the VAD will be insufficient, not all articulators will be displayed, and the visualization will require improvements. In addition, neither speech recognition nor articulation will be adapted to English yet. However, we hope that it will be enough to show the concept and to demonstrate that our team will deliver on time and to specification.

A final presentation of the "Articulatory Talking Head" showcase can be found here. The presentation gives a brief explanation of the showcase and some results. The videos used in the results part of the presentation can be found here as: Articulation (wmv), Prosody1 (wmv), Prosody2 (wav), Prosody3 (wav), ScenarioTalkingHeadCorrect (wmv), ScenarioTalkingHeadStudent (wmv), ScenarioCorrect (avi), ScenarioStudent (avi), ScenarioFrench (avi).