
Real-Time Audio-Visual Automatic Speech Recognition Demonstrator


Leader: Alexandros Potamianos, TSI-TUC
Partners:

  • TSI-TUC (Alexandros Potamianos, Manolis Perakakis, Eduardo Sanchez-Soto, Fanis Kanetis)
  • ICCS-NTUA (Petros Maragos, George Papandreou, Nassos Katsamanis)
  • INRIA-IRISA (Patrick Gros, Guillaume Gravier)

Objective of the showcase project: One of the most promising ways to improve the performance and extend the applicability of Automatic Speech Recognition (ASR) systems is to integrate visual information into the recognition process. As a step towards practically deployable audio-visual ASR (AV-ASR), we are building a proof-of-concept laptop-based AV-ASR prototype which: (i) uses a consumer-grade microphone and camera to capture the speaker; (ii) performs audio and visual feature extraction, as well as speech recognition, on the laptop in real time; (iii) is robust to the failure of a single modality, such as visual occlusion of the speaker's face; and (iv) automatically adapts to changing acoustic noise levels.
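
Requirements (iii) and (iv) are often approached in AV-ASR by combining per-stream log-likelihoods with weights that depend on how reliable each stream currently is. The following is a minimal illustrative sketch of that idea, not the demonstrator's fusion module: the function names, the SNR thresholds, and the linear SNR-to-weight mapping are all assumptions chosen for illustration.

```python
def stream_weights(snr_db, face_visible, snr_low=0.0, snr_high=25.0):
    """Return (audio_weight, visual_weight), which always sum to 1.

    Hypothetical rule: the audio weight grows linearly with the estimated
    SNR between snr_low and snr_high (assumed thresholds). If the face
    tracker reports no face, the visual stream is dropped entirely.
    """
    if not face_visible:
        # Visual modality failed (e.g. occlusion): fall back to audio only.
        return 1.0, 0.0
    # Clamp the SNR estimate to [snr_low, snr_high] and rescale to [0, 1].
    clamped = min(max(snr_db, snr_low), snr_high)
    audio_weight = (clamped - snr_low) / (snr_high - snr_low)
    return audio_weight, 1.0 - audio_weight


def fused_log_likelihood(audio_ll, visual_ll, snr_db, face_visible):
    """Weighted sum of the audio and visual log-likelihoods for one frame.

    visual_ll may be None when no face is visible; it is ignored then.
    """
    w_audio, w_visual = stream_weights(snr_db, face_visible)
    score = w_audio * audio_ll
    if w_visual > 0.0:
        score += w_visual * visual_ll
    return score


if __name__ == "__main__":
    # Clean audio: the audio stream dominates.
    print(stream_weights(30.0, True))
    # Noisy audio: the visual stream dominates.
    print(stream_weights(0.0, True))
    # Face occluded: audio only, whatever the noise level.
    print(fused_log_likelihood(-10.0, None, 5.0, False))
```

In this sketch, a drop in estimated SNR shifts weight towards the visual stream, while a lost face track switches the recognizer to audio-only scoring. A real system would also need an SNR estimator and a face-tracking confidence measure, neither of which is modelled here.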

Completed Work: Audio/Video capture, Face Tracking, Audio/Visual Front-End

Ongoing Work: Back-End, Fusion Module, Graphical User Interface, Integration


Video of the Real-Time Audio-Visual Automatic Speech Recognition Demonstrator Showcase