Real-Time Audio-Visual Automatic Speech Recognition Demonstrator

Leader: Alexandros Potamianos, TSI-TUC

  • TSI-TUC (Alexandros Potamianos, Manolis Perakakis, Eduardo Sanchez-Soto, Fanis Kanetis)
  • ICCS-NTUA (Petros Maragos, George Papandreou, Nassos Katsamanis)
  • INRIA-IRISA (Patrick Gros, Guillaume Gravier)

Objective of the showcase project: Integrating visual information into the recognition process is one of the most promising ways to improve the performance and extend the applicability of Automatic Speech Recognition (ASR) systems. As a step towards practically deployable audio-visual ASR (AV-ASR), we are building a proof-of-concept laptop-based AV-ASR prototype that: (i) uses a consumer-grade microphone and camera to capture the speaker; (ii) performs audio and visual feature extraction, as well as speech recognition, on the laptop in real time; (iii) is robust to the failure of a single modality, such as visual occlusion of the speaker's face; and (iv) automatically adapts to changing acoustic noise levels.
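
Goals (iii) and (iv) can be illustrated with a small sketch. The page does not say how the prototype's fusion module works, so the code below is only an assumption: it uses a common weighted combination of per-stream log-likelihoods, with the audio weight driven by an estimated signal-to-noise ratio and the visual stream dropped when the face is not tracked. All names and parameters (`audio_weight`, `fuse_scores`, `recognize`, `midpoint_db`, `slope`) are hypothetical and not taken from the project.

```python
import math


def audio_weight(snr_db, face_visible, midpoint_db=10.0, slope=0.3):
    """Return the audio stream weight in [0, 1]; the visual weight is 1 minus this.

    The logistic mapping and its parameters are illustrative assumptions.
    """
    if not face_visible:
        # Visual stream unavailable (e.g. occlusion): rely on audio alone.
        return 1.0
    # High SNR pushes the weight towards 1, low SNR towards 0.
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse_scores(audio_loglik, visual_loglik, snr_db, face_visible):
    """Combine per-class log-likelihoods of both streams with a weighted sum."""
    if not face_visible:
        # Ignore the visual scores entirely when the face is not tracked.
        return dict(audio_loglik)
    w = audio_weight(snr_db, face_visible)
    return {
        label: w * audio_loglik[label] + (1.0 - w) * visual_loglik[label]
        for label in audio_loglik
    }


def recognize(audio_loglik, visual_loglik, snr_db, face_visible):
    """Return the class label with the highest fused score."""
    fused = fuse_scores(audio_loglik, visual_loglik, snr_db, face_visible)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Toy scores: the audio stream favours "no", the visual stream favours "yes".
    audio = {"yes": -2.0, "no": -1.0}
    visual = {"yes": -0.5, "no": -3.0}
    print(recognize(audio, visual, snr_db=30.0, face_visible=True))
    print(recognize(audio, visual, snr_db=-10.0, face_visible=True))
    print(recognize(audio, visual, snr_db=-10.0, face_visible=False))
```

In this toy setup the decision follows the audio stream in clean conditions, shifts to the visual stream as noise increases, and falls back to audio alone when the face is occluded. A real system would apply such weights inside the decoder rather than to whole-utterance scores.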

Completed Work: Audio/Video capture, Face Tracking, Audio/Visual Front-End

Ongoing Work: Back-End, Fusion Module, Graphical User Interface, Integration

Video of the Real-Time Audio-Visual Automatic Speech Recognition Demonstrator Showcase