WP4: Content Description for Audio, Speech and Text 

The aim of this work package is to prepare digital multimedia data for knowledge extraction and multimodal semantic association. Tasks will focus on the individual modalities of text and audio (music, sound, speech), as well as on combinations thereof, to extract semantic concepts.

As most of the approaches in this domain depend heavily on machine learning techniques, there will be tight cooperation with WP5 on Learning and Computation. Furthermore, while this WP focuses on the acoustic and textual modalities, cooperation with WP3 on Content Description for Image and Video and WP6 on Cross-Modal Integration for Multimedia Content will ensure a more robust extraction of semantic descriptors. Wherever possible, methods will be evaluated on benchmark data sets available via WP2 on Evaluation, Integration and Standards.

Concerning audio processing, we aim to:
  1. develop reliable techniques for indexing audio;
  2. evaluate the performance of audio feature sets for different tasks such as genre analysis, sound classification, audio stream segmentation and audio retrieval;
  3. participate in relevant evaluation campaigns such as MIREX.

Concerning speech processing, we aim to:
  1. address the problem of robustness of automatic speech recognition systems;
  2. evaluate to what extent specific speech analysis can contribute to extracting and describing the semantics of audio data in general.

Concerning text and natural language processing, we aim to:
  1. study the specific problems of natural language processing for describing multimedia content (descriptive vocabulary, ontology);
  2. analyze textual material accompanying multimedia data, such as scripts and lyrics;
  3. advance the techniques used for such analysis;
  4. take part in worldwide competitions such as TREC or CLEF.