Objectives

WP4: Content Description for Audio, Speech and Text

The aim of this work-package is to prepare numerical multimedia data for knowledge extraction and multimodal semantic association. Tasks will focus on the individual modalities of text and audio (music, sound, speech), as well as on combinations thereof, to extract semantic concepts.

As most of the approaches in this domain depend to a high degree on various machine learning techniques, there will be tight cooperation with WP5 on Learning and Computation. Furthermore, while the focus is on acoustic and textual modalities in this WP, cooperation with WP3 on Content Description for Image and Video and WP6 on Cross-Modal Integration for Multimedia Content will ensure a more solid extraction of semantic descriptors. Wherever possible, methods shall be evaluated on benchmark data sets available via WP2 on Evaluation, Integration and Standards.

Concerning audio processing, we aim to:

develop reliable techniques for indexing audio;
evaluate the performance of audio feature sets for different tasks such as genre analysis, sound classification, audio stream segmentation and audio retrieval;
participate in relevant evaluation campaigns such as MIREX.

Concerning speech processing, we aim to:

address the problem of robustness of automatic speech recognition systems;
evaluate to what extent specific speech analysis can contribute to extracting and describing the semantics of audio data in general.

Concerning text and natural language processing, we aim to:

study the specific problems of natural language processing for describing multimedia content (descriptive vocabulary, ontology);
analyze textual aspects of multimedia, such as scripts and lyrics accompanying multimedia data;
advance such techniques;
take part in worldwide competitions such as TREC or CLEF.
