Automatic Character (in Audiovisual Document) Indexing (ACADI)

Leader: Julien Pinquier, UPS-IRIT  

  • Frederick Gianni, Julien Pinquier (UPS-IRIT)
  • Ewa Kijak (INRIA-Texmex)

Description: We propose a system that describes and structures audiovisual documents without training or prior knowledge of the corpus, and that displays the main interventions through an interface. Our goal is to fuse three segmentation systems (face, costume and speaker detectors) to obtain the best association between voices and the persons appearing in an audio/video sequence. The interface serves as a tool for assisted verification of the segmentation result.

Current state of the showcase:
During this period, we carried out the following:

  • the selection of the experiment corpus,
  • the adaptation of each tool (face, speaker and costume detections) in an independent way,
  • the choice of an XML exchange format,
  • a first interface for visualizing video segments,
  • two phone meetings (2 February and 30 March).

The first step in building our interface is to parse the results from the segmentation systems. Since we use an XML file format for data exchange, this was straightforward. From the XML result files, the application builds a sequence object containing the segmentation of the sequence produced by the three detectors (see figure 1). Using audio/video decoders, we can retrieve images from the video to illustrate the segmentation and also play the video segments.
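
As an illustration of this parsing step, the following minimal Python sketch builds a sequence object from an XML result. The exchange format itself is not specified here, so the element and attribute names (sequence, detector, segment, label, start, end), the sample file name and the Segment and Sequence classes are all assumptions made for the example, not the format actually used by the showcase.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical exchange format: every element and attribute name below
# is an assumption made for this sketch, with times given in seconds.
SAMPLE_XML = """
<sequence video="example.avi">
  <detector name="face">
    <segment label="face_1" start="0.0" end="4.0"/>
    <segment label="face_2" start="4.0" end="9.5"/>
  </detector>
  <detector name="costume">
    <segment label="costume_1" start="0.0" end="4.5"/>
  </detector>
  <detector name="speaker">
    <segment label="speaker_1" start="1.0" end="3.5"/>
  </detector>
</sequence>
"""


@dataclass
class Segment:
    label: str    # label assigned by the detector
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class Sequence:
    video: str
    # Maps a detector name to the list of segments it produced.
    tracks: dict = field(default_factory=dict)


def parse_sequence(xml_text):
    """Build a Sequence from an XML string in the assumed format."""
    root = ET.fromstring(xml_text.strip())
    seq = Sequence(video=root.get("video", ""))
    for det in root.findall("detector"):
        seq.tracks[det.get("name")] = [
            Segment(s.get("label"), float(s.get("start")), float(s.get("end")))
            for s in det.findall("segment")
        ]
    return seq


seq = parse_sequence(SAMPLE_XML)
for name, segs in seq.tracks.items():
    print(name, [(s.label, s.start, s.end) for s in segs])
```

Keeping one track per detector leaves the three segmentations independent, so a later fusion step can compare them on their time spans.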

The interface developed to visualize these results already meets the primary requirements:

  • open and parse XML result files,
  • display the images of the detected characters, fetched from the video file,
  • display appearance and speaking statistics for a selected character,
  • display the segmentation results for all the characters,
  • merge multiple characters into one to overcome the multiple labeling of a single character,
  • play video segments from the segmented sequence.
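
Two of the requirements above, the per-character statistics and the merging of characters, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout, the labels and the functions merge_characters and statistics are hypothetical, and the sketch assumes that segments of the same character and kind do not overlap in time.

```python
from collections import defaultdict

# Hypothetical data: (character label, kind, start, end), times in seconds.
# "appear" segments stand for visual detections, "speak" for speaker turns.
segments = [
    ("char_1", "appear", 0.0, 4.0),
    ("char_1", "speak", 1.0, 3.5),
    ("char_2", "appear", 4.0, 9.5),
    ("char_3", "appear", 10.0, 12.0),
    ("char_3", "speak", 10.5, 11.5),
]


def merge_characters(segments, labels, new_label):
    """Relabel every segment whose character is in `labels` as `new_label`."""
    return [
        (new_label if char in labels else char, kind, start, end)
        for char, kind, start, end in segments
    ]


def statistics(segments):
    """Sum appearing and speaking durations per character.

    Assumes segments of the same character and kind do not overlap;
    overlapping segments would be counted twice.
    """
    totals = defaultdict(lambda: {"appear": 0.0, "speak": 0.0})
    for char, kind, start, end in segments:
        totals[char][kind] += end - start
    return dict(totals)


# Suppose char_1 and char_3 are the same person labeled twice.
merged = merge_characters(segments, {"char_1", "char_3"}, "char_1")
for char, stats in statistics(merged).items():
    print(char, stats)
```

Merging by relabeling keeps the original segments intact, so the statistics can be recomputed after each correction made by the user.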

Video of the Automatic Character (in Audiovisual Document) Indexing (ACADI)