CHiME-9 Task 1 - MCoRec

Pre-announcement - Launching in July

Multi-Modal Context-aware Recognition

Recognizing speech in environments with noise, babble, and strong overlap between multiple conversations is generally referred to as the cocktail-party challenge. We define a database and research challenge to advance the state of the art in understanding multiple concurrent conversations, each with two or more participants, under such adverse conditions.

The Challenge

This task addresses the problem of generating correct transcriptions from video and audio when multiple talkers converse at the same time. Humans perform this remarkable task with ease in settings such as restaurants, bars, and meeting rooms. They do so by selectively focusing on one speaker, using all available modalities and sources of information, including the acoustic signal, binaural directional hearing, visual face and lip tracking, the semantic flow of a conversation, speaker characteristics, and more. Machines still fall short under these conditions.

In this challenge, we aim to advance the state of the art in understanding multiple concurrent conversations, each with two or more participants, with the goal of matching or surpassing human performance. Systems should leverage all available input modalities to focus on and transcribe the speech of each conversation, simulating human attention in an environment crowded with sound sources. The task tackles the problem of aligning multi-modal data (e.g., visual and audio) to extract text that is consistent with and pertinent to a selected face and speaker. The key research question is: How can we effectively leverage multi-modal inputs to improve transcription accuracy by isolating individual conversations using specific factors such as speaker identity, semantics, or visual features?

Evaluation

Participants will be evaluated based on Word Error Rate (WER) and Diarization Error Rate (DER) for the relevant multi-party conversation. Further details of the data and the task design will be announced at the end of March. The full challenge, including data, baseline, and metrics, will be launched on July 1st.
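As a rough illustration of the two metrics named above, the following Python sketch computes WER as the word-level edit distance divided by the number of reference words, and DER as the sum of missed speech, false-alarm speech, and speaker-confusion time divided by the total reference speech time. These are the standard textbook definitions, not the official scoring of this challenge: text normalization, collar settings, handling of overlapping conversations, and the function names used here are all assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are separated by whitespace and no text
    normalization is applied (the official scoring may differ).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_reference: float) -> float:
    """DER = (missed + false alarm + speaker confusion) / total reference speech.

    Assumption: all arguments are durations in seconds that have already
    been measured by aligning system output against the reference.
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(diarization_error_rate(2.0, 1.0, 3.0, 40.0))
```

Both values can exceed 1.0 when a system inserts many extra words or a large amount of false-alarm speech, which is one reason they are called error rates rather than accuracies.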

Organizers

  • Alexander Waibel (CMU, USA & KIT, DE)
  • Christian Fuegen (Meta, UK)
  • Shinji Watanabe (CMU, USA)
  • Katerina Zmolikova (Meta, UK)
  • Thai-Binh Nguyen (KIT, DE)
  • Pingchuan Ma (Meta, USA)