Registration for the NOTSOFAR-1 Challenge is currently available through this link. Please register only once per team.

The purpose of registration is to collect basic participant details (name, email, etc.) to enable us to share targeted announcements and updates about the NOTSOFAR challenge, including notifications about new data releases and important milestones. Your information will be deleted after the challenge concludes, unless you give us explicit permission to retain it. While registration is not mandatory and is not limited by any deadline, we highly encourage you to register to stay informed and easily track changes related to the challenge. You are welcome to participate and submit results for the NOTSOFAR challenge even without registering.

Submission Format

We align with CHiME’s SegLST (Segment-wise Long-form Speech Transcription) style.

A submission consists of a JSON hypothesis file, tcp_wer_hyp.json, containing a list of dictionaries (one per utterance), each with the following attributes:

"session_id": "multichannel/MTG_30860_plaza_0",
"start_time": 28.890,
"end_time": 30.719,
"words": "we were the only car that can fly",
"speaker": "Spk2"
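
As a minimal sketch of producing such a file, the Python snippet below validates a list of segment dictionaries and writes them as a JSON list. The helper name write_hypothesis and the start/end sanity check are illustrative assumptions, not part of the challenge tooling; only the five attribute names and the file name come from the description above.

```python
import json

# Attribute names taken from the submission format described above.
REQUIRED_KEYS = {"session_id", "start_time", "end_time", "words", "speaker"}


def write_hypothesis(segments, path):
    """Check each segment for the required attributes, then write a JSON list.

    Hypothetical helper for illustration; not part of the baseline repository.
    """
    for i, seg in enumerate(segments):
        missing = REQUIRED_KEYS - seg.keys()
        if missing:
            raise ValueError(f"segment {i} is missing keys: {sorted(missing)}")
        # Sanity check added here as an assumption, not a stated challenge rule.
        if seg["end_time"] < seg["start_time"]:
            raise ValueError(f"segment {i} ends before it starts")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(segments, f, indent=2)


# One utterance, using the example values shown above.
segments = [
    {
        "session_id": "multichannel/MTG_30860_plaza_0",
        "start_time": 28.890,
        "end_time": 30.719,
        "words": "we were the only car that can fly",
        "speaker": "Spk2",
    }
]

# Usage: write_hypothesis(segments, "tcp_wer_hyp.json")
```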

Submissions will be evaluated using the calc_wer function from the baseline repository. You can use it to compute your own WER on datasets for which ground truth is available.
Two metrics will be calculated: the official tcpWER for ranking, and the supplementary tcORC WER (see Challenge Tracks).

Optional: tcORC WER expects a stream ID to be provided as the “speaker” attribute. Participants have the option to provide an additional tc_orc_wer_hyp.json file with their own stream definition (for instance, our baseline treats CSS channels as streams, see write_hypothesis_jsons function). If this file is not provided, the default behavior is to use tcp_wer_hyp.json for both, thereby treating the predicted speaker IDs as the stream IDs.
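
For illustration only, the sketch below derives tcORC-style segments from segments that carry a custom stream label. The "stream_id" key and the function name are hypothetical; the baseline's own logic lives in write_hypothesis_jsons and may differ. When a segment has no stream label, the sketch keeps the predicted speaker ID, mirroring the default behavior described above.

```python
def to_tcorc_segments(segments, stream_key="stream_id"):
    """Return copies of the segments with the stream ID in the "speaker" field.

    `stream_key` is a hypothetical per-segment field chosen for this example.
    """
    out = []
    for seg in segments:
        # Drop the helper field so the output keeps only the submission attributes.
        new_seg = {k: v for k, v in seg.items() if k != stream_key}
        # Fall back to the predicted speaker ID when no stream label is present.
        new_seg["speaker"] = str(seg.get(stream_key, seg["speaker"]))
        out.append(new_seg)
    return out
```

The resulting list can then be written to tc_orc_wer_hyp.json in the same JSON format as tcp_wer_hyp.json.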

Leaderboard - Dev Set Evaluation

Navigate to the “Submission” tab on the Hugging Face Leaderboard, select a track (NOTSOFAR-MC or NOTSOFAR-SC), and submit a zip containing either both tcp_wer_hyp.json and tc_orc_wer_hyp.json, or tcp_wer_hyp.json alone. Ensure your hypothesis files cover exactly the sessions in the dev set, with none missing and none extra.
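
Before uploading, it may help to check session coverage and package the files locally. The sketch below is one way to do that, under two assumptions: you supply the list of expected dev-set session IDs yourself, and the leaderboard accepts the JSON files at the root of the zip (the exact layout is not specified here). Both function names are hypothetical.

```python
import json
import zipfile


def missing_and_extra_sessions(hyp_path, expected_sessions):
    """Compare session IDs in a hypothesis file with the expected set.

    Returns (missing, extra) as two sets of session IDs.
    """
    with open(hyp_path, encoding="utf-8") as f:
        found = {seg["session_id"] for seg in json.load(f)}
    expected = set(expected_sessions)
    return expected - found, found - expected


def build_submission_zip(zip_path, tcp_path, tc_orc_path=None):
    """Zip tcp_wer_hyp.json and, optionally, tc_orc_wer_hyp.json.

    Files are stored at the zip root; this layout is an assumption.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tcp_path, arcname="tcp_wer_hyp.json")
        if tc_orc_path is not None:
            zf.write(tc_orc_path, arcname="tc_orc_wer_hyp.json")
```

If both returned sets are empty, the hypothesis file covers the expected sessions and nothing else.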

Final Evaluation on the Blind Eval Set

Participants are required to submit system predictions (at most two per track they enter) and a technical description paper. For each final system submission, we also kindly ask participants to include a YAML file with information about training/inference resources and runtime.

Stay tuned for further details.

📑 Technical Description Paper

Participants are also required to submit one short technical description paper (2 to 6 pages, plus references) describing their system(s).
Submissions will be made through the Conference Management Toolkit (CMT), and a link will be added to this page at the evaluation stage.

This technical description paper will be used:

  1. To evaluate which systems to highlight for efficiency and novelty, together with the provided YAML files.
  2. To assess correctness of submission and compliance with rules.

Therefore, participants are encouraged to include relevant details regarding runtime efficiency and novelty, such as:

  • Inference time, real-time factor, number and type of GPUs used for inference, etc.
  • Training data details, external datasets used, data augmentation.
  • Pre-trained models used.
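
For the runtime figures, a simple measurement sketch follows. It assumes the common definition of real-time factor as processing time divided by audio duration (values below 1 mean faster than real time); this page does not define the term, so state the definition you use in your paper. The helper names are hypothetical.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor, assumed here to be processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, a system that takes 30 seconds to process a 60-second recording has a real-time factor of 0.5 under this definition.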