Submission

  1. Please Register here.
  2. Evaluation data and development set ground truth release dates are in Important Dates.
    • πŸ† You can use in the meantime the leaderboard to score dev-set results. See Leaderboard section.
    • For evaluation, ensure that your system can run inference on ~30 hours of data within two weeks.
      • We can offer some flexibility for unexpected problems, but try to be on time (i.e. keep the system light and fast enough)!
  3. At the end of the challenge, along with your system predictions, you will also need to submit a technical description paper of max 4 pages plus references.
  4. A few days after the challenge ends, we will release results for both tracks on the Results page.
    • Ground truth for evaluation will also be released, so you can perform ablations.
  5. Every participant is invited to present their work at the CHiME-8 Interspeech Satellite Workshop.
    • πŸ₯‡ The jury award for the most efficient and innovative system will be announced there!
  6. After the Workshop, participants are invited to submit a full-length paper (3 to 6 pages, including references), which will appear in the Workshop proceedings and will be indexed.
    • More info on publications is on the Main page.

πŸ₯· If you still want to submit a final system but wish to keep the submission (and the required technical description!) anonymous, please email samuele.cornell@ieee.org directly.

πŸ† Interactive Leaderboard (Dev Set)

The leaderboard is now online and available at: https://huggingface.co/spaces/NOTSOFAR/CHiME8Challenge.

πŸ‘‰ NOTE: dev set ground truth is available for all scenarios except NOTSOFAR1, for which you need to use the leaderboard.
For the other scenarios, since you have the ground truth, you can also check performance yourself using our scoring script.
For NOTSOFAR1, we suggest that for now you use a training partition as a validation set, since the development and training sets have 10 out of 13 speaker IDs in common. See the NOTSOFAR1 data description.

Interactive Leaderboard for the development set.

How to submit to the leaderboard?

  1. First, you need to create a Hugging Face Hub πŸ€— account: https://huggingface.co/docs/huggingface_hub/quick-start.
  2. Second, you need to create an access token: https://huggingface.co/docs/hub/security-tokens

Then you are ready to submit!
For each submission you will be asked to provide:

  • Team Name: The name of your team, as it will appear on the leaderboard
  • Results: Results zip file to submit
  • Submission track: The track to submit results to
  • Token: Your Hugging Face token
  • Description: Short description of your submission (optional)

The results should be a zip file of a folder called β€œdev” with the following structure:

dev
β”œβ”€β”€ chime6.json
β”œβ”€β”€ dipco.json
β”œβ”€β”€ mixer6.json
└── notsofar1.json

Each JSON file is a SegLST JSON as already described on the Data Page (the same format as in the past CHiME-7 DASR challenge); it is also described at the bottom of this page.
We also provide an example submission for the baseline, as it may be helpful: dev.zip.
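
If it helps, here is a minimal, unofficial Python sketch that checks for the four dev-set JSON files and packages them into the expected zip layout; the my_system/dev path is just a placeholder for your own predictions folder:

import shutil
from pathlib import Path

# Placeholder path: folder holding your four dev-set SegLST prediction files.
predictions_dir = Path("my_system/dev")

expected = {"chime6.json", "dipco.json", "mixer6.json", "notsofar1.json"}
found = {p.name for p in predictions_dir.glob("*.json")}
missing = expected - found
if missing:
    raise FileNotFoundError(f"Missing prediction files: {sorted(missing)}")

# Create dev.zip so that the archive contains a top-level "dev" directory.
shutil.make_archive("dev", "zip", root_dir=predictions_dir.parent, base_dir="dev")
print("Wrote dev.zip")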

πŸ“Š Ranking Score

In both tracks we will use the scenario-wise macro time-constrained minimum-permutation word error rate (tcpWER) with a 5 s collar to rank systems. tcpWER is first computed on each scenario's evaluation set separately and then averaged across the 4 scenarios.
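
As a quick illustration, the ranking score is simply the unweighted mean of the four per-scenario tcpWER values (the numbers below are made up):

# Hypothetical per-scenario tcpWER values (as fractions).
scenario_tcpwer = {
    "chime6": 0.52,
    "dipco": 0.48,
    "mixer6": 0.20,
    "notsofar1": 0.35,
}

# Macro average: each scenario contributes equally, regardless of its size.
ranking_score = sum(scenario_tcpwer.values()) / len(scenario_tcpwer)
print(f"Macro tcpWER: {ranking_score:.3f}")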

πŸ“ We apply text normalization on both participants systems output and the ground truth before computing this score.
You can have a look at the normalized ground truth for each core dataset, it is in the transcriptions_scoring folder.
This year we use Whisper-style text normalization. However, it is modified to be idempotent and less β€œaggressive” (closer to the original text).
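
The modified normalizer used for scoring ships with chime-utils; purely as an illustration of what Whisper-style normalization does, you can try the stock normalizer from the openai-whisper package (the official scoring uses the modified, idempotent variant, not this one):

# pip install openai-whisper
from whisper.normalizers import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()
# Lower-cases, removes bracketed tags like [noise], and standardizes numbers and spellings.
print(normalizer("So, um [noise] that's the 2nd one"))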

You can compute the ranking score, e.g. on the development set, using chime-utils:
chime-utils score tcpwer -s /path/to/submission/folder -r /path/to/c8dasr_root --dset-part dev --ignore

The --ignore flag is needed if you lack some of the ground truth (e.g. NOTSOFAR1 dev ground truth will not be released until the last month of the challenge).
Scoring will then skip that scenario; you have to use the leaderboard for it.

πŸ“¬ Where to Submit for Final Evaluation?

When the evaluation data is released, you will find a Google Form here where you can submit your final results. It will contain a brief questionnaire.

πŸ“ What do I need to submit for final evaluation ?

Participants have to submit the predictions of TWO systems for each track they choose to participate in, together with a technical description paper.
This means you can submit up to 4 systems in total across the two tracks (two per track).
πŸ‘‰ If you participate only in the constrained LM track, your submission will also be valid for the unconstrained LM track.

πŸ‘« We will ask for predictions on both the evaluation and development sets.

πŸ’‘ Our suggestion (it is not mandatory) is that you submit two systems:

  • one system aiming for the best possible performance (e.g. using ensembles and so on);
  • a second system aiming at the jury prize, with a more careful balance between runtime/efficiency and performance.

System Info YAML File

In the Submission Google Form we will also kindly ask you to include, for EACH of your TWO final submissions, this YAML file containing information on each system's characteristics and training/inference resources.

System Predictions Submission Format

We expect each submission to each track to have the following structure:

.
β”œβ”€β”€ sys1
β”‚   β”œβ”€β”€ dev
β”‚   β”‚   β”œβ”€β”€ chime6.json
β”‚   β”‚   β”œβ”€β”€ dipco.json
β”‚   β”‚   β”œβ”€β”€ mixer6.json
β”‚   β”‚   └── notsofar1.json
β”‚   β”œβ”€β”€ eval
β”‚   β”‚   β”œβ”€β”€ chime6.json
β”‚   β”‚   β”œβ”€β”€ dipco.json
β”‚   β”‚   β”œβ”€β”€ mixer6.json
β”‚   β”‚   └── notsofar1.json
β”‚   └── info.yaml
└── sys2
    β”œβ”€β”€ dev
    β”‚   β”œβ”€β”€ chime6.json
    β”‚   β”œβ”€β”€ dipco.json
    β”‚   β”œβ”€β”€ mixer6.json
    β”‚   └── notsofar1.json
    β”œβ”€β”€ eval
    β”‚   β”œβ”€β”€ chime6.json
    β”‚   β”œβ”€β”€ dipco.json
    β”‚   β”œβ”€β”€ mixer6.json
    β”‚   └── notsofar1.json
    └── info.yaml

Thus, there are two sub-folders, one for each system, each containing the JSON files with the predictions and the system information YAML file.
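
Before uploading, a small unofficial sketch like the following can sanity-check that layout (my_submission is a placeholder path):

from pathlib import Path

# Placeholder path to the folder you are about to submit.
submission_dir = Path("my_submission")
scenarios = ["chime6", "dipco", "mixer6", "notsofar1"]

problems = []
for sys_name in ("sys1", "sys2"):
    sys_dir = submission_dir / sys_name
    if not (sys_dir / "info.yaml").is_file():
        problems.append(f"{sys_name}/info.yaml is missing")
    for split in ("dev", "eval"):
        for scenario in scenarios:
            json_file = sys_dir / split / f"{scenario}.json"
            if not json_file.is_file():
                problems.append(f"{json_file} is missing")

print("Structure looks OK" if not problems else "\n".join(problems))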

Each JSON file is a CHiME-6 style Segment-wise Long-form Speech Transcription (SegLST), as already described on the Data Page (the same format as in the past CHiME-7 DASR challenge).
That is, each JSON file contains the predictions for one scenario as a list of dictionaries (one per utterance), each with the following attributes:

{
  "end_time": "11.370",
  "start_time": "11.000",
  "words": "So, um [noise]",
  "speaker": "P03",
  "session_id": "S05"
}
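
A minimal sketch for loading and sanity-checking one of these SegLST files (using the field names shown above) could look like this:

import json

required_keys = {"session_id", "speaker", "start_time", "end_time", "words"}

with open("dev/chime6.json") as f:
    segments = json.load(f)  # a list of utterance-level dictionaries

for seg in segments:
    missing = required_keys - seg.keys()
    assert not missing, f"Segment is missing keys: {sorted(missing)}"
    # Times are strings expressed in seconds and must form a valid interval.
    assert float(seg["start_time"]) <= float(seg["end_time"])

print(f"Loaded {len(segments)} segments")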

πŸ“‘ Technical Description Paper

Participants also have to submit ONE short technical description paper (2 to 4 pages, plus references) describing their system(s), even if the team submitted to both tracks.
Submission will be made through the Conference Management Toolkit (CMT); a link will be added on this page at the evaluation stage.

This technical description paper will be used:

  1. To evaluate candidates for the jury prize for efficiency and novelty (the YAML files will also be used for this).
  2. To assess correctness of submission and compliance with rules.

Thus, you should try to include as many relevant details as possible regarding runtime efficiency and novelty, such as:

  • inference time, real-time factor, how many GPUs were used for inference and their type, and so on;
  • training data details, external datasets used, data augmentation;
  • pre-trained models used.

πŸ“© Contact

For questions or help, you can reach the organizers via the CHiME Google Group or the CHiME Slack Workspace.