Submission

Updates:

  • [June 17, 2023] Submission is now closed
  • [June 2, 2023] Added final instructions regarding the submission.

What evaluation data is provided?

Systems will be ranked based on their performance on the following evaluation data:

  • CHiME-5 evaluation set

    The CHiME-5 evaluation set consists of short single-channel audio segments extracted from the CHiME-5 binaural recordings. Each segment contains noisy speech with up to three simultaneously active speakers. Only the single-speaker segments of the eval/1 subset will be used for the first evaluation stage, which relies on the DNS-MOS objective performance metrics. The audio samples of the eval/listening_test subset will be used for the second evaluation stage (listening test).

  • Reverberant LibriCHiME-5 evaluation set

    The reverberant LibriCHiME-5 evaluation set contains single-channel noisy speech mixtures with up to three simultaneously active speakers. Each mixture comes with the corresponding clean speech reference signal. This dataset will only be used for the first evaluation stage.

For additional information please refer to the Data and Rules sections.

What do participants need to submit?

Please make sure to follow the instructions below carefully.

Audio files

  • Participants must submit the audio signals produced at the output of their speech enhancement system for the CHiME-5 evaluation set (eval/1 and eval/listening_test subsets only) and for the reverberant LibriCHiME-5 evaluation set (eval/1, eval/2, and eval/3 subsets). The UDASE task focuses only on single-microphone noise suppression, without addressing speech separation or dereverberation. The output signal therefore corresponds to single-microphone, potentially multi-speaker speech with suppressed/attenuated background noise.

  • The output signals should have the same number of samples as the noisy speech input signals, and they should be submitted as 16-bit PCM WAV files with a 16 kHz sampling rate.

  • The loudness of the submitted audio signals for the eval/1 and eval/listening_test subsets of the CHiME-5 dataset should be normalized to -30 LUFS. Please visit the baseline GitHub repository (see the Baseline section) for instructions on how to perform this loudness normalization.
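
For illustration, here is a minimal Python sketch of this output format and loudness normalization, assuming the soundfile and pyloudnorm packages; the file names and the my_enhancement_system placeholder are hypothetical, and the baseline GitHub repository remains the reference for the exact normalization procedure.

      import soundfile as sf
      import pyloudnorm as pyln

      def my_enhancement_system(x, sr):
          # Placeholder standing in for the participant's actual system.
          return x

      # Hypothetical file names; real files follow the naming conventions below.
      noisy, rate = sf.read("mix.wav")  # 16 kHz mono input
      enhanced = my_enhancement_system(noisy, rate)

      # Normalize the integrated loudness of the output to -30 LUFS.
      meter = pyln.Meter(rate)
      enhanced = pyln.normalize.loudness(
          enhanced, meter.integrated_loudness(enhanced), -30.0)

      # Same number of samples as the input, 16-bit PCM WAV at 16 kHz.
      assert len(enhanced) == len(noisy)
      sf.write("mix_output.wav", enhanced, rate, subtype="PCM_16")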

CSV files

  • Participants are kindly asked to compute the objective performance metrics and to submit two CSV files:

    • The first CSV file should contain the DNS-MOS scores for each example of the eval/1 subset of the CHiME-5 evaluation set (3013 examples in total). It should be formatted as follows:

      subset   input_file_name   output_file_name      SIG_MOS       BAK_MOS       OVR_MOS
      eval/1   <mix ID>.wav      <mix ID>_output.wav   <SIG score>   <BAK score>   <OVR score>

      Participants are asked to normalize their signals to -30 LUFS before computing the DNS-MOS performance scores (see the baseline GitHub repository for instructions on how to perform this loudness normalization). The motivation for this normalization is that DNS-MOS (especially the SIG and BAK scores) is highly sensitive to changes in the input signal loudness, which would make it difficult to compare different systems without a common normalization procedure.

    • The second CSV file should contain the SI-SDR scores for each example of the reverberant LibriCHiME-5 evaluation set (1952 examples in total). It should be formatted as follows:

      subset                     input_file_name    output_file_name      SI-SDR
      eval/1, eval/2 or eval/3   <mix ID>_mix.wav   <mix ID>_output.wav   <SI-SDR score>

  • Instructions and tools to compute the DNS-MOS and SI-SDR metrics and to generate the CSV files are available in the baseline GitHub repository (see the Baseline section). The reported numbers will be verified by the organizers, who will score the submitted audio signals using the scoring functions in the baseline GitHub repository. It is therefore important that participants verify that their evaluation is reproducible with the provided tools. An illustrative sketch of the SI-SDR computation and CSV layout is given after this list.

  • Optionally, participants may also submit a third CSV file containing the SI-SDR scores for each example of the following subsets of the LibriMix dataset:

    • Libri2Mix/wav16k/max/test/mix_single (3000 single-speaker examples);
    • Libri2Mix/wav16k/max/test/mix_both (3000 2-speaker examples);
    • Libri3Mix/wav16k/max/test/mix_both (3000 3-speaker examples).

    SI-SDR results on LibriMix will not be used to rank systems, because doing so would not be consistent with the purpose of the UDASE task. They will only be used to compare performance on the (close to) in-domain and out-of-domain datasets.
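
To make the expected format concrete, below is a minimal sketch of the SI-SDR computation and of one row of such a results file; it assumes comma-separated values and NumPy, uses synthetic signals in place of real ones, and the official scoring functions in the baseline GitHub repository remain the reference implementation.

      import csv
      import numpy as np

      def si_sdr(estimate, reference):
          # Scale-invariant signal-to-distortion ratio in dB (Le Roux et al., 2019).
          alpha = np.dot(estimate, reference) / np.dot(reference, reference)
          target = alpha * reference
          residual = estimate - target
          return 10.0 * np.log10(np.sum(target**2) / np.sum(residual**2))

      # Synthetic signals standing in for the clean reference and the system output.
      rng = np.random.default_rng(0)
      reference = rng.standard_normal(16000)
      estimate = reference + 0.1 * rng.standard_normal(16000)

      with open("results.csv", "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["subset", "input_file_name", "output_file_name", "SI-SDR"])
          writer.writerow(["eval/1", "<mix ID>_mix.wav", "<mix ID>_output.wav",
                           f"{si_sdr(estimate, reference):.2f}"])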

Technical report

  • Each submission should include a technical report in the form of a two-page extended abstract (references can extend onto a third page). This technical report needs to be sufficiently complete for the organizers to judge whether the submitted system complies with the task rules. It should include:
    • an abstract;
    • an introduction;
    • a description of the methodology and the experimental setup (including a description of the system, the model architecture, and any external/pre-existing tools or software that may have been used);
    • a presentation and discussion of the results obtained following the official task rules, and potentially additional results;
    • a conclusion and references.
  • In addition to the final results obtained with the submitted system, participants are asked to provide in the technical report any information that could help in understanding and justifying the choices made during the development of the system. This may, for instance, take the form of an ablation study. For systems that perform unsupervised adaptation from a fully-supervised model, participants are expected to show that the final model, after unsupervised adaptation on the unlabeled CHiME-5 data, performs better than the same model trained only on the labeled LibriMix data.

  • See here for additional information (including a LaTeX author kit) and for submitting the technical report. In particular, please read carefully the “CHiME-7 challenge papers” section.

Naming and packaging of the submission

Please make sure to follow the instructions below carefully.

  • The output audio signals and the CSV files should be placed in a directory with the following structure:

      .
      ├── audio
      │   ├── CHiME-5
      │   │   └── eval
      │   │       ├── 1
      │   │       │   ├── <mix ID>_output.wav
      │   │       │   ├── ...
      │   │       └── listening_test
      │   │           ├── <mix ID>_output.wav
      │   │           ├── ...
      │   └── reverberant-LibriCHiME-5
      │       └── eval
      │           ├── 1
      │           │   ├── <mix ID>_output.wav
      │           │   ├── ...
      │           ├── 2
      │           │   ├── <mix ID>_output.wav
      │           │   ├── ...
      │           └── 3
      │               ├── <mix ID>_output.wav
      │               ├── ...
      └── csv
          ├── CHiME-5
          │   └── results.csv
          ├── LibriMix
          │   └── results.csv
          └── reverberant-LibriCHiME-5
              └── results.csv

    where

    • audio/CHiME-5/eval/1 contains 3013 files;
    • audio/CHiME-5/eval/listening_test contains 241 files;
    • audio/reverberant-LibriCHiME-5/eval/1 contains 1394 files;
    • audio/reverberant-LibriCHiME-5/eval/2 contains 494 files;
    • audio/reverberant-LibriCHiME-5/eval/3 contains 64 files.
  • Providing the csv/LibriMix/results.csv file in the submitted directory is optional.

  • Output signals should be named using the following conventions:
    • For each noisy speech input signal named <mix ID>.wav in the CHiME-5 evaluation set, the corresponding output signal should be named <mix ID>_output.wav. For example, the output file associated with an input file with relative path eval/1/<mix ID>.wav in the CHiME-5 eval set should be placed at audio/CHiME-5/eval/1/<mix ID>_output.wav in the submitted directory.
    • For each noisy speech input signal named <mix ID>_mix.wav in the reverberant LibriCHiME-5 evaluation set, the corresponding output signal should be named <mix ID>_output.wav. For instance, the output file associated with an input file with relative path eval/1/<mix ID>_mix.wav in the reverberant LibriCHiME-5 eval set should be placed at audio/reverberant-LibriCHiME-5/eval/1/<mix ID>_output.wav in the submitted directory.
  • The directory containing the output audio signals and the CSV files (. in the above directory tree structure) should be packaged using zip, tar, or any other standard packaging tool, as illustrated in the sketch after this list.

  • An example submission zip file (including audio and CSV files) is available here.

  • Before submission, please verify that your submission directory is valid using the check_submission.py script available in the baseline GitHub repository (see the Baseline section). Your submission directory is valid if running python check_submission.py path/to/the/submission/directory produces no error message.
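
To illustrate these last steps, the sketch below derives output file names from input file names, runs the validation script, and packages the directory; it assumes the script is invoked from a clone of the baseline GitHub repository, and all paths are placeholders.

      import shutil
      import subprocess
      import sys

      def output_name(input_name):
          # CHiME-5: <mix ID>.wav -> <mix ID>_output.wav
          # Reverberant LibriCHiME-5: <mix ID>_mix.wav -> <mix ID>_output.wav
          if input_name.endswith("_mix.wav"):
              return input_name[:-len("_mix.wav")] + "_output.wav"
          return input_name[:-len(".wav")] + "_output.wav"

      print(output_name("example_mix.wav"))  # -> example_output.wav

      submission_dir = "path/to/the/submission/directory"  # placeholder

      # Validate the directory structure with the official script (raises on error).
      subprocess.run([sys.executable, "check_submission.py", submission_dir],
                     check=True)

      # Package the validated directory as submission.zip.
      shutil.make_archive("submission", "zip", root_dir=submission_dir)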

How to submit?

  • Please first submit your technical report (extended abstract) following the instructions here. You will obtain a CMT paper ID, which is required to fill out the submission form below.
  • The submission has to be done using this Google form. You will be asked to provide:
    • general information about the submission (corresponding authors, submission name, team profile, CMT paper ID, etc.);
    • general information about the submitted system;
    • the average performance scores of the submitted system;
    • a link to download the packed directory containing the submitted audio signals and CSV files.
  • Given the provided evaluation dataset, the size of the packed directory should be about 700 MB. Participants can use any file-sharing service, such as Google Drive or WeTransfer, to upload their results.

Submission deadline

See Important dates