CHiME-5

Warning: The CHiME-5 dataset has been superseded by the newer CHiME-6 release, which fixes audio alignment issues present in the original data. We recommend using CHiME-6 instead of CHiME-5 for any new work.

As of 1 January 2024, the CHiME-5 dataset has been re-issued under a standard CC BY-SA 4.0 license and is free to use for both academic and commercial purposes.

The data can be downloaded from myairbridge.com.

At the site, you will find the following gzipped tar files:

CHiME5_train.tar.gz [91 GB] - the training set audio
CHiME5_dev.tar.gz [10 GB] - the development set audio
CHiME5_eval.tar.gz [13 GB] - the evaluation set audio
CHiME5_transcriptions.tar.gz [10 MB] - complete set of transcriptions
CHiME5_floorplans.tar.gz [1.4 MB] - approximate floorplans for the 20 living spaces

The MyAirBridge interface allows you to select the components you wish to download.
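
Once the selected archives have been downloaded, they can be unpacked with any standard tar tool. The following is a minimal Python sketch, not an official script: it assumes the archives sit in a local download directory, and the function name `extract_archives` and the directory names in the usage note are hypothetical. Archives that were not downloaded are skipped.

```python
import tarfile
from pathlib import Path

# Archive names as listed above.
CHIME5_ARCHIVES = [
    "CHiME5_train.tar.gz",
    "CHiME5_dev.tar.gz",
    "CHiME5_eval.tar.gz",
    "CHiME5_transcriptions.tar.gz",
    "CHiME5_floorplans.tar.gz",
]


def extract_archives(download_dir, target_dir, names=CHIME5_ARCHIVES):
    """Extract each named .tar.gz archive found in download_dir into target_dir.

    Returns the names of the archives that were actually extracted.
    (Hypothetical helper; the internal layout of each archive is not assumed.)
    """
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for name in names:
        archive = download_dir / name
        if not archive.is_file():
            continue  # this component was not downloaded; skip it
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
        extracted.append(name)
    return extracted
```

For example, `extract_archives("downloads", "CHiME5")` would unpack whichever of the listed archives are present in `downloads` into a `CHiME5` directory. Note that the training set alone is 91 GB compressed, so check the available disk space before extracting.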

For publications that use the data, please cite the following paper:

Barker, J., Watanabe, S., Vincent, E., Trmal, J. (2018) The Fifth ‘CHiME’ Speech Separation and Recognition Challenge: Dataset, Task and Baselines. Proc. Interspeech 2018, 1561-1565, doi: 10.21437/Interspeech.2018-1768

You can use the following BibTeX entry:

@inproceedings{barker18_interspeech,
  author={Jon Barker and Shinji Watanabe and Emmanuel Vincent and Jan Trmal},
  title={{The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines}},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1561--1565},
  doi={10.21437/Interspeech.2018-1768}
}