FAQ

Can I use more data for training?

Yes. You can use each utterance of the official training set in as many versions as needed (clean, noisy, enhanced, etc.). You may even modify the acoustic simulation baseline, provided that each speech signal is mixed with the same noise signal as in the original simulated set (i.e., only the impulse responses may change, not the noise instance).
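To illustrate the constraint above, here is a minimal single-channel sketch of re-simulating one training utterance: the clean speech is convolved with a new impulse response, and the result is added to the unchanged noise instance taken from the original simulated set. The function names are hypothetical, the code is pure Python for self-containment, and it deliberately ignores any gain or SNR scaling and the multichannel array geometry that a real simulation would handle.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], rir: Sequence[float]) -> List[float]:
    """Full linear convolution of a speech signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def resimulate(speech: Sequence[float], new_rir: Sequence[float],
               original_noise: Sequence[float]) -> List[float]:
    """Re-simulate one utterance with a NEW impulse response but the SAME noise instance.

    The reverberant speech is truncated or zero-padded to the length of the original
    noise segment, so the noise is reused sample for sample (hypothetical helper).
    """
    reverberant = convolve(speech, new_rir)[: len(original_noise)]
    reverberant += [0.0] * (len(original_noise) - len(reverberant))
    return [x + n for x, n in zip(reverberant, original_noise)]
```

Because `original_noise` is passed in unchanged, only `new_rir` varies between simulation variants, which is the one degree of freedom the rule allows.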

Should systems report separate results for each microphone array channel?

No, a single result should be reported that combines evidence from all six microphone array channels.

In the baseline system, only Channel 5 is used for training and development. Can participants use data from other channels to train/tune their systems?

Yes, this is allowed.

Can the close-talk microphone be used for training?

Yes, it can be used for training and development but not for testing. Note that this channel won't be made available in the final test set.

Can the close-talk microphone be used for evaluation?

No. Note that this channel won't be made available in the final test set.

Can you provide an annotation file that labels microphone failures?

No. The baseline speech enhancement software includes a basic characterisation of microphone failure based on signal power, but the problem is not that simple. Finding out how best to characterise microphone failure is part of the challenge.
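As a rough illustration of what a power-based characterisation can look like, the sketch below flags any channel whose mean power deviates from the median channel power by more than a threshold. This is not the baseline's actual criterion: the function names and the 10 dB threshold are hypothetical, and, as noted above, power alone will miss many kinds of failure.

```python
import math
from statistics import median
from typing import List, Sequence


def channel_power_db(samples: Sequence[float]) -> float:
    """Mean power of one channel in dB; a tiny floor avoids log10(0) for silent channels."""
    power = sum(x * x for x in samples) / max(len(samples), 1)
    return 10.0 * math.log10(power + 1e-12)


def flag_failed_channels(channels: Sequence[Sequence[float]],
                         threshold_db: float = 10.0) -> List[int]:
    """Return 0-based indices of channels whose power differs from the median channel
    power by more than threshold_db (hypothetical heuristic, not the baseline's rule)."""
    powers = [channel_power_db(ch) for ch in channels]
    reference = median(powers)
    return [i for i, p in enumerate(powers) if abs(p - reference) > threshold_db]
```

Comparing against the median rather than the mean keeps a single dead or clipping channel from dragging the reference level toward itself.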

Can we tune system parameters on real data only?

Yes. You can use the official real development set, the official simulated development set, or both official development sets. The scoring scripts (local/chime3_calc_wers.sh, local/chime3_calc_wers_smbr.sh) use both official development sets to tune the system. However, you should not use a subset of these sets, so that parameter tuning is done in a comparable way across participants.

Can I use different tunings for real and simulated data?

No. For every tested system, you should report the performance on the real and simulated data using the same settings. However, you are free to test different system settings optimised on the official real development set, the official simulated development set, or both. In the end, only the results on the real test set will be taken into account in the final ranking.

Is speaker adaptation using all of a speaker's utterances allowed?

Yes, it is OK to use all isolated utterances from a given speaker. However, when you use the embedded recordings, please do not use more than 5 seconds of context before the test utterance.
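The 5-second rule for embedded recordings can be made concrete with a small helper that computes which part of the continuous recording a system may read when processing one test utterance. The function names are hypothetical, times are assumed to be in seconds relative to the recording, and the 16 kHz default sample rate is an assumption you should check against your data.

```python
from typing import Tuple

MAX_CONTEXT_S = 5.0  # FAQ limit: at most 5 seconds of context before the test utterance


def allowed_segment(utt_start_s: float, utt_end_s: float,
                    recording_start_s: float = 0.0) -> Tuple[float, float]:
    """Return the (start, end) time span a system may use for one test utterance:
    the utterance itself plus at most MAX_CONTEXT_S of preceding audio,
    clipped at the start of the embedded recording (hypothetical helper)."""
    if utt_end_s < utt_start_s:
        raise ValueError("utterance end precedes its start")
    start = max(recording_start_s, utt_start_s - MAX_CONTEXT_S)
    return start, utt_end_s


def to_samples(segment: Tuple[float, float], sample_rate: int = 16000) -> Tuple[int, int]:
    """Convert a (start, end) time span in seconds to sample indices (assumed 16 kHz)."""
    start_s, end_s = segment
    return round(start_s * sample_rate), round(end_s * sample_rate)
```

For example, `allowed_segment(12.5, 14.0)` permits reading from 7.5 s onward, while an utterance starting at 3.0 s can only use context back to the beginning of the recording.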

What language models can I use?

An official language model has been supplied. However, any form of language model rescoring (e.g., MBR, DLM) can be used as long as the technique is trained using official training data only, i.e., the data in CHiME3/data/WSJ0/wsj0/doc/lng_modl/lm_train/.

What should the final submission include: a WER report or a transcript of the test set?

Both. Please report the WERs in the paper you submit to ASRU. Please also submit the ASR transcripts corresponding to your results on the official training and test sets by emailing them to chimechallenge@gmail.com on or before the submission deadline.


Should we use the tool in the Kaldi baseline to calculate the WER?

Yes, please use the Kaldi scoring tool (compute-wer). There are subtle differences between the way WERs are computed by different tools, hence the use of the Kaldi tool is required.
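For intuition only, the sketch below shows the quantity being measured: the word-level Levenshtein distance (substitutions + deletions + insertions) summed over all utterances, divided by the total number of reference words. It is not a substitute for compute-wer; reported numbers must still come from the Kaldi tool, since details such as text normalisation may differ between implementations. The function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp (word Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution (or match if equal)
        prev = cur
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = n_ref = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors += word_errors(ref, hyp)
        n_ref += len(ref)
    if n_ref == 0:
        raise ValueError("no reference words")
    return 100.0 * errors / n_ref
```

Note that errors are pooled over the whole set before dividing, rather than averaging per-utterance WERs, so long utterances weigh more than short ones.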

Are we allowed to use development and test utterances for unsupervised training purposes?

Yes, any training technique is OK as long as it uses neither the correct transcripts nor the environment labels of the test data.

Should I send transcripts for both the real and simulated test sets?

Yes. Please send transcripts and report WERs for both sets. However, only the results obtained on the real data will be used for system ranking.

Frequently Asked Questions, last updated 6th July