FAQ: Frequently Asked Questions

Can I use more data for training?

Yes. You can use each utterance of the official training set in as many versions as needed (clean, noisy, enhanced...). You are even allowed to modify the acoustic simulation baseline provided that you mix each speech signal with the same noise signal as in the original simulated set (i.e., only the impulse responses can change, not the noise instance).

Should systems report separate results for each microphone array channel?

No. A single result should be reported that combines evidence from all microphone array channels involved in the considered track. Note that these channels may vary from one utterance to another.

In the baseline system only channel 5 is used for training and development. Can participants use data from other channels to train/tune their systems?

Yes, this is allowed for all tracks.

Can the close-talk microphone be used for training?

Yes, it can be used for training and development but not for testing. Note that this channel won't be made available in the final test set.

Can the close-talk microphone be used for evaluation?

No. Note that this channel won't be made available in the final test set.

Can you provide an annotation file that labels microphone failures?

The channels used in the development and test data for the 1ch and 2ch tracks were chosen in such a way that microphone failures do not arise. Such failures do arise, however, in the development and test data for the 6ch track and in the training data for all tracks. Automatically detecting microphone failures is part of the challenge. The enhancement baseline (BeamformIt) implicitly discards channels for which the microphone failed, but participants might find better solutions based on, e.g., measuring the correlation between channels.
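As an illustration of the correlation idea mentioned above, here is a minimal sketch in plain Python. It is not part of the baseline and is not how BeamformIt works. The function names `pearson` and `flag_failed_channels` and the threshold of 0.3 are hypothetical choices for this example. It computes the zero-lag Pearson correlation between every pair of channels and flags a channel whose mean absolute correlation with the others is low.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Zero-lag Pearson correlation; returns 0.0 for empty or constant signals."""
    n = min(len(x), len(y))
    if n == 0:
        return 0.0
    xs, ys = x[:n], y[:n]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a flat (e.g. dead) channel carries no correlation
    return cov / math.sqrt(var_x * var_y)


def flag_failed_channels(
    channels: Sequence[Sequence[float]], threshold: float = 0.3
) -> List[int]:
    """Return indices of channels that correlate poorly with all the others.

    The threshold is an assumed value for illustration, not a tuned one.
    """
    flagged = []
    for i, channel in enumerate(channels):
        scores = [
            abs(pearson(channel, other))
            for j, other in enumerate(channels)
            if j != i
        ]
        if scores and sum(scores) / len(scores) < threshold:
            flagged.append(i)
    return flagged
```

Zero-lag correlation ignores the time delays between microphones, so a practical detector would probably work on short frames or take the maximum of the cross-correlation over a small range of lags.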

Can we use real data only to tune system parameters?

Yes. You can use the official real development set, the official simulated development set, or both official development sets. The scoring scripts (local/chime4_calc_wers.sh, local/chime4_calc_wers_smbr.sh) use both official development sets to tune the system. You should not use a subset of these sets, however, in order to ensure that parameter tuning is done in a comparable way across participants.

Can I use different tunings for real and simulated data?

No. For every tested system, you should report the performance on the real and simulated data using the same setting. However, you are free to test different system settings optimised on the official real development set, the official simulated development set, or both. Ultimately, only the results on the real test set will be taken into account in the final ranking.

Is speaker adaptation for all speaker utterances allowed?

Yes, it is OK to use all isolated utterances from a given speaker. However, when you use the embedded recordings, please do not use more than 5 seconds of context before the test utterance.
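The 5-second limit on context can be enforced with a small helper when cutting segments out of the embedded recordings. This is only a sketch: the function name `allowed_context_start` is hypothetical, and it assumes that the start time of the test utterance within the embedded recording is known in seconds.

```python
def allowed_context_start(utt_start: float, max_context: float = 5.0) -> float:
    """Earliest time (in seconds) from which audio may be used as context.

    utt_start is the start time of the test utterance in the embedded
    recording. The result is clipped at 0.0 so it never precedes the
    beginning of the recording.
    """
    if utt_start < 0.0:
        raise ValueError("utt_start must be non-negative")
    return max(0.0, utt_start - max_context)


# Example: an utterance starting 12.5 s into the recording
# may use context from 7.5 s onwards.
context_start = allowed_context_start(12.5)
```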

What language models can I use?

Several official language models (3-gram, 5-gram, RNNLM) have been supplied. However, any form of language model rescoring can be used (e.g., MBR, DLM) as long as the technique is trained using official training data only, i.e. data in CHiME3/data/WSJ0/wsj0/doc/lng_modl/lm_train/.

Can we use 48 kHz data for system training/evaluation?

No, these data are not publicly available.

What should the final submission include: a WER report or a transcript of the test set?

Both. Please report the WER in the paper you submit to the CHiME-4 Workshop. Please also submit the ASR transcripts corresponding to your results on the official training and test sets by emailing them to chimechallenge@gmail.com on or before the submission deadline. If you are using the Kaldi baseline, the transcripts to be submitted are the files called exp/***/best_wer_${enhan}.result, exp/***/best_wer_${enhan}_5gkn_5k.result, or exp/***/best_wer_${enhan}_rnnlm_5k_h300_w0.5_n100.result.

If you are not using Kaldi, please stick to the transcription format below:

<utterance id> word_1 word_2 …
For example,

F01_050C0103_BUS_SIMU CONSTRUCTION EMPLOYMENT FELL FORTY SEVEN THOUSAND AFTER FIFTEEN THOUSAND JOB DECLINE THE MONTH BEFORE
F01_050C0103_CAF_SIMU CONSTRUCTION EMPLOYMENT FELL FORTY SEVEN THOUSAND AFTER A FIFTEEN THOUSAND JOB DECLINE THE MONTH BEFORE
F01_050C0103_PED_SIMU CONSTRUCTION EMPLOYMENT FELL FORTY SEVEN THOUSAND AFTER FIFTEEN THOUSAND JOB DECLINE THE MONTH BEFORE
:
:
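For systems that do not use Kaldi, the format above (utterance ID followed by space-separated words, one utterance per line) can be produced with a few lines of Python. This is a minimal sketch: the helper names `format_transcript_line` and `write_transcripts` are hypothetical, and sorting by utterance ID is an assumption made for readability, not a stated requirement.

```python
from typing import Dict, Iterable, List


def format_transcript_line(utt_id: str, words: Iterable[str]) -> str:
    """Build one transcript line: '<utterance id> word_1 word_2 ...'."""
    tokens = [utt_id.strip()] + [w.strip() for w in words if w.strip()]
    return " ".join(tokens)


def write_transcripts(hypotheses: Dict[str, List[str]], path: str) -> None:
    """Write one line per utterance to a text file.

    hypotheses maps an utterance ID to its list of recognised words.
    Lines are sorted by utterance ID (an assumed convention).
    """
    with open(path, "w", encoding="utf-8") as handle:
        for utt_id in sorted(hypotheses):
            handle.write(format_transcript_line(utt_id, hypotheses[utt_id]) + "\n")
```

For instance, `format_transcript_line("F01_050C0103_BUS_SIMU", ["CONSTRUCTION", "EMPLOYMENT"])` gives the line `F01_050C0103_BUS_SIMU CONSTRUCTION EMPLOYMENT`.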

Should we use the tool in the Kaldi baseline to calculate the WER?

Yes, please use the Kaldi scoring tool (compute-wer). There are subtle differences between the way WERs are computed by different tools, hence the use of the Kaldi tool is required.