Results

The CHiME challenge received 26 submissions. Teams typically employed multiple strategies, implemented by improving or replacing components of the baseline processing pipeline. Systems are classified under eight headings according to which components differed from the baseline: Train (training data or training technique), N-ch (multichannel enhancement), 1-ch (single-channel enhancement), Feat (feature extraction), Norm (feature normalisation), AM (acoustic modelling), LM (language modelling) and Com (system combination).

Results are word error rates (WER, %), shown first for the simulated test data (SIM), then for the real test data by environment (BUS, CAF, PED and STR), with the final column giving the average (AVE) over the four real-data environments.

Team                                SIM   BUS   CAF   PED   STR   AVE
NTT, Japan [1]                      4.5   7.4   4.5   6.2   5.2   5.8
MERL/SRI [2]                        8.6  13.5   7.7   7.1   8.1   9.1
USTC, China + GIT, US [3]           7.0  13.8  11.4   9.3   7.8  10.6
Inria, France [4]                   6.2  16.2   9.6  12.3   7.2  11.3
Fraunhofer + U. Oldenburg [5]       6.4  13.5  13.5  10.6   9.2  11.7
Hitachi R&D, Japan [6]              9.8  16.6  11.8  10.0   8.8  11.8
NTU, Singapore [7]                  8.2  14.5  11.7  11.5  10.0  11.9
Rolls-Royce, NTU, Singapore [8]     8.5  17.6  12.1   8.5   9.6  11.9
A*Star, Singapore [9]               8.6  18.6  10.7   9.7   9.6  12.1
U. Paderborn, Germany [10]          9.0  17.5  10.5  11.0  10.0  12.3
DNN Baseline v2 [11]                 --  19.1  11.4  10.3  10.3  12.8
Chinese Ac. Sciences [12]           9.7  17.7  11.8  13.4  10.0  13.2
FBK + U. Trento, Italy [13]         7.1  17.7  14.1  13.0   9.2  13.5
SJTU, China; U. Missouri [14]       6.2  18.0  15.4  12.2   9.6  13.8
Mitsubishi Electric, Japan [15]     8.4  23.2  13.9  11.1   8.4  14.2
Lingban Tech Co, China [16]         6.1  16.2  13.4  17.0  10.5  14.3
STC Inc. + ITMO U., Russia [17]    13.8  17.4  11.5  18.0  10.5  14.3
OSU, Columbus, US [18]             21.0  24.7  14.0  13.7  12.9  16.3
U. Sheffield, UK [19]              20.0  24.6  18.4  16.9  14.4  18.6
TUT, Finland [20]                  24.4  28.4  20.6  19.0  16.4  21.1
U. Oldenburg, Germany [21]         15.9  30.5  23.8  18.1  15.5  22.0
TUG, Austria [22]                  14.9  29.0  24.0  19.8  15.7  22.1
KU Leuven, Belgium [23]             6.9  28.4  26.5  22.3  15.1  23.1
U. Passau + TUM, Germany [24]      21.5  30.7  27.3  21.3  18.3  24.4
U. Erlangen, Germany [25]          15.2  35.6  32.7  26.6  19.9  28.7
NCTU, Taiwan [26]                  16.9  45.0  29.2  23.8  19.1  29.3
DNN Baseline [27]                  21.5  51.8  34.7  27.2  20.1  33.4
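The AVE column is the unweighted mean of the four real-environment WERs, rounded to one decimal place. A minimal Python sketch (not part of the challenge toolkit; `ave_wer` is a name chosen here for illustration) reproduces it from the table values:

```python
# Sketch: reproduce the AVE column as the unweighted mean of the
# four real-environment WERs (BUS, CAF, PED, STR), rounded to 1 d.p.
def ave_wer(bus, caf, ped, str_):
    """Mean WER (%) over the four real-data environments."""
    return round((bus + caf + ped + str_) / 4, 1)

# NTT, Japan [1]: real-data columns from the table above.
print(ave_wer(7.4, 4.5, 6.2, 5.2))  # 5.8, matching the AVE column
```

Note that AVE averages only the real-data environments; the SIM column is excluded.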

References

  1. T. Yoshioka, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita, M. F. C. Yu, W. J. Fabian, M. Espi, T. Higuchi, S. Araki, and T. Nakatani, “The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices,” in Proc. IEEE ASRU, 2015.
  2. T. Hori, Z. Chen, H. Erdogan, J. R. Hershey, J. L. Roux, V. Mitra, and S. Watanabe, “The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition,” in Proc. IEEE ASRU, 2015.
  3. J. Du, Q. Wang, Y.-H. Tu, X. Bao, L.-R. Dai, and C.-H. Lee, “An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework,” in Proc. IEEE ASRU, 2015.
  4. S. Sivasankaran, A. A. Nugraha, E. Vincent, J. A. Morales-Cordovilla, S. Dalmia, and I. Illina, “Robust ASR using neural network based speech enhancement and feature simulation,” in Proc. IEEE ASRU, 2015.
  5. N. Moritz, S. Gerlach, K. Adiloglu, J. Anemüller, B. Kollmeier, and S. Goetze, “A CHiME-3 challenge system: Long-term acoustic features for noise robust automatic speech recognition,” in Proc. IEEE ASRU, 2015.
  6. Y. Fujita, R. Takashima, T. Homma, R. Ikeshita, Y. Kawaguchi, T. Sumiyoshi, T. Endo, and M. Togami, “Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection,” in Proc. IEEE ASRU, 2015.
  7. S. Zhao, X. Xiao, Z. Zhang, T. N. T. Nguyen, X. Zhong, B. Ren, L. Wang, D. L. Jones, E. S. Chng, and H. Li, “Robust speech recognition using beamforming with adaptive microphone gains and multichannel noise reduction,” in Proc. IEEE ASRU, 2015.
  8. T. T. Vu, B. Bigot, and E. S. Chng, “Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge,” in Proc. IEEE ASRU, 2015.
  9. H. D. Tran, J. Dennis, and L. Yiren, “A comparative study of multi-channel processing methods for noisy automatic speech recognition on the third CHiME challenge,” Submitted to 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
  10. J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge,” in Proc. IEEE ASRU, 2015.
  11. Kaldi CHiME-3 Baseline, v2
  12. X. Wang, C. Wu, P. Zhang, Z. Wang, Y. Liu, X. Li, Q. Fu, and Y. Yan, “Noise robust IOA/CAS speech separation and recognition system for the third ’CHiME’ challenge,” 2015, arXiv:1509.06103.
  13. S. Jalalvand, D. Falavigna, M. Matassoni, P. Svaizer, and M. Omologo, “Boosted acoustic model learning and hypotheses rescoring on the CHiME3 task,” in Proc. IEEE ASRU, 2015.
  14. Y. Zhuang, Y. You, T. Tan, M. Bi, S. Bu, W. Deng, Y. Qian, M. Yin, and K. Yu, “System combination for multi-channel noise robust ASR,” Tech. Rep. SP2015-07, SJTU SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, 2015.
  15. Y. Tachioka, H. Kanagawa, and J. Ishii, “The overview of the MELCO ASR system for the third CHiME challenge,” Tech. Rep., Mitsubishi Electric, 2015.
  16. Z. Pang and F. Zhu, “Noise-robust ASR for the third ’CHiME’ challenge exploiting time-frequency masking based multi-channel speech enhancement and recurrent neural network,” 2015, arXiv:1509.07211.
  17. A. Prudnikov, M. Korenevsky, and S. Aleinik, “Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition,” in Proc. IEEE ASRU, 2015.
  18. D. Bagchi, M. I. Mandel, Z. Wang, Y. He, A. Plummer, and E. Fosler-Lussier, “Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition,” in Proc. IEEE ASRU, 2015.
  19. N. Ma, R. Marxer, J. Barker, and G. J. Brown, “Exploiting synchrony spectra and deep neural networks for noise-robust automatic speech recognition,” in Proc. IEEE ASRU, 2015.
  20. P. Pertilä, A. Hurmalainen, S. Nandakumar, and T. Virtanen, “Automatic speech recognition with multichannel neural network based speech enhancement,” Tech. Rep. ISBN 978-952-15-3590-1, Department of Signal Processing, Tampere University of Technology, 2015.
  21. A. C. Martinez and B. Meyer, “Mutual benefits of auditory spectro-temporal Gabor features and deep learning for the 3rd CHiME challenge,” Tech. Rep. 2509, University of Oldenburg, Germany, 2015, http://oops.uni-oldenburg.de/2509.
  22. L. Pfeifenberger, T. Schrank, M. Zöhrer, M. Hagmüller, and F. Pernkopf, “Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results,” in Proc. IEEE ASRU, 2015.
  23. D. Baby, T. Virtanen, and H. V. Hamme, “Coupled dictionary-based speech enhancement for CHiME-3 challenge,” Tech. Rep. KUL/ESAT/PSI/1503, KU Leuven, ESAT, Leuven, Belgium, Sept. 2015.
  24. A. E.-D. Mousa, E. Marchi, and B. Schuller, “The ICSTM+TUM+UP approach to the 3rd CHiME challenge: Single-channel LSTM speech enhancement with multi-channel correlation shaping dereverberation and LSTM language models,” 2015, arXiv:1510.00268.
  25. H. Barfuss, C. Huemmer, A. Schwarz, and W. Kellermann, “Robust coherence-based spectral enhancement for distant speech recognition,” 2015, arXiv:1509.06882.
  26. A. Misbullah and J.-T. Chien, “Deep feedforward and recurrent neural networks for speech recognition,” unpublished technical report.