Programme

This programme may be subject to minor changes. The detailed programme follows below.

9:00	Welcome
9:10	Keynote 1: Michiel Bacchiani (Google)
10:00	Break
10:20	Overview of the 4th CHiME Challenge
10:50	Poster session 1
12:20	Lunch
13:30	Keynote 2: Björn Schuller (University of Passau)
14:20	Oral session
15:40	Break
16:00	Poster session 2
17:30	Closing

Detailed Programme

Welcome

The 4th International Workshop on Speech Processing in Everyday Environments
Emmanuel Vincent (Inria, France), Shinji Watanabe (Mitsubishi Electric Research Labs, USA), Jon Barker and Ricard Marxer (University of Sheffield, UK)

Keynote 1

Session Chair: Ralf Schlüter (RWTH Aachen University, Germany)

Google speech processing from mobile to farfield [View Slides] [View Video]
Michiel Bacchiani (Google)
Abstract: Recent years have seen large-scale adoption of speech recognition by the public, in particular on mobile devices. Google, with its Android operating system, has integrated speech recognition as a key input modality. The century of speech that our systems process each day shows how popular speech processing has become. This talk will briefly describe some of the history and highlight some of the technical challenges we faced.
More recently, home farfield devices, as popularized by Amazon Echo, have resulted in a major research emphasis on speech processing in such conditions. This talk will describe the Google research effort that underpins the upcoming Google Home devices. It will describe how our neural network technology is capable of processing multi-channel data and implicitly learns how to localize and beamform the incoming signal. We show three distinct approaches to implementing this. The first uses factored raw waveform processing in the input layers. The second uses processing of the complex FFT signal in the input layer. The third uses an adaptive filtering approach.

Overview of the 4th CHiME Challenge

Datasets, tasks, baselines and results [View Slides] [View Video]
Emmanuel Vincent (Inria, France), Shinji Watanabe (Mitsubishi Electric Research Labs, USA), Jon Barker and Ricard Marxer (University of Sheffield, UK)

Poster session 1

Session Chair: Athanasios Mouchtaris (FORTH-ICS and University of Crete, Greece)

The MELCO/MERL system combination approach for the fourth CHiME Challenge
Yuuki Tachioka (Mitsubishi Electric Corp., Japan), Shinji Watanabe and Takaaki Hori (Mitsubishi Electric Research Labs, USA)
LSTM network supported linear filtering for the CHiME 2016 Challenge
Xiaofei Wang and Ziteng Wang (Institute of Acoustics, Chinese Academy of Sciences, China)
The THU-SPMI CHiME-4 system: Lightweight design with advanced multi-channel processing, feature enhancement, and language modeling
Hongyu Xiang, Bin Wang and Zhijian Ou (Tsinghua University, China)
Wide residual BLSTM network with discriminative speaker adaptation for robust speech recognition
Jahn Heymann, Lukas Drude and Reinhold Haeb-Umbach (Paderborn University, Germany)
Deep beamforming and data augmentation for robust speech recognition: Results of the 4th CHiME Challenge
Tobias Schrank, Lukas Pfeifenberger, Matthias Zöhrer, Johannes Stahl, Pejman Mowlaee and Franz Pernkopf (Graz University of Technology, Austria)
The FBK system for the CHiME-4 Challenge
Marco Matassoni, Mirco Ravanelli, Shahab Jalalvand, Alessio Brutti and Daniele Falavigna (Fondazione Bruno Kessler, Italy)
Investigation of neural networks based beamforming approaches for speech recognition: The NTU systems for CHiME-4 evaluation
Xiong Xiao, Chenglin Xu (Nanyang Technological University, Singapore), Zhaofeng Zhang (Nagaoka University of Technology, Japan), Shengkui Zhao (Advanced Digital Sciences Center, Singapore), Sining Sun (Northwestern Polytechnical University, China), Shinji Watanabe (Mitsubishi Electric Research Labs, USA), Longbiao Wang (Tianjin University, China), Lei Xie (Northwestern Polytechnical University, China), Douglas L. Jones (Advanced Digital Sciences Center, Singapore), Eng Siong Chng (Nanyang Technological University, Singapore) and Haizhou Li (National University of Singapore, Singapore)
Evolution strategy based neural network optimization and LSTM language model for robust speech recognition
Tomohiro Tanaka, Takahiro Shinozaki (Tokyo Institute of Technology, Japan), Shinji Watanabe and Takaaki Hori (Mitsubishi Electric Research Labs, USA)

Keynote 2

Session Chair: Rahim Saeidi (Aalto University, Finland)

Computational paralinguistics in everyday environments [View Slides]
Björn Schuller (University of Passau)
Abstract: An increasingly long list of states and traits of speakers is being targeted for automatic recognition by computers, including their age, emotion, health condition, or personality. However, hardly any of these have been encountered in “everyday” usage by the broad consumer mass up to now. This is certainly also owed to robustness issues, which shall be discussed here. Traditionally, these comprise speech enhancement, feature enhancement, feature space adaptation, or matched conditions training, mainly to cope with additive or convolutional noise. In addition, a number of further robustness issues mark this field of speech analysis, including interdependence of states and traits, potential subjectivity in the labels, phonetic content variation in the acoustic analysis, varying language and erroneous speech recognition in the linguistic analysis, and diversity of the cultural background of speakers. Finally, a number of hardly tackled issues remain, such as the analysis of multiple speakers or of far-field conditions with multiple microphones. In the talk, an overview of these challenges and existing solutions is given. Then, required future research efforts will be named to help Computational Paralinguistics’ massive launch into next-generation dialogue systems and many other applications.

Oral session

Session Chair: Marc Delcroix (NTT, Japan)

The USTC-iFlytek system for CHiME-4 Challenge [View Slides] [View Video]
Jun Du, Yan-Hui Tu, Lei Sun (University of Science and Technology of China, China), Feng Ma, Hai-Kun Wang, Jia Pan, Cong Liu (iFlytek Research, China) and Chin-Hui Lee (Georgia Institute of Technology, USA)
The RWTH/UPB/FORTH system combination for the 4th CHiME Challenge evaluation [View Slides] [View Video]
Tobias Menne (RWTH Aachen University, Germany), Jahn Heymann (Paderborn University, Germany), Anastasios Alexandridis (FORTH-ICS and University of Crete, Greece), Kazuki Irie, Albert Zeyer, Markus Kitza, Pavel Golik (RWTH Aachen University, Germany), Lukas Drude (Paderborn University, Germany), Ralf Schlüter, Hermann Ney (RWTH Aachen University, Germany), Reinhold Haeb-Umbach (Paderborn University, Germany) and Athanasios Mouchtaris (FORTH-ICS and University of Crete, Greece)
Multi-channel speech recognition: LSTMs all the way through
Hakan Erdogan (Sabanci University, Turkey), Tomoki Hayashi, John Hershey, Takaaki Hori, Chiori Hori, Wei-Ning Hsu, Suyoun Kim, Jonathan Le Roux, Zhong Meng and Shinji Watanabe (Mitsubishi Electric Research Labs, USA)
Unsupervised network adaptation and phonetically-oriented system combination for the CHiME-4 Challenge [View Slides] [View Video]
Yusuke Fujita (Hitachi Ltd., Japan), Takeshi Homma (Hitachi America Ltd., USA) and Masahito Togami (Hitachi Ltd., Japan)

Poster session 2

Session Chair: Xiong Xiao (Nanyang Technological University, Singapore)

The I2R system for CHiME-4 Challenge
Huy Dat Tran, Wen Zheng Ng, Sunil Sivadas, Trung Tuan Luong and Anh Dung Tran (Institute for Infocomm Research, Singapore)
The MLLP system for the 4th CHiME Challenge
Miguel Ángel Del Agua Teba, Adrià Martínez-Villaronga, Adrià Giménez Pastor, Alberto Sanchis, Jorge Civera and Alfons Juan (Universitat Politecnica de Valencia, Spain)
Multi-channel speech enhancement based on deep stacking network
Hui Zhang, Xueliang Zhang and Guanglai Gao (Inner Mongolia University, China)
CRIM’s speech recognition system for the 4th CHiME Challenge
Md Jahangir Alam, Vishwa Gupta and Patrick Kenny (Computer Research Institute of Montreal, Canada)
Robust automatic speech recognition for the 4th CHiME Challenge using copula-based feature enhancement
Alireza Bayestehtashk (Oregon Health & Science University, USA) and Zak Shafran (Google, USA)
The SJTU CHiME-4 system: Acoustic noise robustness for real single or multiple microphone scenarios
Yanmin Qian and Tian Tan (Shanghai Jiao Tong University, China)
CHiME4: Multichannel enhancement using beamforming driven by a DNN-based voice activity detector
Zbyněk Koldovský, Jiri Malek, Marek Bohac and Jakub Jansky (Technical University of Liberec, Czech Republic)
Wrapper-based acoustic group feature selection for noise-robust automatic sleepiness classification
Dara Pir, Theodore Brown (City University of New York, USA) and Jarek Krajewski (University of Wuppertal, Germany)