Programme
The workshop will be a full-day event from 9:00 to 18:10, followed by an on-site evening social. The programme is single-track and consists of a mix of oral and poster sessions, discussion sessions, and two invited keynote talks.
Programme Overview
The preliminary programme is below. The start and end times are fixed, but there may be minor changes to the session timings.
| Time | Event |
| --- | --- |
| 07:45 | Bus boarding at HICC (prior reservation required; the buses will carry a Microsoft placard and will depart at 08:00, or earlier if full) |
| 08:30 | Arrival at Microsoft: security clearance (bring a photo ID) and badge collection |
| 09:00 | Welcome |
| 09:10 | Overview of the 5th CHiME Challenge |
| 09:40 | Oral session 1 |
| 10:40 | Break |
| 11:00 | Keynote 1: Florian Metze (Carnegie Mellon University) |
| 12:00 | Discussion |
| 12:20 | Lunch |
| 13:30 | Speech research at Microsoft |
| 13:45 | Oral session 2 |
| 14:45 | Poster session |
| 16:30 | Break |
| 16:45 | Keynote 2: John HL Hansen (University of Texas at Dallas) |
| 17:45 | Discussion |
| 18:00 | Closing |
| 18:10 | Social event (food and drinks) |
| 19:00 | First bus boarding (prior reservation required; departs from Microsoft at 19:15, or earlier if full, and arrives at HICC around 19:45) |
| 19:45 | Second bus boarding (prior reservation required; departs from Microsoft at 20:00, or earlier if full, and arrives at HICC around 20:30) |
Oral Session 1
| Time | Title |
| --- | --- |
| 09:40 | The STC System for the CHiME 2018 Challenge [Paper][View Slides] |
| 10:00 | The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays [Paper][View Slides] |
| 10:20 | The USTC-iFlytek systems for CHiME-5 Challenge [Paper][View Slides] |
Keynote 1: Florian Metze, Carnegie Mellon University
Session chair: Arun Narayanan, Google
Open-domain audiovisual speech recognition and video summarization [View Slides]
Abstract: Video understanding is one of the hardest challenges in AI. If a machine can look at videos and “understand” the events that are being shown, then machines could learn by themselves, perhaps even without supervision, simply by “watching” broadcast TV, Facebook, YouTube, or similar sites. Making progress towards this goal requires contributions from experts in diverse fields, including computer vision, automatic speech recognition, machine translation, natural language processing, multimodal information processing, and multimedia. I will report the outcomes of the JSALT 2018 Workshop on this topic, including advances in multitask learning for joint audiovisual captioning, summarization, and translation, as well as auxiliary tasks such as text-only translation, language modeling, story segmentation, and classification. I will demonstrate a few results on the “How-to” dataset of instructional videos harvested from the web by my team at Carnegie Mellon University and discuss remaining challenges and possible other datasets for this research.
Oral Session 2
Session chair: Yusuke Fujita, Hitachi
| Time | Title |
| --- | --- |
| 13:45 | The NWPU System for CHiME-5 Challenge [Paper][View Slides] |
| 14:05 | Channel selection from DNN posterior probability for speech recognition with distributed microphone arrays in everyday environments [Paper][View Slides] |
| 14:25 | Scaling speech enhancement in unseen environments with noise embeddings [Paper][View Slides] |
Poster Session
Session chair: Jun Du, University of Science and Technology of China
- The ZTSpeech system for CHiME-5 Challenge: A far-field speech recognition system with front-end and robust back-end [Paper][View Poster]
- The SHNU system for the CHiME-5 Challenge [Paper]
- Front-end processing for the CHiME-5 dinner party scenario [Paper]
- DA-IICT/IIITV system for the 5th CHiME 2018 Challenge [Paper][View Poster]
- The Toshiba entry to the CHiME 2018 Challenge [Paper][View Poster]
- The RWTH/UPB system combination for the CHiME 2018 Workshop [Paper]
- The NDSC transcription system for the 2018 CHiME-5 Challenge [Paper]
- Multiple beamformers with ROVER for the CHiME-5 Challenge [Paper][View Poster]
- NMF based front-end processing in multi-channel distant speech recognition [Paper][View Poster]
- CHiME 2018 Workshop: Enhancing beamformed audio using time delay neural network denoising autoencoder [Paper][View Poster]
- Situation informed end-to-end ASR for noisy environments [Paper][View Poster]
- Robust network structures for acoustic model on CHiME5 Challenge dataset [Paper][View Poster]
- Channel-selection for distant-speech recognition on CHiME-5 dataset [Paper][View Poster]
- Acoustic features fusion using attentive multi-channel deep architecture [Paper]
- The AnTech system for CHiME-5 Challenge [Paper]
- A novel speech enhancement method based on multiple-microphone arrays [Paper]
- LEAP submission to CHiME-5 Challenge [Paper][View Poster]
Keynote 2: John HL Hansen, University of Texas at Dallas
Session chair: Ralf Schlüter, RWTH Aachen University