The workshop will commence at 11:00 am (CEST). The programme will be single-track and will consist of three oral presentation sessions and two invited keynote talks.

11:10 Oral session 1
12:35 Break
12:45 Keynote 1: Leibny Paola Garcia Perera (Johns Hopkins University)
13:50 Break
14:00 Oral session 2
15:20 Break
15:30 Keynote 2: Dong Yu (Tencent AI Lab)
16:30 Break
16:40 Oral session 3

Oral Session 1

11:10 Overview of the 6th CHiME Challenge
Shinji Watanabe1, Michael Mandel2, Jon Barker3, Emmanuel Vincent4 (1Center for Language and Speech Processing, Johns Hopkins University; 2Brooklyn College, City University of New York; 3University of Sheffield, UK; 4Inria, France)
11:35 The IOA Systems for CHiME-6 Challenge [Abstract]
Hangting Chen1,2, Pengyuan Zhang1,2, Qian Shi1,2, Zuozhen Liu1,2 (1Key Laboratory of Speech Acoustics & Content Understanding, Institute of Acoustics, CAS, China; 2University of Chinese Academy of Sciences, Beijing, China)
11:55 The OPPO System for CHiME-6 Challenge [Abstract]
Xiaoming Ren, Huifeng Zhu, Liuwei Wei, Linju Yang, Ming Yu, Chenxing Li, Dong Wei, Jie Hao (Beijing OPPO telecommunications corp., ltd., Beijing, China)
12:15 The Qdreamer Systems for CHiME-6 Challenge [Abstract]
Haoyuan Tang1, Huanliang Wang1, Jiajun Wang1, Li Zhang1, JiaBin Xue2, Zhi Li1 (1Qdreamer Research, Suzhou, Jiangsu, P.R. China; 2School of Computer Science and Technology, Harbin Institute of Technology, Harbin, P.R. China)

Oral Session 2

13:55 The USTC-NELSLIP Systems for CHiME-6 Challenge [Abstract]
Jun Du1, Yan-Hui Tu1, Lei Sun1, Li Chai1, Xin Tang1, Mao-Kui He1, Feng Ma1, Jia Pan1, Jian-Qing Gao1, Dan Liu1, Chin-Hui Lee2, Jing-Dong Chen3 (1University of Science and Technology of China, Hefei, Anhui, P. R. China; 2Georgia Institute of Technology, Atlanta, Georgia, USA; 3Northwestern Polytechnical University, Shaanxi, P. R. China)
14:20 The CW-XMU System for CHiME-6 Challenge [Abstract]
Xuerui Yang1, Yongyu Gao1, Shi Qiu1, Song Li2, Qingyang Hong2, Xuesong Liu1, Lin Li2, Dexin Liao2, Hao Lu2, Feng Tong2, Qiuhan Guo2, Huixiang Huang2, Jiwei Li1 (1CloudWalk Technology Co., Ltd.; 2Xiamen University)
14:40 The Academia Sinica Systems of Speech Recognition and Speaker Diarization for the CHiME-6 Challenge [Abstract]
Hung-Shin Lee1, Yu-Huai Peng1, Pin-Tuan Huang1, Ying-Chun Tseng2, Chia-Hua Wu1, Yu Tsao2, Hsin-Min Wang1 (1Institute of Information Science, Academia Sinica, Taiwan; 2Research Center for Information Technology Innovation, Academia Sinica, Taiwan)
14:55 LEAP Submission to CHiME-6 ASR Challenge [Abstract]
Anirudh Sreeram, Anurenjan Purushothaman, Rohit Kumar, Sriram Ganapathy (Learning and Extraction of Acoustic Patterns (LEAP) Lab, Indian Institute of Science, Bangalore, 560012)

Oral Session 3

16:40 The STC System for the CHiME-6 Challenge [Abstract]
Ivan Medennikov1,2, Maxim Korenevsky1, Tatiana Prisyach1, Yuri Khokhlov1, Mariya Korenevskaya1, Ivan Sorokin1, Tatiana Timofeeva1, Anton Mitrofanov1, Andrei Andrusenko1,2, Ivan Podluzhny1, Aleksandr Laptev1,2, Aleksei Romanenko1,2 (1STC-innovations Ltd; 2ITMO University, Saint Petersburg, Russia)
17:05 Towards a speaker diarization system for the CHiME 2020 dinner party transcription [Abstract]
Christoph Boeddeker1, Tobias Cord-Landwehr1, Jens Heitkaemper1, Cătălin Zorilă2, Daichi Hayakawa3, Mohan Li2, Min Liu4, Rama Doddipatla2, Reinhold Haeb-Umbach1 (1Paderborn University, Department of Communications Engineering, Paderborn, Germany; 2Toshiba Cambridge Research Laboratory, Cambridge, United Kingdom; 3Toshiba Corporation Corporate R&D Center, Kawasaki, Japan; 4Toshiba China R&D Center, Beijing, China)
17:25 The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge [Abstract]
Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola Garcia, Shinji Watanabe, Sanjeev Khudanpur (Center for Language and Speech Processing, Johns Hopkins University)
17:45 CUNY Speech Diarization System for the CHiME-6 Challenge [Abstract]
Zhaoheng Ni1, Michael I Mandel2 (1The Graduate Center, City University of New York; 2Brooklyn College, City University of New York)
18:05 BUT System for CHiME-6 Challenge [Abstract]
Katerina Zmolikova, Martin Kocour, Federico Landini, Karel Beneš, Martin Karafiát, Hari Krishna Vydana, Alicia Lozano-Diez, Oldřich Plchot, Murali Karthick Baskar, Ján Svec, Ladislav Mošner, Vladimir Malenovský, Lukáš Burget, Bolaji Yusuf, Ondřej Novotný, František Grézl, Igor Szöke, Jan “Honza” Černocký (Brno University of Technology, Faculty of Information Technology, IT4I Centre of Excellence, Czechia)
18:25 Toshiba’s Speech Recognition System for the CHiME 2020 Challenge [Abstract]
Cătălin Zorilă1, Mohan Li1, Daichi Hayakawa2, Min Liu3, Ning Ding2, Rama Doddipatla1 (1Toshiba Cambridge Research Laboratory, Cambridge, United Kingdom; 2Toshiba Corporation Corporate R&D Center, Kawasaki, Japan; 3Toshiba China R&D Center, Beijing, China)

Keynote Talks

Leibny Paola Garcia Perera

Dr. Leibny Paola Garcia Perera
Johns Hopkins University

Diarization, the missing link in Speech Technologies

The amount of unlabeled speech data available enormously outweighs the labeled data, and there is great potential in using this data to improve the performance of current speech recognition systems and related technologies. A primary goal of research in this domain is to automatically compute labels for the unlabeled data with an acceptable level of accuracy for downstream applications. One such task is to answer the question "who spoke when" in a recording, identifying regions containing speech and assigning speaker identity labels to each utterance. This labeling, called speaker diarization, is not typically the final task for applications, but often a missing link in a pipeline that can boost the performance of automatic speech recognition and speaker and language identification systems. In this talk, I will guide you through a journey of this missing link. We will start with a brief discussion of the key components that comprise state-of-the-art systems: the voice activity detector, speaker embeddings, and scoring and clustering techniques. Next, we will examine the aspects in which current systems fail and propose new alternatives to attain better performance, addressing overlap detection, resegmentation, and speaker turn detection, among others. In addition, we will give some insights into the newest solutions, such as end-to-end approaches. Then, we will go beyond diarization and explore the positive impact of including a diarization stage in speech and speaker recognition systems. Finally, we will discuss the influence of diarization in other fields such as cognitive science and linguistics.
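The pipeline the abstract outlines (voice activity detection, per-segment speaker embeddings, scoring, clustering) can be illustrated with a toy sketch: given embeddings for already-detected speech segments, greedily assign each segment to the first cluster whose centroid scores above a cosine-similarity threshold. The `diarize` function and the threshold value below are illustrative assumptions for exposition, not the method of any system presented at the workshop.

```python
import math

def cosine(u, v):
    """Cosine similarity between two (nonzero) embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def diarize(segment_embeddings, threshold=0.8):
    """Greedy online clustering of per-segment speaker embeddings.

    Each segment joins the existing cluster whose centroid it matches
    best (above `threshold`), otherwise it starts a new cluster.
    Returns one integer speaker label per input segment.
    """
    centroids, labels = [], []
    for emb in segment_embeddings:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None:
            # No cluster is similar enough: open a new speaker cluster.
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            # Update the centroid as a simple running average.
            centroids[best] = [(a + b) / 2 for a, b in zip(centroids[best], emb)]
            labels.append(best)
    return labels
```

Real systems replace each stage with a learned component (a neural VAD, x-vector or d-vector embeddings, PLDA scoring, agglomerative or spectral clustering), but the control flow is the same: score each segment against the speakers found so far, then cluster.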


Dr. Leibny Paola Garcia Perera (PhD 2014, University of Zaragoza, Spain) joined Johns Hopkins University after extensive research experience in academia and industry, including highly regarded laboratories at Agnitio and Nuance Communications. She led a team of 20+ researchers from four of the best laboratories worldwide in far-field speech diarization and speaker recognition, under the auspices of the JHU summer workshop 2019 in Montreal, Canada. She was also a researcher at Tec de Monterrey, Campus Monterrey, Mexico for 10 years. She was a Marie Curie researcher for the Iris project during 2015, exploring assistive technology for children with autism in Zaragoza, Spain. She was a visiting scholar at the Georgia Institute of Technology (2009) and Carnegie Mellon University (2011). Recently, she has been working on children’s speech, including child speech recognition and diarization in day-long recordings. She is also part of the JHU CHiME-5, CHiME-6, SRE18 and SRE19 teams. Her interests include diarization, speech recognition, speaker recognition, machine learning and language processing.

Dong Yu

Dr. Dong Yu
Distinguished Scientist, Tencent AI Lab

Solving Cocktail Party Problem – From Single Modality to Multi-Modality

The cocktail party problem is one of the difficult problems that must still be solved to enable high-accuracy speech recognition in everyday environments. In this talk, I will introduce our recent attempts to attack this problem, with a focus on multi-channel multi-modal approaches.


Dr. Dong Yu, IEEE Fellow, is a distinguished scientist and vice general manager at Tencent AI Lab. Prior to joining Tencent in 2017, he was a principal researcher in Microsoft’s speech and dialog research group. His research work, which focuses on statistical speech recognition and processing, has been recognized with the prestigious IEEE Signal Processing Society 2013 and 2016 best paper awards and has been widely cited.

Dr. Dong Yu is currently serving as the vice chair of the IEEE Speech and Language Processing Technical Committee (SLPTC). He has served as a member of the IEEE SLPTC (2013-2018), a distinguished lecturer of APSIPA (2017-2018), an associate editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing (2011-2015), an associate editor of the IEEE Signal Processing Magazine (2008-2011), and a member of the organizing and technical committees of many conferences and workshops.