The UDASE task is about exploiting unlabeled data to overcome the performance drop caused by domain mismatch in speech enhancement models trained on artificially generated labeled data. Here, “labeled data” refers to audio mixtures with corresponding clean source references, and “unlabeled data” refers to audio mixtures without clean source references (i.e., audio recorded in the real world).
The task consists of denoising conversational speech recordings using out-of-domain labeled data and in-domain unlabeled data. Given a mixture of one or more reverberant speakers and additive noise, the goal is to estimate the reverberant speech of the speaker(s) alone, that is, to remove the additive noise while leaving the reverberation in place.
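The task definition above can be summarized with a simple additive signal model. The notation is our own and is not taken from the official task description: $x(t)$ is the single-microphone recording, $s_j(t)$ is the reverberant speech of speaker $j$ as captured by the microphone, $n(t)$ is the additive noise, and $f$ is the enhancement system to be trained.

```latex
x(t) = \underbrace{\sum_{j=1}^{J} s_j(t)}_{s(t)} + n(t), \qquad J \geq 1,
\qquad
\hat{s}(t) = f(x)(t) \approx s(t),
\qquad
s_j(t) = (h_j \ast u_j)(t)
```

Here $u_j(t)$ denotes the dry (anechoic) speech of speaker $j$ and $h_j(t)$ the room impulse response from that speaker to the microphone. Because dereverberation and separation are out of scope, the target is the sum $s(t)$ of the reverberant speaker signals, not the individual $s_j(t)$ or the dry $u_j(t)$.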
The main objective of this task is to advance semi-supervised learning and unsupervised domain adaptation methods for speech enhancement, and to develop methods that generalize better to real-world conditions.
This task is motivated by the assistive listening use case, in which a speech enhancement algorithm can help any individual engage better in a conversation by improving overall speech intelligibility and quality in the presence of ambient noise. The task is a general-purpose research project intended for use by all individuals in specific listening environments; it is not specifically intended to aid persons with hearing impairments.
Supervised speech enhancement requires parallel labeled training data consisting of both noisy speech signals and the corresponding clean signals. Because such data cannot be acquired in real conditions, datasets are generated artificially by creating synthetic mixtures of isolated speech and noise signals. However, it is difficult or impossible to generate synthetic mixtures that reliably match arbitrary acoustic conditions at test time in terms of noise level, noise type, recording equipment, speaker-to-microphone distance and orientation, reverberation, etc. Artificially generated training data are thus inevitably mismatched with real-world noisy speech recordings, which can result in poor speech enhancement performance when the mismatch is severe. Conversely, recording noisy speech signals in the target test domain is much easier, but leveraging such unlabeled data to develop and train speech enhancement models is a challenging problem, and it is the focus of this task.
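To make the idea of an artificially generated labeled mixture concrete, here is a minimal Python/NumPy sketch that adds a noise signal to an isolated speech signal at a chosen signal-to-noise ratio (SNR). The function name `mix_at_snr` and the SNR-based scaling are illustrative assumptions; this is not the recipe used to build LibriMix or LibriCHiME-5, and it ignores reverberation, microphone characteristics and the other factors listed above.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float):
    """Hypothetical helper: build a synthetic labeled example at a target SNR.

    Returns (mixture, clean_reference, scaled_noise). The clean reference is
    available only because the mixture is synthetic; a real recording would
    provide just the mixture.
    """
    if noise.shape[0] < speech.shape[0]:
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: speech.shape[0]]  # crop the noise to the speech length

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise signal is silent")

    # Choose the gain so that 10 * log10(speech_power / scaled_noise_power) == snr_db
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = gain * noise
    mixture = speech + scaled_noise
    return mixture, speech, scaled_noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)  # stand-in for 1 s of speech at 16 kHz
    noise = rng.standard_normal(24000)   # stand-in for a longer noise recording
    mixture, clean, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
    achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean(scaled_noise ** 2))
    print(round(achieved, 6))  # 5.0
```

The point of the sketch is the second return value: with synthetic data the clean reference comes for free, which is what makes supervised training possible, whereas an in-domain real recording supplies only the equivalent of `mixture`.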
The in-domain unlabeled data are extracted from the real noisy conversational speech recordings of the CHiME-5 dinner parties. We extracted portions of the binaural recordings in which the participant wearing the microphone does not speak. The out-of-domain labeled data correspond to the LibriMix dataset. In both cases, the data consist of mixtures of speech and noise, with up to three simultaneously active speakers.
For development and evaluation on labeled data that are close to the target domain, we created the reverberant LibriCHiME-5 dataset.
For more information, please visit the Data section.
As learning from unlabeled in-domain data is already a difficult problem that requires new methodologies beyond standard supervised learning, the UDASE task focuses only on single-microphone noise suppression, without addressing speech separation or dereverberation. The target signal is thus single-microphone, potentially multi-speaker, reverberant speech with attenuated background noise.
For more information, please visit the Submission section.
Submitted systems will be evaluated in two steps: first with objective performance metrics, then with listening tests. The final ranking of the systems will be based on the results of the listening tests.
For more information, please visit the Rules section.
The submission deadline for the UDASE task is 16 June 2023.
If you are considering participating, or just want to learn more, please sign up to the CHiME Google Group. We will use this group to send general announcements that keep prospective participants updated as the challenge progresses. Participants are also invited to use the “chime-7-task-2-udase” channel of the CHiME Slack workspace for discussions about the UDASE task.