Programme

A schedule overview is given below, followed by the detailed programme.

9:00 Welcome
9:10 Keynote 1: Steven J. Rennie (IBM T.J. Watson Research Center)
10:00 Break
10:20 Overview of the 2nd CHiME Challenge
10:50 Oral session 1: challenge papers
12:10 Lunch
13:40 Poster session
15:40 Break
16:00 Oral session 2: challenge-related paper
16:20 Keynote 2: Daniel P.W. Ellis (Columbia University)
17:10 Plenary discussion
17:50 Closing

Detailed Programme

Welcome

  • The 2nd International Workshop on Machine Listening in Multisource Environments
    Emmanuel Vincent (Inria, France), Jon Barker (University of Sheffield, UK), Shinji Watanabe, Jonathan Le Roux (MERL, USA), Francesco Nesta and Marco Matassoni (FBK-Irst, Italy)

Keynote 1

Session Chair: Jen-Tzung Chien (National Chiao Tung University, Taiwan)
  • Model-based speech separation and recognition: Yesterday, Today, and Tomorrow
    Steven J. Rennie (IBM T.J. Watson Research Center)
    Abstract: Recently, model-based approaches to multi-talker speech separation and recognition have demonstrated great success in highly constrained scenarios, and efficient algorithms for separating data with literally trillions of underlying states have been unveiled. In less constrained scenarios, deep neural networks (DNNs) trained on features inspired by human auditory processing have shown great capacity for directly learning masking functions from parallel data. Ideally, a robust speech separation/recognition system should continuously learn, adapt to, and exploit structure present in both target and peripheral signals and their interactions, make minimal assumptions about the data to be separated/recognized, not require parallel data streams, and have essentially unlimited information capacity. In this talk I'll briefly review the current state of robust speech separation/recognition technology: where we are, where we apparently need to go, and how we might get there. I'll then discuss in more detail recent work that I've been involved with that is aligned with these goals. Specifically, I will present some new results on efficiently learning the structure of models and efficiently optimizing a wide class of matrix-valued functions, recent work on Factorial Restricted Boltzmann Machines for robust ASR, and finally Direct-product DBNs, a new architecture that makes it feasible to learn DNNs with literally millions of neurons.

Overview of the 2nd CHiME Challenge

  • Datasets, tasks, baselines and results
    Emmanuel Vincent (Inria, France), Jon Barker (University of Sheffield, UK), Shinji Watanabe, Jonathan Le Roux (MERL, USA), Francesco Nesta and Marco Matassoni (FBK-Irst, Italy)

Oral session 1

Session Chair: Tomohiro Nakatani (NTT, Japan)

Poster session

Session Chair: Dorothea Kolossa (Ruhr-Universität Bochum, Germany)

Oral session 2

Session Chair: Kalle Palomäki (Aalto University, Finland)

Keynote 2

Session Chair: Richard Lyon (Google, USA)
  • Recognizing and Classifying Environmental Sounds
    Daniel P.W. Ellis (Columbia University)

    Abstract: Animal hearing exists to extract useful information from the environment, and for many animals, over much of the evolutionary history of hearing, this sound environment has consisted not of speech or music but of more generic acoustic information arising from collisions, motions, and other events in the external world. This aspect of sound analysis (getting information out of non-speech, non-music environmental sounds) is finally beginning to gain attention in research, since it holds promise as a tool for automatic search and retrieval of audio/video recordings, an increasingly urgent problem. I will discuss our recent work on using audio analysis to manage and search environmental sound archives (including personal audio lifelogs and consumer video collections), and illustrate it with some of the approaches that have worked more or less well, along with an effort to explain why.

    Bio: Dan Ellis is an Associate Professor of Electrical Engineering at Columbia University, where he leads the Laboratory for Recognition and Organization of Speech and Audio (LabROSA), concerned with extracting useful information from real-world sounds of all kinds. He received his bachelor's degree from Cambridge and his Ph.D. from the MIT Media Lab, and he was a postdoc at the International Computer Science Institute in Berkeley, where he remains an external fellow. He is the author of a number of widely used software tools, and he runs the AUDITORY email list of over 2000 researchers interested in the perception and cognition of sound.