INTERSPEECH.2009 - Keynote

Total: 4

#1 Selected topics from 40 years of research on speech and speaker recognition

Author: Sadaoki Furui

This paper summarizes my 40 years of research on speech and speaker recognition, focusing on selected topics that I have investigated at NTT Laboratories, Bell Laboratories and Tokyo Institute of Technology with my colleagues and students. These topics include: the importance of spectral dynamics in speech perception; speaker recognition methods using statistical features, cepstral features, and HMM/GMM; text-prompted speaker recognition; speech recognition using dynamic features; Japanese LVCSR; robust speech recognition; spontaneous speech corpus construction and analysis; spontaneous speech recognition; automatic speech summarization; and WFST-based decoder development and its applications.

#2 Connecting human and machine learning via probabilistic models of cognition

Author: Thomas L. Griffiths

Human performance defines the standard that machine learning systems aspire to in many areas, including learning language. This suggests that studying human cognition may be a good way to develop better learning algorithms, as well as providing basic insights into how the human mind works. However, in order for ideas to flow easily from cognitive science to computer science and vice versa, we need a common framework for describing human and machine learning. I will summarize recent work exploring the hypothesis that probabilistic models of cognition, which view learning as a form of statistical inference, provide such a framework, including results that illustrate how novel ideas from statistics can inform cognitive science. Specifically, I will talk about how probabilistic models can be used to identify the assumptions of learners, learn at different levels of abstraction, and link the inductive biases of individuals to cultural universals.

#3 New horizons in the study of child language acquisition

Author: Deb Roy

Naturalistic longitudinal recordings of child development promise to reveal fresh perspectives on fundamental questions of language acquisition. In a pilot effort, we have recorded 230,000 hours of audio-video recordings spanning the first three years of one child's life at home. To study a corpus of this scale and richness, current methods of developmental cognitive science are inadequate. We are developing new methods for data analysis and interpretation that combine pattern recognition algorithms with interactive user interfaces and data visualization. Preliminary speech analysis reveals surprising levels of linguistic fine-tuning by caregivers that may provide crucial support for word learning. Ongoing analyses of the corpus aim to model detailed aspects of the child's language development as a function of learning mechanisms combined with lifetime experience. Plans to collect similar corpora from more children based on a transportable recording system are underway.

#4 Transcribing human-directed speech for spoken language processing

Author: Mari Ostendorf

As storage costs drop and bandwidth increases, there has been a rapid growth of spoken information available via the web or in online archives, raising problems of document retrieval, information extraction, summarization and translation for spoken language. While there is a long tradition of research in these technologies for text, new challenges arise when moving from written to spoken language. In this talk, we look at differences between speech and text, and how we can leverage the information in the speech signal beyond the words to provide structural information in a rich, automatically generated transcript that better serves language processing applications. In particular, we look at three interrelated types of structure (orthographic, prosodic, and syntactic), methods for automatic detection, the benefit of optimizing rich transcription for the target language processing task, and the impact of this structural information in tasks such as information extraction, translation, and summarization.