INTERSPEECH.2010 - Keynote

Total: 3

#1 Still talking to machines (cognitively speaking)

Author: Steve Young

This overview article reviews the structure of a fully statistical spoken dialogue system (SDS), using as illustrations various systems and components built at Cambridge over the last few years. Most of the components in an SDS are essentially classifiers that can be trained using supervised learning. However, the dialogue management component must track the state of the dialogue and optimise a reward accumulated over time. This requires techniques for statistical inference and for policy optimisation using reinforcement learning. The potential advantages of a fully statistical SDS are the ability to train from data without hand-crafting, increased robustness to environmental noise and user uncertainty, and the ability to adapt and learn on-line.
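The state-tracking step mentioned above is commonly framed as a Bayesian belief update over hidden dialogue states, b'(s') ∝ P(o | s') · Σ_s P(s' | s, a) · b(s). The snippet below is a minimal sketch of that generic update, not the implementation used in the Cambridge systems. The function name `update_belief`, the toy "food" slot values, and every probability are hypothetical, chosen only for illustration. Policy optimisation with reinforcement learning is not shown.

```python
def update_belief(belief, transition, obs_likelihood, action, observation):
    """Generic discrete Bayesian belief update (hypothetical helper).

    belief:          dict state -> P(state)
    transition:      transition[action][s][s_next] = P(s_next | s, action)
    obs_likelihood:  obs_likelihood[s_next][observation] = P(observation | s_next)
    """
    new_belief = {}
    for s_next in belief:
        # Predict: probability of reaching s_next after the system action.
        predicted = sum(transition[action][s][s_next] * belief[s] for s in belief)
        # Correct: weight by how well s_next explains the (noisy) observation.
        new_belief[s_next] = obs_likelihood[s_next].get(observation, 0.0) * predicted
    total = sum(new_belief.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalise so the belief is a proper distribution again.
    return {s: p / total for s, p in new_belief.items()}


# Hypothetical toy example: the user's goal is a static "food" slot value.
prior = {"indian": 0.5, "chinese": 0.5}
transition = {"ask_food": {"indian": {"indian": 1.0, "chinese": 0.0},
                           "chinese": {"indian": 0.0, "chinese": 1.0}}}
# Assumed speech-recognition confusion probabilities.
obs_likelihood = {"indian": {"heard_indian": 0.7, "heard_chinese": 0.3},
                  "chinese": {"heard_indian": 0.2, "heard_chinese": 0.8}}

posterior = update_belief(prior, transition, obs_likelihood, "ask_food", "heard_indian")
print({s: round(p, 3) for s, p in posterior.items()})  # {'indian': 0.778, 'chinese': 0.222}
```

Keeping a full distribution rather than committing to the single best recognition hypothesis is what lets such a tracker tolerate recognition errors, which relates to the robustness to noise and user uncertainty claimed in the abstract.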

#2 Sound-based assistive technology supporting 'seeing', 'hearing' and 'speaking' for the disabled and the elderly

Author: Tohru Ifukube

With the rapidly increasing proportion of elderly people in Japan, the number of disabled people has also been increasing. Over a period of 40 years, the author has developed a basic research approach to assistive technology, especially for people with visual, hearing, and speech disorders. Although some of the resulting tools have been put into practical use for disabled people in Japan, the author has found that their functions remain insufficient to support them. Moreover, the author has been impressed by the remarkable potential of the human brain to compensate for these disorders.

#3 Beyond sentence prosody

Author: Chiu-yu Tseng

The prosody of a sentence (utterance) in a discourse context differs substantially from its prosody when uttered in isolation. This paper addresses why the paragraph is a discourse unit and why discourse prosody is an intrinsic part of naturally occurring speech. Higher-level discourse information treats sentences, phrases, and their lower-level units as sub-units, layers over them, and is realized in patterns of global prosody. A perception-based multi-phrase discourse prosody hierarchy and a parallel multi-phrase associative template were proposed to test discourse prosodic modulations. Results from quantitative modeling of speech data show that output discourse prosody can be derived through multiple layers of higher-level modulations. The seemingly random occurrence of lower-level prosodic units, such as intonation variations, is in fact systematic. In summary, abundant traces of global prosody can be recovered from the speech signal and accounted for, and their patterns could help facilitate a better understanding of spoken language processing.