INTERSPEECH.2017 - Keynote

Total: 3

#1 Dialogue as Collaborative Problem Solving

Author: James Allen

I will describe the current status of a long-term effort at developing dialogue systems that go beyond simple task execution models to systems that involve collaborative problem solving. Such systems involve open-ended discussion, and the tasks cannot be accomplished without extensive interaction (e.g., 10 turns or more). The key idea is that dialogue itself arises from an agent’s ability to engage in collaborative problem solving (CPS). In such dialogues, agents may introduce, modify and negotiate goals; propose and discuss the merits of possible paths to solutions; explicitly discuss progress as the two agents work towards the goals; and evaluate how well a goal was accomplished. To complicate matters, user utterances in such settings are much more complex than those seen in simple task execution dialogues and require full semantic parsing. A key question we have been exploring in the past few years is how much of dialogue can be accounted for by domain-independent mechanisms. I will discuss these issues and draw examples from a dialogue system we have built that, except for the specialized domain reasoning required in each case, uses the same architecture to perform three different tasks: collaborative blocks-world planning, in which the system and user build structures and may have differing goals; biocuration, in which a biologist and the system interact in order to build executable causal models of biological pathways; and collaborative composition, in which the user and system collaborate to compose simple pieces of music.

#2 Conversing with Social Agents That Smile and Laugh

Author: Catherine Pelachaud

Our aim is to create virtual conversational partners. To this end, we have developed computational models that enrich virtual characters with socio-emotional capabilities communicated through multimodal behaviors. The approach we follow to build interactive and expressive interactants relies on theories from the human and social sciences as well as on data analysis and user-perception-based design. We have explored specific social signals such as smiling and laughter, capturing not only their variation in production but also their different communicative functions and their impact on human-agent interaction. Lately we have been interested in modeling agents with social attitudes. Our aim is to model how social attitudes color the multimodal behaviors of the agents. We have gathered a corpus of dyadic interactions annotated along two layers: social attitudes and nonverbal behaviors. By applying sequence-mining methods, we have extracted behavior patterns involved in the change of perception of an attitude; we are particularly interested in capturing the behaviors that correspond to such a change. In this talk I will present the GRETA/VIB platform in which our research is implemented.

#3 Re-Inventing Speech — The Biological Way

Author: Björn Lindblom

The mapping of the Speech Chain has so far focused on the experimentally more accessible links — e.g., acoustics — whereas the brain’s activity during speaking and listening has understandably received less attention. That state of affairs is about to change thanks to the sophisticated new tools offered by brain imaging technology. At present, many key questions concerning human speech processes remain incompletely understood despite the significant research efforts of the past half century. As speech research goes neuro, we could do with some better answers. In this paper I will attempt to shed light on some of these issues. I will do so by heeding the advice that Tinbergen once gave his fellow biologists on explaining behavior. To paraphrase: nothing in biology makes sense unless you simultaneously look at it with the following questions at the back of your mind: How did it evolve? How is it acquired? How does it work here and now? Applying the Tinbergen strategy to speech, I will, in broad strokes, trace a path from the small, fixed innate repertoires of non-human primates to the open-ended vocal systems that humans learn today. Such an agenda will admittedly identify serious gaps in our present knowledge but, importantly, it will also bring an overarching possibility into view: it strongly suggests the feasibility of bypassing the traditional linguistic, operational approach to speech units and replacing it with a first-principles account anchored in biology. I will argue that this is the road-map we need for a more profound understanding of the fundamental nature of spoken language and for educational, medical and technological applications.