Embodied Conversation: Integrating Face and Gesture into Automatic Spoken Dialogue Systems

Justine Cassell
MIT Media Laboratory

In this chapter I’m going to discuss the issues that arise when we design automatic spoken dialogue systems that can use not only voice, but also facial and head movements and hand gestures to communicate with humans. For the most part I will concentrate on the generation side of the problem—that is, building systems that can speak, move their faces and heads, and make hand gestures. As with most aspects of spoken dialogue, however, generation is no good without comprehension, and so I will also briefly discuss some of the issues involved in building systems that understand non-verbal communicative behaviors.

Because most researchers in the field of spoken dialogue may not be familiar with the literature on how speech and non-verbal behaviors are integrated in humans, the form of this chapter will be as follows: first I will describe how humans use gesture and facial/head movements, and how these non-verbal behaviors are integrated into human conversation. Next I will turn to some of the issues that arise when we attempt to apply what we know about natural human communication to the design of embodied dialogue systems. I will describe these issues in the context of a number of prototype systems that my students and I have built according to these principles. Finally, I will discuss the evaluation of these systems, and whether non-verbal behaviors add anything to dialogue systems.