A Bayesian Theory of Mind Approach to Nonverbal Communication for Human-Robot Interactions
Associate Professor of Media Arts and Sciences, Massachusetts Institute of Technology
Professor of Psychology, Northeastern University
Professor of Computer Science, University of Southern California
Much of human social communication is channeled through facial expressions, body language, gaze, and many other nonverbal behaviors. A robot's ability to express and recognize the emotional states of people through these nonverbal channels is at the core of artificial social intelligence. The purpose of this thesis is to define a computational framework for nonverbal communication in human-robot interaction. We address both sides of nonverbal communication, that is, the decoding and encoding of social-emotional states through nonverbal behaviors, and demonstrate their shared underlying representations.
We ground our computational framework in storytelling interactions. Storytelling is an interaction form that is mutually regulated between speakers and listeners where a key dynamic is the back-and-forth process of speaker cues and listener responses. Listeners convey attentiveness and engagement through listener responses, also called backchannel feedback, while storytellers use speaker cues, also called backchannel-inviting cues, to elicit those feedback responses.
We demonstrate that storytellers employ plans, albeit short ones, to influence and infer the attentive state of listeners using these speaker cues. We computationally model the intentional inference of storytellers as a partially observable Markov decision process (POMDP) planning problem: getting listeners to pay attention while also inferring their hidden state. We demonstrate gains in inference accuracy when accounting for this intentional context, compared to alternative state estimators that consider only the intrapersonal or interpersonal context.
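As a rough illustration of the inference side of such a POMDP, the sketch below runs one step of a discrete Bayes filter over a listener's hidden attentive state. It is a minimal example under stated assumptions, not the model developed in this thesis: the two-state space, the action and observation names, and every probability in `TRANSITION` and `OBSERVATION` are hypothetical values chosen only to make the arithmetic concrete.

```python
# Minimal belief update for a two-state listener model.
# All states, actions, observations, and probabilities are hypothetical.

STATES = ("attentive", "inattentive")

# TRANSITION[action][s][s_next] = P(s_next | s, action)
TRANSITION = {
    "cue": {
        "attentive": {"attentive": 0.9, "inattentive": 0.1},
        "inattentive": {"attentive": 0.4, "inattentive": 0.6},
    },
    "no_cue": {
        "attentive": {"attentive": 0.8, "inattentive": 0.2},
        "inattentive": {"attentive": 0.1, "inattentive": 0.9},
    },
}

# OBSERVATION[action][s_next][obs] = P(obs | s_next, action)
OBSERVATION = {
    "cue": {
        "attentive": {"backchannel": 0.7, "none": 0.3},
        "inattentive": {"backchannel": 0.1, "none": 0.9},
    },
    "no_cue": {
        "attentive": {"backchannel": 0.3, "none": 0.7},
        "inattentive": {"backchannel": 0.05, "none": 0.95},
    },
}


def belief_update(belief, action, observation):
    """Return the posterior belief over STATES after one action/observation."""
    # Prediction step: push the current belief through the transition model.
    predicted = {
        s_next: sum(belief[s] * TRANSITION[action][s][s_next] for s in STATES)
        for s_next in STATES
    }
    # Correction step: weight each state by the observation likelihood.
    unnormalized = {
        s: predicted[s] * OBSERVATION[action][s][observation] for s in STATES
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# The storyteller starts uncertain, gives a cue, and sees a backchannel.
prior = {"attentive": 0.5, "inattentive": 0.5}
posterior = belief_update(prior, "cue", "backchannel")
```

Because the observation likelihood depends on the storyteller's own action, the same listener behavior carries different evidence depending on whether a cue was just given, which is one simple way to read "accounting for the intentional context."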
By formulating emotion recognition as a planning problem, we can apply recent probabilistic AI methods for inverting models of planning to perform belief inference. We computationally model emotion expression as a combined process of estimating others' beliefs through inference inversion and then producing nonverbal expressions to affect those beliefs. This enables a listening robot to communicate an attentive state by first tracking the storyteller's beliefs about the agent from the storyteller's use of speaker cues. Then, by producing appropriate listener responses, the agent can steer those beliefs toward the desired perception of an attentive listener. We demonstrate that a robotic agent operating under this paradigm communicates an attentive state more effectively than current approaches, which cannot dynamically capture how the agent's behavioral expressions are interpreted by its human partners.
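The expression side can be sketched in the same spirit: the agent simulates how each candidate listener response would change the storyteller's belief about it, then picks the response that leaves the storyteller most convinced the agent is attentive. This is a deliberately simplified, greedy one-step stand-in for the inverse-planning approach described above, not the thesis's method. The response names and the likelihoods in `RESPONSE_LIKELIHOOD` are hypothetical.

```python
# Greedy one-step response selection against a simulated storyteller belief.
# Response names and likelihood values are hypothetical.

# P(listener response | listener state), as assumed to be held by the storyteller.
RESPONSE_LIKELIHOOD = {
    "attentive": {"nod": 0.6, "gaze_only": 0.3, "none": 0.1},
    "inattentive": {"nod": 0.05, "gaze_only": 0.25, "none": 0.7},
}

RESPONSES = ("nod", "gaze_only", "none")


def storyteller_posterior(prior_attentive, response):
    """Storyteller's belief that the listener is attentive after a response."""
    p_att = prior_attentive * RESPONSE_LIKELIHOOD["attentive"][response]
    p_inatt = (1.0 - prior_attentive) * RESPONSE_LIKELIHOOD["inattentive"][response]
    return p_att / (p_att + p_inatt)


def choose_response(prior_attentive, responses=RESPONSES):
    """Pick the response that maximizes the storyteller's posterior belief."""
    return max(responses, key=lambda r: storyteller_posterior(prior_attentive, r))


# The agent estimates that the storyteller is currently undecided about it.
estimated_prior = 0.5
best = choose_response(estimated_prior)
belief_after = storyteller_posterior(estimated_prior, best)
```

A fuller treatment would replace the fixed `estimated_prior` with a belief tracked from the storyteller's observed speaker cues, and would plan over more than one step.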