Multi-Party Human-Robot Conversation Interactions


With the emergence of social robots in people’s daily lives, their interactions need to accommodate not only individuals but also groups of people (e.g., families or colleagues). However, most human-robot interaction work has focused on a single user at a time, owing to the challenges of interpreting social cues from multiple people simultaneously and of designing interactions for a group. Designing for group interaction differs fundamentally from designing for individuals: because users in a group also interact with each other and not just with the robot, interpreting engagement cues requires deeper, contextualized analysis. Moreover, conversing with multiple users at once demands significant advances in both sensing technology and dialogue systems. Conversation is central to such interactions, including the museum-guide setting this project targets. In this project, we aim to design a contextualized and personalized conversation experience between a small group of users (2–6 people) and a museum tour-guide robot.

The focus of this work is not on conversing with multiple users simultaneously. Rather, we propose to understand and track individual engagement in a group setting and to use this information to decide when an agent should initiate a conversation with a user. When selecting a conversation partner, the agent learns a policy that chooses the user whose increased engagement best improves the engagement of the other users and of the group as a whole. The dialogue experience between the user and the agent is supported by MRF-Chat, a novel probabilistic graphical model based on Markov Random Fields that improves the prediction accuracy of existing deep learning methods by making assumptions grounded in a model of the mutual knowledge between the user and the agent.
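The partner-selection idea can be sketched as a greedy policy over predicted group-level gains. This is an illustrative toy, not the project's learned policy: the function names (`select_partner`, `toy_gain`) and the simple gain model are assumptions for demonstration; in practice the gain estimator would be learned from interaction data.

```python
from typing import Callable, Dict

def select_partner(
    engagement: Dict[str, float],
    expected_gain: Callable[[str, Dict[str, float]], float],
) -> str:
    """Pick the user whose engagement boost is predicted to help the group most.

    `engagement` maps user ids to current engagement scores in [0, 1];
    `expected_gain` estimates the group-level improvement from addressing a
    given user (hypothetical stand-in for a learned policy/value model).
    """
    return max(engagement, key=lambda u: expected_gain(u, engagement))

# Toy gain model: addressing the least-engaged user lifts the group most.
def toy_gain(user: str, engagement: Dict[str, float]) -> float:
    return 1.0 - engagement[user]

group = {"alice": 0.9, "bob": 0.3, "carol": 0.6}
print(select_partner(group, toy_gain))  # -> bob
```

The greedy argmax is the simplest instantiation; a full policy could also weigh timing and the risk of disengaging currently engaged users.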

The aforementioned goals are divided into the following approaches: (1) contextual multimodal verbal and nonverbal sentiment and engagement detection, (2) personalized dialogue understanding and generation based on mutual knowledge modeling, and (3) an optimized interaction policy for conversation-partner selection to improve group engagement. This final report presents the methodology and findings of the project. We report on experiments in modeling sentiment and engagement behaviors using verbal and nonverbal temporal cues, where we achieve up to 94% balanced accuracy on the test set.
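Balanced accuracy, the metric reported above, is the mean of per-class recalls, which prevents a dominant class (e.g., long stretches of neutral engagement) from inflating the score. A minimal self-contained computation, assuming integer class labels:

```python
from collections import defaultdict
from typing import Sequence

def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls; robust to class imbalance."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    # Average recall over the classes present in the ground truth.
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced example: plain accuracy is 0.8, but the minority class
# is never predicted, so balanced accuracy drops to 0.5.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))  # -> 0.5
```

This matches the convention used by common ML toolkits (e.g., scikit-learn's `balanced_accuracy_score` without the chance adjustment).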

We present the MRF-Chat approaches and evaluation results, which have been published in the proceedings of EMNLP 2021. MRF-Chat significantly boosts the performance of KV Memory and Poly-Encoder models. Although these results are promising, we have identified a few additional approaches to improve generalization and real-time prototyping in the remaining project timeline.
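The way an MRF layer can boost a base retrieval model is illustrated below. This is a simplified sketch, not the published MRF-Chat model: the names (`mrf_rescore`, `knowledge_overlap`) and the single extra factor are assumptions. The key idea shown is that summing log-potentials corresponds to multiplying MRF factors, so a candidate response's base score can be combined with a mutual-knowledge compatibility factor before taking the MAP (argmax) response.

```python
from typing import Dict

def mrf_rescore(
    base_scores: Dict[str, float],
    knowledge_overlap: Dict[str, float],
    weight: float = 1.0,
) -> str:
    """Rescore candidate responses with a knowledge-compatibility factor.

    `base_scores` are log-scores from a base retrieval model (e.g. a
    Poly-Encoder); `knowledge_overlap` scores each candidate's consistency
    with facts assumed shared by user and agent (hypothetical factor).
    Adding log-potentials = multiplying MRF factors; argmax = MAP response.
    """
    return max(
        base_scores,
        key=lambda c: base_scores[c] + weight * knowledge_overlap.get(c, 0.0),
    )

base = {"r1": 1.2, "r2": 1.0}     # r1 wins under the base model alone
overlap = {"r1": 0.0, "r2": 0.5}  # but r2 better matches shared knowledge
print(mrf_rescore(base, overlap))  # -> r2
```

The example flips the ranking: the base model prefers `r1`, but the compatibility factor promotes `r2`, mirroring how structured factors can correct a deep model's errors.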