We show that using thin slices (< 1 minute) of facial expression and body language data, we can train a deep neural network to predict whether two people in a conversation will bond with each other. Bonding is measured using the Bonding subscale of the Working Alliance Inventory. We find that participants who experience bonding perceive their conversational partner as interesting, charming, and friendly, and do not perceive them as distant or annoying.
The data were collected from a user study of naturalistic conversations, in which participants were asked to interact for 20 minutes and were recorded using cameras, microphones, and Microsoft Kinects. To ensure that participants did not become self-conscious about their non-verbal cues, they were told that the purpose of the study was to train machine learning algorithms to read lips.
We show not only that we can accurately predict bonding from participants' personality, disposition, and traits, but also that we can predict whether a participant will experience bonding up to 20 minutes later, using only one-minute thin slices of facial expression and body language data. This ability could be extremely useful to an intelligent virtual agent: if it could detect at one-minute intervals whether it was bonding with its user, it could make course corrections to promote enjoyment and foster bonding. We provide an analysis of the facial expression and body language cues associated with higher bonding, and show how an agent could use this information to synthesize appropriate non-verbal cues during conversation.
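To make the thin-slice prediction setup concrete, the sketch below shows one plausible way such a model could be structured. It is a minimal illustration, not the architecture used in this work: the feature dimensionality (`N_FEATURES`), slice length in frames (`FRAMES_PER_SLICE`), pooling scheme, and the `ThinSliceBondingNet` module are all assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical per-frame non-verbal features, e.g. facial action-unit
# intensities plus body-pose statistics. The actual feature set and
# network architecture in the paper may differ.
N_FEATURES = 40        # assumed number of per-frame features
FRAMES_PER_SLICE = 60  # assumed: one frame per second over a one-minute slice


class ThinSliceBondingNet(nn.Module):
    """Illustrative network mapping a one-minute thin slice of facial
    expression and body language features to a bonding probability."""

    def __init__(self, n_features: int = N_FEATURES, hidden: int = 64):
        super().__init__()
        # Each feature is pooled over the slice (mean + std), then classified.
        self.classifier = nn.Sequential(
            nn.Linear(2 * n_features, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, 1),
        )

    def forward(self, slice_feats: torch.Tensor) -> torch.Tensor:
        # slice_feats: (batch, frames, n_features)
        pooled = torch.cat(
            [slice_feats.mean(dim=1), slice_feats.std(dim=1)], dim=-1
        )
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)


# Example usage with random tensors standing in for extracted features.
model = ThinSliceBondingNet()
slices = torch.randn(8, FRAMES_PER_SLICE, N_FEATURES)
bonding_prob = model(slices)  # estimated probability of reported bonding
print(bonding_prob.shape)     # torch.Size([8])
```

In a deployed virtual agent, a model of this kind could be evaluated on each successive one-minute slice of the interaction, giving the agent a running estimate of bonding that it could use to adjust its own non-verbal behaviour.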