Project

Realtime Detection of Social Cues

Copyright

Jin Joo Lee

Realtime detection of social cues in children’s voices

In everyday conversation, people use what are known as backchannels to signal to someone that they are still listening, paying attention, and engaged. As listeners, we smile, nod, and say “uh-huh” to convey attentiveness, and we do this naturally with little thought. We give this feedback not randomly but at certain moments in the conversation because speakers give off social cues that signal upcoming backchanneling opportunities.

Copyright

Mirko Gelsomini

A robot listener needs to detect these social cues to time its responses carefully. We developed a realtime rule-based model that detects them from the prosody of the speaker’s voice. From low-level speech features, the model detects significant changes in pitch, shifts in energy, long pauses, and long utterances. The model’s parameters were trained and tested on a dataset of children’s voices. We then used the model to trigger contingent behaviors in a listening robot, and children were highly engaged with the robot as they told it stories about their day.
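
To make the idea of a rule-based prosodic detector concrete, here is a minimal Python sketch. It is not the project's actual model. The Frame and Thresholds types, the detect_cues function, the cue names, and every threshold value are hypothetical placeholders chosen for illustration. The sketch assumes pitch and energy have already been extracted per frame by some upstream feature extractor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    """One analysis frame of low-level speech features (hypothetical layout)."""
    pitch_hz: Optional[float]  # None when no pitch was found (unvoiced or silent)
    energy_db: float


@dataclass(frozen=True)
class Thresholds:
    """Illustrative placeholder values, NOT the parameters trained in the project."""
    frame_s: float = 0.01             # frame hop in seconds
    silence_db: float = -40.0         # below this energy a frame counts as a pause
    pitch_change_ratio: float = 0.25  # relative pitch jump between voiced frames
    energy_shift_db: float = 10.0     # absolute energy jump between speech frames
    long_pause_s: float = 0.5
    long_utterance_s: float = 5.0


def detect_cues(frames: List[Frame],
                th: Thresholds = Thresholds()) -> List[Tuple[int, str]]:
    """Return (frame_index, cue_name) pairs in the order the cues are detected."""
    pause_frames = round(th.long_pause_s / th.frame_s)
    utterance_frames = round(th.long_utterance_s / th.frame_s)
    cues: List[Tuple[int, str]] = []
    prev_pitch: Optional[float] = None
    prev_energy: Optional[float] = None
    silent_run = 0  # consecutive silent frames
    speech_run = 0  # consecutive speech frames

    for i, frame in enumerate(frames):
        if frame.energy_db < th.silence_db:
            # Silent frame: extend the pause and forget the speech context.
            silent_run += 1
            speech_run = 0
            prev_pitch = None
            prev_energy = None
            if silent_run == pause_frames:  # fire once per pause
                cues.append((i, "long_pause"))
            continue

        silent_run = 0
        speech_run += 1

        # Significant pitch change relative to the previous voiced frame.
        if frame.pitch_hz is not None:
            if (prev_pitch is not None and
                    abs(frame.pitch_hz - prev_pitch) / prev_pitch > th.pitch_change_ratio):
                cues.append((i, "pitch_change"))
            prev_pitch = frame.pitch_hz

        # Energy shift relative to the previous speech frame.
        if (prev_energy is not None and
                abs(frame.energy_db - prev_energy) > th.energy_shift_db):
            cues.append((i, "energy_shift"))
        prev_energy = frame.energy_db

        # Long utterance: fire once when uninterrupted speech reaches the limit.
        if speech_run == utterance_frames:
            cues.append((i, "long_utterance"))

    return cues


# Usage with coarse 0.1 s frames so that a short list exercises every rule.
demo_thresholds = Thresholds(frame_s=0.1, long_pause_s=0.3, long_utterance_s=0.5)
demo_frames = [
    Frame(200.0, -20.0), Frame(205.0, -20.0), Frame(300.0, -20.0),
    Frame(300.0, -5.0), Frame(300.0, -5.0),
    Frame(None, -60.0), Frame(None, -60.0), Frame(None, -60.0),
]
demo_cues = detect_cues(demo_frames, demo_thresholds)
print(demo_cues)
```

In a setup like this, each detected cue could be passed to whatever component schedules the robot's nod or verbal acknowledgment. The sketch simplifies in one notable way: any silent frame resets the utterance timer, whereas a trained model would likely tolerate short pauses within an utterance.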