Daniel McDuff, Rana el Kaliouby, Abdelrahman Nasser Mahmoud, Youssef Kashef, M. Ehsan Hoque, Matthew Goodwin and Rosalind W. Picard
People express and communicate their mental states, such as emotions, thoughts, and desires, through facial expressions, vocal nuances, gestures, and other non-verbal channels. We have developed a computational model that enables real-time analysis, tagging, and inference of cognitive-affective mental states from facial video. This framework combines bottom-up, vision-based processing of the face (e.g., detecting a head nod or smile) with top-down predictions of mental-state models (e.g., interest and confusion) to interpret the meaning underlying head and facial signals over time. Our system tags facial expressions, head gestures, and cognitive-affective states at multiple spatial and temporal granularities, in real time and offline, in both natural human-human and human-computer interaction contexts. A version of this system is being made available commercially by the Media Lab spin-off Affectiva, indexing emotion from faces. Applications range from measuring people's experiences to training tools for people on the autism spectrum and people with nonverbal learning disabilities.
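To make the bottom-up/top-down combination concrete, the sketch below shows one common way such a pipeline can be structured: per-frame facial-action detections (bottom up) update a temporal belief over mental states via a discrete Bayes filter (top down). This is an illustrative toy, not the authors' implementation; the state labels, facial actions, and all probabilities are assumptions chosen for the example.

```python
# Toy sketch (NOT the authors' system): fuse per-frame facial-action
# observations with a temporal mental-state model using a discrete
# Bayes filter. All states, actions, and probabilities are illustrative.

STATES = ["interest", "confusion"]

# Transition model: mental states tend to persist across frames (assumed).
TRANS = {
    "interest":  {"interest": 0.9, "confusion": 0.1},
    "confusion": {"interest": 0.1, "confusion": 0.9},
}

# Observation model: likelihood of a detected facial action given a
# mental state (assumed values).
OBS = {
    "interest":  {"head_nod": 0.7, "smile": 0.6, "brow_furrow": 0.1},
    "confusion": {"head_nod": 0.2, "smile": 0.1, "brow_furrow": 0.8},
}

def infer_mental_state(frames, prior=None):
    """Recursively update a belief over mental states from per-frame
    lists of detected facial actions."""
    belief = prior or {s: 1.0 / len(STATES) for s in STATES}
    for actions in frames:
        # Predict: propagate belief through the transition model.
        predicted = {
            s: sum(belief[p] * TRANS[p][s] for p in STATES) for s in STATES
        }
        # Update: weight by the likelihood of each observed facial action.
        for s in STATES:
            for a in actions:
                predicted[s] *= OBS[s][a]
        total = sum(predicted.values())
        belief = {s: v / total for s, v in predicted.items()}
    return belief

# A nod-and-smile sequence shifts belief toward "interest".
belief = infer_mental_state([["head_nod"], ["smile"], ["head_nod", "smile"]])
```

Running the filter on a short sequence of nods and smiles yields a belief strongly favoring "interest", illustrating how low-level facial signals accumulate over time into a higher-level mental-state inference.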