Cognitive Machines
How to build machines that learn to use language in human-like ways, and how to develop tools and models to better understand how children learn to communicate and how adults behave.
The goal of the Cognitive Machines group is to create systems that engage in fluid, situated, meaningful communication with human partners. We seek to understand and model the processes by which words are grounded in the physical world as a result of embodied perception, action, and learning. These models are applied to create situated human-machine interfaces. We also use our computational models as a source of predictions and possible accounts for a number of cognitive phenomena including aspects of children's language acquisition, concept formation, and attention.

Research Projects

  • BlitzScribe: Speech Analysis for the Human Speechome Project

    Brandon Roy and Deb Roy
    BlitzScribe is a new approach to speech transcription driven by the demands of today's massive multimedia corpora. High-quality annotations are essential for indexing and analyzing many multimedia datasets; in particular, our study of language development for the Human Speechome Project depends on speech transcripts. Unfortunately, automatic speech transcription is inadequate for many natural speech recordings, and traditional approaches to manual transcription are extremely labor intensive and expensive. BlitzScribe uses a semi-automatic approach, combining human and machine efforts to dramatically improve transcription speed. Automatic methods identify and segment speech in dense, multitrack audio recordings, allowing us to build streamlined user interfaces maximizing human productivity. The first version of BlitzScribe is already about 4-6 times faster than existing systems. We are exploring user-interface design, machine-learning and pattern-recognition techniques to build a human-machine collaborative system that will make massive transcription tasks feasible and affordable.
  • Crowdsourcing the Creation of Smart Role-Playing Agents

    Jeff Orkin and Deb Roy
    We are crowdsourcing the creation of socially rich, interactive characters by collecting data from thousands of people interacting and conversing in online multiplayer games, and mining recorded gameplay to extract patterns in language and behavior. The tools and algorithms we are developing allow non-experts to automate characters who can play roles by interacting and conversing with humans (via speech or typed text), and with each other. The Restaurant Game recorded over 16,000 people playing the roles of customers and waitresses in a virtual restaurant. Improviso is recording humans playing the roles of actors on the set of a sci-fi movie. This approach will enable new forms of interaction for games, training simulations, customer service, and HR job applicant screening systems.
  • HouseFly: Immersive Video Browsing and Data Visualization

    Philip DeCamp, Rony Kubat and Deb Roy
    HouseFly combines audio-video recordings from multiple cameras and microphones to generate an interactive, 3D reconstruction of recorded events. Developed for use with the longitudinal recordings collected by the Human Speechome Project, this software enables the user to move freely throughout a virtual model of a home and to play back events at any time or speed. In addition to audio and video, the project explores how different kinds of data may be visualized in a virtual space, including speech transcripts, person tracking data, and retail transactions.
  • Human Speechome Project

    Philip DeCamp, Brandon Roy, Soroush Vosoughi and Deb Roy
    The Human Speechome Project is an effort to observe and computationally model the longitudinal language development of a single child at an unprecedented scale. To achieve this, we are recording, storing, visualizing, and analyzing communication and behavior patterns in over 200,000 hours of home video and speech recordings. The tools that are being developed for mining and learning from hundreds of terabytes of multimedia data offer the potential for breaking open new business opportunities for a broad range of industries—from security to Internet commerce.
  • Speech Interaction Analysis for the Human Speechome Project

    Brandon Roy and Deb Roy
    The Speechome Corpus is the largest corpus of a single child learning language in a naturalistic setting. We have now transcribed significant amounts of the speech to support new kinds of language analysis. We are currently focusing on the child's lexical development, pinpointing "word births" and relating them to caregiver language use. Our initial results show child vocabulary growth at an unprecedented temporal resolution, as well as a detailed picture of other measures of linguistic development. The results suggest that individual caregivers "tune" their spoken interactions to the child's linguistic ability with far more precision than expected, helping to scaffold language development. To perform these analyses, we have developed new tools for interactive data annotation and exploration.
  • Speechome Recorder for the Study of Child Development Disorders

    Soroush Vosoughi, Joe Wood, Matthew Goodwin and Deb Roy
    Collection and analysis of dense, longitudinal observational data of child behavior in natural, ecologically valid, non-laboratory settings holds significant benefits for advancing the understanding of autism and other developmental disorders. We have developed the Speechome Recorder, a portable version of the embedded audio/video recording technology originally developed for the Human Speechome Project, to facilitate swift, cost-effective deployment in special-needs clinics and homes. Recording child behavior daily in these settings will enable us to study developmental trajectories of autistic children from infancy through early childhood, as well as atypical dynamics of social interaction as they evolve on a day-to-day basis. The recorder's portability makes it possible to conduct potentially large-scale comparative studies of developmental milestones in both neurotypical and autistic children. Data-analysis tools developed in this research aim to yield new insights for early detection, provide more accurate assessments of context-specific behaviors for individualized treatment, and shed light on the enduring mysteries of autism.