Cognitive Machines
Building machines that learn to use language in human-like ways, and developing tools and models to better understand how children learn to communicate and how adults behave.
The goal of the Cognitive Machines group is to create systems that engage in fluid, situated, meaningful communication with human partners. We seek to understand and model the processes by which words are grounded in the physical world as a result of embodied perception, action, and learning. These models are applied to create situated human-machine interfaces. We also use our computational models as a source of predictions and possible accounts for a number of cognitive phenomena, including aspects of children's language acquisition, concept formation, and attention.

Research Projects

  • BlitzScribe: Speech Transcription for the Human Speechome Project

    Brandon Roy and Deb Roy
    BlitzScribe is a new approach to speech transcription driven by the demands of today's massive multimedia corpora. High-quality annotations are essential for indexing and analyzing many multimedia datasets; in particular, our study of language development for the Human Speechome Project depends on speech transcripts. Unfortunately, automatic speech transcription is inadequate for many natural speech recordings, and traditional approaches to manual transcription are extremely labor-intensive and expensive. BlitzScribe uses a semi-automatic approach, combining human and machine efforts to dramatically improve transcription speed. Automatic methods identify and segment speech in dense, multitrack audio recordings, allowing us to build streamlined user interfaces that maximize human productivity. The first version of BlitzScribe is already about 4 to 6 times faster than existing systems. We are exploring user-interface design, machine-learning, and pattern-recognition techniques to build a human-machine collaborative system that will make massive transcription tasks feasible and affordable.
  • Crowdsourcing the Creation of Smart Role-Playing Agents

    Jeff Orkin and Deb Roy
    We are crowdsourcing the creation of socially rich, interactive characters by collecting data from thousands of people interacting and conversing in online multiplayer games, and mining recorded gameplay to extract patterns in language and behavior. The tools and algorithms we are developing allow non-experts to automate characters who can play roles by interacting and conversing with humans (via speech or typed text), and with each other. The Restaurant Game recorded over 16,000 people playing the roles of customers and waitresses in a virtual restaurant. Improviso is recording humans playing the roles of actors on the set of a sci-fi movie. This approach will enable new forms of interaction for games, training simulations, customer service, and HR job applicant screening systems.
  • HouseFly: Immersive Video Browsing and Data Visualization

    Philip DeCamp, Rony Kubat and Deb Roy
    HouseFly combines audio-video recordings from multiple cameras and microphones to generate an interactive, 3D reconstruction of recorded events. Developed for use with the longitudinal recordings collected by the Human Speechome Project, this software enables the user to move freely throughout a virtual model of a home and to play back events at any time or speed. In addition to audio and video, the project explores how different kinds of data may be visualized in a virtual space, including speech transcripts, person tracking data, and retail transactions.
  • Human Speechome Project

    Philip DeCamp, Brandon Roy, Soroush Vosoughi and Deb Roy
    The Human Speechome Project is an effort to observe and computationally model the longitudinal language development of a single child at an unprecedented scale. To achieve this, we are recording, storing, visualizing, and analyzing communication and behavior patterns in over 200,000 hours of home video and speech recordings. The tools that are being developed for mining and learning from hundreds of terabytes of multimedia data offer the potential for breaking open new business opportunities for a broad range of industries—from security to Internet commerce.
  • Language, Word Learning, and the Activity Substrate of Everyday Life

    Brandon C. Roy, Matthew Miller, Michael C. Frank and Deb Roy

    Language is inextricably linked to the activities and events that make up our daily lives. For a child learning language, everyday activities provide an important context for learning first words. This work builds on the corpus collected for the Human Speechome Project, the largest multimodal corpus of one child's early life, to explore how a child's experience with language is tied to space, time, and daily activity in ways that support word learning. We use manual and fully automatic methods, ranging from direct annotation to computer vision and unsupervised latent variable approaches, to identify the abstract "stuff of life" that makes up early experience. We show how a word's contextual grounding predicts when it will be learned.

  • Media Ecosystem Analysis: Lessons from the Boston Marathon Bombings

    Soroush Vosoughi and Deb Roy

    In this project we examine how social media and traditional media responded to the Boston Marathon bombings, from the moment of the explosion to two weeks after the events, including the search for and capture of the suspects. We use big data analytics, natural language processing, and complex-systems and network-analysis techniques. We focus specifically on information flow, audience engagement and attention, the emergence of broadcasters, the source and spread of rumors, and the interplay of various media. We hope to develop a better understanding of how information is generated and how it flows between broadcasters and audiences across different media. Using this event as a case study, we aim to identify what went wrong and what went right, and to develop recommendations for different actors (news sources, social media participants, police departments) to better facilitate information flow and minimize misunderstanding and the spread of false information.

  • Rumors in Social Networks: Detection, Verification and Intervention

    Soroush Vosoughi and Deb Roy

    Motivated by the role that rumors played in the aftermath of the Boston Marathon bombings, we study the emergence, spread, and veracity of rumors in large, complex, and highly connected message-passing systems such as social media platforms, with a particular focus on rumors surrounding emergencies. We are using the Boston Marathon bombings as a case study to develop computational models that can predict the veracity, spread, and impact of rumors surrounding particular events. The end goal is to create an online rumor verification algorithm that can analyze rumors in real time as events unfold. We hope our tool can be used by citizens, journalists, and emergency services to minimize the spread and impact of false information in social media during emergencies.

  • Speechome Recorder for the Study of Child Development Disorders

    Soroush Vosoughi, Joe Wood, Matthew Goodwin and Deb Roy
    Collecting and analyzing longitudinal observational data of child behavior in natural, ecologically valid, non-laboratory settings holds significant benefits for advancing the understanding of autism and other developmental disorders. We developed the Speechome Recorder, a portable version of the embedded recording technology originally developed for the Human Speechome Project, to facilitate cost-effective deployment in special-needs clinics and homes. Recording child behavior daily in these settings will enable us to study the developmental trajectories of autistic children from infancy through early childhood, as well as atypical dynamics of social interaction as they evolve from day to day. The recorder's portability makes potentially large-scale comparative studies of developmental milestones in both neurotypical and autistic children possible. Data-analysis tools developed in this research aim to reveal new insights toward early detection, provide more accurate assessments of context-specific behaviors for individualized treatment, and shed light on autism.