The goal of the Cognitive Machines group is to create systems that engage in fluid, situated, meaningful communication with human partners. We seek to understand and model the processes by which words are grounded in the physical world as a result of embodied perception, action, and learning. These models are applied to create situated human-machine interfaces. We also use our computational models as a source of predictions and possible accounts for a number of cognitive phenomena including aspects of children's language acquisition, concept formation, and attention.
Research Projects
10,000x More Efficient Computing
Joseph Bates, George Shaw and Deb Roy
A variety of important problems can be solved using surprisingly approximate arithmetic. We've designed a co-processor for such arithmetic that provides 100,000 cores on a single standard chip, or 1,000 cores in a sub-watt mobile device. We are exploring applications of such machines in image and video processing. Cost can be under a penny per core, and compared to CPUs, improvements in speed and energy use can exceed 10,000x.
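As a rough illustration of why image processing can tolerate approximate arithmetic, the sketch below averages a small patch of pixels while perturbing every intermediate result by a small random relative error. This is not the co-processor's design: the 1% error level, the helper names `approx` and `approx_mean`, and the pixel values are all assumptions made for the example.

```python
import random

def approx(value, rng, rel_err=0.01):
    """Perturb a result by up to +/- rel_err, mimicking an inexact arithmetic unit.

    The 1% default is an assumed error level, not a figure from the project.
    """
    return value * (1.0 + rng.uniform(-rel_err, rel_err))

def approx_mean(values, rng, rel_err=0.01):
    """Average values, treating every add and the final divide as inexact."""
    total = 0.0
    for v in values:
        total = approx(total + v, rng, rel_err)
    return approx(total / len(values), rng, rel_err)

rng = random.Random(0)  # fixed seed so runs are repeatable
patch = [112, 118, 120, 115, 119, 121, 117, 116, 122]  # made-up 3x3 grey levels
exact = sum(patch) / len(patch)
noisy = approx_mean(patch, rng)
relative_error = abs(noisy - exact) / exact
print(f"exact={exact:.2f} approx={noisy:.2f} relative error={relative_error:.2%}")
```

With ten inexact operations at 1% each, the result can never drift more than about 10.5% from the exact mean, and because the individual errors are random and partly cancel, the observed error is usually far smaller.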
BlitzScribe: Speech Analysis for the Human Speechome Project
Brandon Roy and Deb Roy
BlitzScribe is a new approach to speech transcription driven by the demands of today's massive multimedia corpora. High-quality annotations are essential for indexing and analyzing many multimedia datasets; in particular, our study of language development for the Human Speechome Project depends on speech transcripts. Unfortunately, automatic speech transcription is inadequate for many natural speech recordings, and traditional approaches to manual transcription are extremely labor intensive and expensive. BlitzScribe uses a semi-automatic approach, combining human and machine effort to dramatically improve transcription speed. Automatic methods identify and segment speech in dense, multitrack audio recordings, allowing us to build streamlined user interfaces maximizing human productivity. The first version of BlitzScribe is already about 4-6 times faster than existing systems. We are exploring user-interface design, machine-learning, and pattern-recognition techniques to build a human-machine collaborative system that will make massive transcription tasks feasible and affordable.
Crowdsourcing the Creation of Smart Role-Playing Agents
Jeff Orkin and Deb Roy
We are crowdsourcing the creation of socially rich interactive characters by collecting data from thousands of people interacting and conversing in online multiplayer games, and mining recorded gameplay to extract patterns in language and behavior. The tools and algorithms we are developing allow non-experts to automate characters who can play roles by interacting and conversing with humans, and with each other. The Restaurant Game recorded over 16,000 people playing the roles of customers and waitresses in a virtual restaurant. Improviso is recording humans playing the roles of actors on the set of a sci-fi movie. This approach will enable new forms of interaction for games, training simulations, customer service, and HR job applicant screening systems.
HouseFly: Immersive Video Browsing and Data Visualization
Philip DeCamp, Rony Kubat and Deb Roy
HouseFly combines audio-video recordings from multiple cameras and microphones to generate an interactive, 3D reconstruction of recorded events. Developed for use with the longitudinal recordings collected by the Human Speechome Project, this software enables the user to move freely throughout a virtual model of a home and to play back events at any time or speed. In addition to audio and video, the project explores how different kinds of data may be visualized in a virtual space, including speech transcripts, person tracking data, and retail transactions.
Human Speechome Project
Philip DeCamp, Brandon Roy, Soroush Vosoughi and Deb Roy
The Human Speechome Project is an effort to observe and computationally model the longitudinal language development of a single child at an unprecedented scale. To achieve this, we are recording, storing, visualizing, and analyzing communication and behavior patterns in over 200,000 hours of home video and speech recordings. The tools that are being developed for mining and learning from hundreds of terabytes of multimedia data offer the potential for breaking open new business opportunities for a broad range of industries, from security to Internet commerce.
Learning Language Using Virtual Game Context
Hilke Reckman, Jeff Orkin, Tynan Smith and Deb Roy
This project uses the gameplay data from The Restaurant Game and Improviso as linguistic corpora for automated language learning. These corpora are special because they include computer-interpretable non-linguistic context that contains cues as to what the players might mean by the words and sentences they utter. The results feed back into the original projects by contributing to the linguistic competence of the AI that is being developed for those games.
Real-Time Behavior Analysis
Matt Miller
People are surprisingly predictable. We use real-time video analysis to extract patterns of behavior from crowds browsing demos in our lab space. We can discover meaningful locations and sequences just from observing how people interact in the space. We can even begin to predict what people might do next.
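One of the simplest ways to predict what a visitor might do next is to count which location tends to follow which in previously observed paths. The sketch below uses that first-order transition count as a stand-in technique; it is not described in the project itself, and the location names, paths, and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def fit_transitions(paths):
    """Count how often each location is immediately followed by each other location."""
    counts = defaultdict(Counter)
    for path in paths:
        for here, nxt in zip(path, path[1:]):
            counts[here][nxt] += 1
    return counts

def predict_next(counts, location):
    """Return the most frequently observed follower of location, or None if unseen."""
    followers = counts.get(location)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical visitor paths, as might be produced by a person tracker.
paths = [
    ["entrance", "demo_a", "demo_b", "exit"],
    ["entrance", "demo_a", "demo_c", "exit"],
    ["entrance", "demo_b", "demo_a", "demo_b"],
]
counts = fit_transitions(paths)
print(predict_next(counts, "demo_a"))
```

In this toy data, visitors at `demo_a` moved on to `demo_b` twice and to `demo_c` once, so `demo_b` is the prediction; a location that was never left in the observed paths, such as `exit`, yields `None`.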
Speech Interaction Analysis for the Human Speechome Project
Brandon Roy and Deb Roy
The Speechome Corpus is the largest corpus of a single child learning language in a naturalistic setting. We have now transcribed significant amounts of the speech to support new kinds of language analysis. We are currently focusing on the child's lexical development, pinpointing "word births" and relating them to caregiver language use. Our initial results show child vocabulary growth at an unprecedented temporal resolution, as well as a detailed picture of other measures of linguistic development. The results suggest individual caregivers "tune" their spoken interactions to the child's linguistic ability with far more precision than expected, helping to scaffold language development. To perform these analyses, new tools have been developed for interactive data annotation and exploration.
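A "word birth" can be pictured, in its simplest form, as the first date on which a word appears in the child's transcribed speech. The sketch below works from that simplified definition; the record layout `(date, speaker, utterance)`, the speaker label `"child"`, the function name, and the toy transcripts are assumptions for illustration and do not reflect the project's actual data format or criteria.

```python
from datetime import date

def word_births(transcripts, child_label="child"):
    """Map each word to the earliest date on which the child produced it."""
    births = {}
    for day, speaker, utterance in transcripts:
        if speaker != child_label:
            continue  # caregiver speech does not count as a production
        for word in utterance.lower().split():
            token = word.strip(".,!?\"'")
            if token and (token not in births or day < births[token]):
                births[token] = day
    return births

# Invented example records; dates and utterances are not from the corpus.
transcripts = [
    (date(2007, 3, 2), "caregiver", "Look at the ball!"),
    (date(2007, 3, 5), "child", "ball"),
    (date(2007, 3, 9), "child", "more ball"),
    (date(2007, 3, 1), "child", "mama"),
]
births = word_births(transcripts)
for word, day in sorted(births.items(), key=lambda item: item[1]):
    print(day.isoformat(), word)
```

Relating births to caregiver language use could then start from the same records, for example by counting how often caregivers said each word before its birth date.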
Speechome Recorder for the Study of Child Development Disorders
Soroush Vosoughi, Joe Wood, Matthew Goodwin and Deb Roy
Collection and analysis of dense, longitudinal observational data of child behavior in natural, ecologically valid, non-laboratory settings holds significant benefits for advancing the understanding of autism and other developmental disorders. We have developed the Speechome Recorder, a portable version of the embedded audio/video recording technology originally developed for the Human Speechome Project, to facilitate swift, cost-effective deployment in special-needs clinics and homes. Recording child behavior daily in these settings will enable us to study developmental trajectories of autistic children from infancy through early childhood, as well as atypical dynamics of social interaction as they evolve on a day-to-day basis. Its portability makes possible potentially large-scale comparative study of developmental milestones in both neurotypical and autistic children. Data-analysis tools developed in this research aim to reveal new insights toward early detection, provide more accurate assessments of context-specific behaviors for individualized treatment, and shed light on the enduring mysteries of autism.
Speechome Video for Retail Analysis
Rony Kubat, Philip DeCamp, Kenneth Jackowitz (BOA) and Deb Roy
We are adapting the video data collection and analysis technology derived from the Human Speechome Project for the retail sector through real-world deployments. We are developing strategies and tools for the analysis of dense, longitudinal video data to study the behavior of, and interactions between, customers and employees in commercial retail settings. One key question in our study is how the architecture of a retail space affects customer activity and satisfaction, and what parameters in the design of a space are operant in this causal relationship.