Research Group Projects and Descriptions
Cognitive Machines
Principal Investigator: Deb Roy
The goal of the Cognitive Machines group is to create systems that engage in fluid, situated, meaningful communication with human partners. We seek to understand and model the processes by which words are grounded in the physical world as a result of embodied perception, action, and learning. These models are applied to create situated human-machine interfaces. We also use our computational models as a source of predictions and possible accounts for a number of cognitive phenomena including aspects of children's language acquisition, concept formation, and attention.
Behavior Capture from Thousands of People Online
Jeff Orkin and Deb Roy
The Restaurant Game is a multiplayer simulation that captures the behavior and language of thousands of people playing the roles of waitresses and customers. We are developing machine-learning algorithms that mine game-play logs to acquire generative models of human language, behavior, and social roles. These models will power synthetic conversational characters that interact with humans in training simulations, games, and other virtual worlds.
BlitzScribe: Speech Analysis for the Human Speechome Project
Deb Roy, Jethran Guinness and Brandon Roy
BlitzScribe is a new approach to speech transcription driven by the demands of today's massive multimedia corpora. High-quality annotations are essential for indexing and analyzing many multimedia datasets; in particular, our study of language development for the Human Speechome Project depends on speech transcripts. Unfortunately, automatic speech transcription is inadequate for many natural speech recordings, and traditional approaches to manual transcription are extremely labor-intensive and expensive. BlitzScribe uses a semi-automatic approach, combining human and machine effort to dramatically improve transcription speed. Automatic methods identify and segment speech in dense, multitrack audio recordings, allowing us to build streamlined user interfaces that maximize human productivity. The first version of BlitzScribe is already about four to six times faster than existing systems. We are exploring user-interface design, machine-learning, and pattern-recognition techniques to build a human-machine collaborative system that will make massive transcription tasks feasible and affordable.
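The segmentation step described above can be illustrated with a very simple energy-based detector. This is only a sketch under stated assumptions, not BlitzScribe's actual method: the function names, the frame length, the energy threshold, and the minimum segment length are all hypothetical, and a real system would likely rely on trained classifiers rather than a fixed threshold.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_speech(
    samples: List[float],
    frame_len: int = 160,      # hypothetical: 10 ms at 16 kHz
    threshold: float = 0.01,   # hypothetical energy threshold, must be > 0
    min_frames: int = 2,       # drop bursts shorter than this many frames
) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose energy exceeds the threshold."""
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None
    # A trailing 0.0 sentinel closes a segment that runs to the end of the audio.
    for idx, energy in enumerate(energies + [0.0]):
        if energy >= threshold and start is None:
            start = idx
        elif energy < threshold and start is not None:
            if idx - start >= min_frames:
                segments.append((start * frame_len, idx * frame_len))
            start = None
    return segments


# Synthetic signal: silence, a long loud burst, silence, a short loud burst, silence.
audio = [0.0] * 320 + [0.5] * 480 + [0.0] * 160 + [0.5] * 160 + [0.0] * 160
print(segment_speech(audio))
```

Each returned span could then be handed to a human transcriber as a short clip, which is the kind of division of labor the description suggests.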
HeadLock: Video Analysis for the Human Speechome Project
Philip DeCamp and Deb Roy
HeadLock is a semi-automated system for head-pose annotation that explores how human-computer interfaces can be combined with computer vision technologies to efficiently extract behavioral information from video recordings. For images with limited resolution, the orientation of a head is often the best approximation of gaze direction, a crucial component in analyzing the rich interactions and behaviors of humans. The goal of HeadLock is to reduce the cost of extracting head pose from video by several orders of magnitude by developing machine-perception technologies that can perform robust head-pose estimation with minimal constraints on resolution and camera angle.
Human Speechome Project
Deb Roy, Philip DeCamp, Brandon Roy, Jethran Guinness, Rony Kubat and Stefanie Tellex
The Human Speechome Project is an effort to observe and computationally model the longitudinal language development of a single child at an unprecedented scale. To achieve this, we are recording, storing, visualizing, and analyzing communication and behavior patterns in over 400,000 hours of home video and speech recordings. The tools being developed for mining and learning from thousands of terabytes of multimedia data could open new business opportunities for a broad range of industries, from security to Internet commerce.
SLIMD
Deb Roy, Stefanie Tellex, Kleovoulos Tsourides and Gregory Marton
SLIMD is a prototype system for retrieving video clips via natural language queries. An automated video analysis system tracks the movements of people in surveillance video and stores track data in a database. A natural language interpretation system converts queries such as "to the kitchen counter" into semantic path filters, which are used to find tracks in the database that match the description. The system can be used to search large video archives for specific human behaviors and can be adapted to other forms of geospatial data such as GPS logs.
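To make the idea of a semantic path filter concrete, here is a minimal sketch, assuming tracks are stored as lists of floor-plan coordinates and landmarks as axis-aligned rectangles. Everything in it is hypothetical and chosen for illustration: the `Track` and `Region` types, the `LANDMARKS` coordinates, the three supported prepositions, and the `parse_query` and `search` helpers. It is not SLIMD's actual interpretation system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    track_id: int
    points: List[Point]  # positions over time, in floor-plan coordinates


@dataclass
class Region:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max


# Hypothetical landmark map; a real deployment would derive this from the scene.
LANDMARKS: Dict[str, Region] = {
    "kitchen counter": Region(8.0, 0.0, 10.0, 2.0),
    "front door": Region(0.0, 4.0, 1.0, 6.0),
}

PathFilter = Callable[[Track], bool]


def to_filter(region: Region) -> PathFilter:
    # "to X": the track starts outside X and ends inside it.
    return lambda t: (bool(t.points) and region.contains(t.points[-1])
                      and not region.contains(t.points[0]))


def from_filter(region: Region) -> PathFilter:
    # "from X": the track starts inside X and ends outside it.
    return lambda t: (bool(t.points) and region.contains(t.points[0])
                      and not region.contains(t.points[-1]))


def through_filter(region: Region) -> PathFilter:
    # "through X": an interior point lies in X, but neither endpoint does.
    return lambda t: (bool(t.points)
                      and any(region.contains(p) for p in t.points[1:-1])
                      and not region.contains(t.points[0])
                      and not region.contains(t.points[-1]))


FILTERS = {"to": to_filter, "from": from_filter, "through": through_filter}


def parse_query(query: str) -> PathFilter:
    """Turn a query like 'to the kitchen counter' into a path filter."""
    preposition, _, landmark = query.strip().lower().partition(" the ")
    if preposition not in FILTERS or landmark not in LANDMARKS:
        raise ValueError(f"cannot interpret query: {query!r}")
    return FILTERS[preposition](LANDMARKS[landmark])


def search(tracks: List[Track], query: str) -> List[int]:
    """Return the ids of all tracks that satisfy the query."""
    matches = parse_query(query)
    return [t.track_id for t in tracks if matches(t)]


tracks = [
    Track(1, [(0.5, 5.0), (4.0, 3.0), (9.0, 1.0)]),  # door -> counter
    Track(2, [(9.0, 1.0), (5.0, 3.0), (0.5, 5.0)]),  # counter -> door
    Track(3, [(4.0, 4.0), (9.0, 1.0), (4.0, 4.0)]),  # passes the counter
]
print(search(tracks, "to the kitchen counter"))
```

Because the filters only inspect coordinate sequences, the same structure would apply to other geospatial data such as GPS logs, with landmarks defined in latitude and longitude instead of floor-plan units.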
Sports Video Search Using Situated Natural Language Processing
Michael Fleischman and Deb Roy
The quantity and availability of video content are soaring due to the combination of television networks and the Internet. The aim of this project is to develop more effective means to manage, search, and translate video content. We are developing algorithms that interpret language in video (speech and closed-caption text) by exploiting aspects of the non-linguistic context, or situation, conveyed by the accompanying video. We model situations by automatically finding patterns within low-level audio/video features that represent events. Event patterns are then mapped to words spoken in the video in order to create a “grounded” dictionary of word meanings. Our research focuses on sports video, in particular Major League Baseball games. We are exploring applications in multimedia search and video-based machine translation.
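One simple way to picture a grounded dictionary is as a table of how strongly each word is associated with each detected event. The sketch below uses plain co-occurrence counts to estimate the fraction of a word's clips that show each event. This is a stand-in technique chosen for illustration, not the project's actual model; the event labels, the caption words, the `build_grounded_dictionary` helper, and the `min_count` cutoff are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Clip = Tuple[str, List[str]]  # (detected event label, caption words)


def build_grounded_dictionary(
    clips: List[Clip], min_count: int = 2
) -> Dict[str, Dict[str, float]]:
    """Map each word to the fraction of its clips that show each event."""
    word_event = defaultdict(Counter)  # word -> event -> number of clips
    word_total = Counter()             # word -> number of clips containing it
    for event, words in clips:
        for word in set(words):        # count each word once per clip
            word_event[word][event] += 1
            word_total[word] += 1
    return {
        word: {event: n / total for event, n in word_event[word].items()}
        for word, total in word_total.items()
        if total >= min_count          # skip words seen too rarely to trust
    }


clips = [
    ("home_run", ["deep", "fly", "ball", "gone"]),
    ("home_run", ["that", "ball", "is", "gone"]),
    ("strikeout", ["swing", "and", "a", "miss"]),
    ("strikeout", ["strike", "three", "swing"]),
    ("ground_out", ["ground", "ball", "to", "short"]),
]
dictionary = build_grounded_dictionary(clips)
print(dictionary["gone"], dictionary["ball"])
```

In this toy data, "gone" is tied only to home runs, while "ball" is spread across events, which is the kind of distinction a search system could use to rank clips for a query.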
Trisk: A Conversational Robot
Deb Roy, Kai-yuh Hsiao, Stefanie Tellex, Rony Daniel Kubat, Soroush Vosoughi, Thananat Jitapunkul and Kleovoulos Tsourides
Trisk is a humanoid robot that integrates speech input, visual perception, and active touch in order to interact with humans and its environment. It can understand and obey natural language commands, and will soon be able to answer questions. The robot is a platform for designing new algorithms and multimodal knowledge representations for sensory-motor grounded language use. This research takes steps towards social robots that can coordinate activities with human partners using natural language and gesture.