Human Speechome Project
Deb Roy, Philip DeCamp, Jethran Guinness, Rony Kubat, Brandon Roy, Alexia Salata, Michael Fleischman, Stefanie Tellex

The Human Speechome Project is an effort to observe and computationally model the longitudinal language development of a single child at an unprecedented scale. To achieve this, we are recording, storing, visualizing, and analyzing communication and behavior patterns in several hundred thousand hours of home video and speech recordings. The tools being developed for mining and learning from hundreds of terabytes of multimedia data could open new opportunities in a broad range of areas, from security to personal memory augmentation. This project is supported in part by donations of hardware and other resources from Zetera Corporation, Bell Microproducts, and Seagate Technology, and by funding from the National Science Foundation.

Situated Natural Language Processing for Sports Video
Michael Fleischman and Deb Roy

The quantity and availability of video content are soaring, driven by both television networks and the Internet. The aim of this project is to develop more effective means to manage, search, and translate video content. We are developing algorithms that interpret language in video (speech and closed caption text) by exploiting aspects of the non-linguistic context, or situation, conveyed by the accompanying video. We model situations by automatically finding patterns within low-level audio/video features that represent events. Event patterns are then mapped to words spoken in the video in order to create a "grounded" dictionary of word meanings. Our research focuses on sports video, in particular Major League Baseball games. We are exploring applications in multimedia search and video-based machine translation.
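The mapping from event patterns to words can be illustrated with a minimal sketch. It assumes each video segment has already been reduced to a set of discrete event-pattern labels and a set of words from the speech or closed captions. It uses pointwise mutual information (PMI) over co-occurrence counts, which is a stand-in chosen for brevity and not necessarily the model used in this project. The function name, the labels such as `E_swing_miss`, and the toy segments are all hypothetical.

```python
import math
from collections import Counter


def grounded_dictionary(segments, min_count=2):
    """Pair each word with the event pattern it co-occurs with most strongly.

    segments: list of (event_labels, words) pairs, one per video segment.
    Returns {word: (best_event_label, pmi_score)}.
    """
    word_counts = Counter()
    event_counts = Counter()
    pair_counts = Counter()
    n = len(segments)

    for events, words in segments:
        events, words = set(events), set(words)  # count presence, not frequency
        word_counts.update(words)
        event_counts.update(events)
        pair_counts.update((w, e) for w in words for e in events)

    best = {}
    for (word, event), count in pair_counts.items():
        if word_counts[word] < min_count:
            continue  # too rare to estimate an association reliably
        # PMI = log( P(word, event) / (P(word) * P(event)) )
        pmi = math.log((count * n) / (word_counts[word] * event_counts[event]))
        if word not in best or pmi > best[word][1]:
            best[word] = (event, pmi)
    return best


# Hypothetical toy data: event labels stand in for mined audio/video patterns.
segments = [
    (["E_pitch", "E_swing_miss"], ["strike", "the"]),
    (["E_pitch", "E_ball_in_play"], ["homer", "the"]),
    (["E_pitch", "E_swing_miss"], ["strike"]),
    (["E_pitch", "E_ball_in_play"], ["homer", "deep"]),
]

dictionary = grounded_dictionary(segments)
for word, (event, score) in sorted(dictionary.items()):
    print(f"{word}: {event} (PMI {score:.2f})")
```

In this toy data, "strike" associates most strongly with `E_swing_miss` and "homer" with `E_ball_in_play`. The word "the" scores zero against every event because it appears regardless of what happens on screen, which is the property that lets a grounded dictionary separate content words from function words.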


Trisk: A Conversational Robot
Kai-yuh Hsiao, Stefanie Tellex, Rony Daniel Kubat, Soroush Vosoughi, Thananat Jitapunkul, Kleovoulos Tsourides, Deb Roy

Trisk is a humanoid robot that integrates speech input, visual perception, and active touch in order to interact with humans and its environment. It can understand and obey natural language commands, and will soon be able to answer questions. The robot is a platform for designing new algorithms and multimodal knowledge representations for sensory-motor grounded language use. This research takes steps towards social robots that can coordinate activities with human partners using natural language and gesture.



Behavior Capture from Thousands of People Online
Jeff Orkin, Deb Roy

The Restaurant Game is a research project that will algorithmically combine the gameplay experiences of thousands of players to create a new game. We will apply machine learning algorithms to data collected through the multiplayer Restaurant Game, and produce a new single-player game that we will enter into the 2008 Independent Games Festival. Everyone who plays The Restaurant Game will be credited as a Game Designer.

Understanding Navigational Instructions (2005)
Affordance-Based Language Understanding in Video Games (2005)
Ripley: A Conversational Robot (2003-6)
Seeing Meaning (2003-5)
Learning Words for Actions (2005)
Object Recognition (2005)
Hermes: Homeostatic Control for a Conversational Mobile Robot (2005)
Representing Affordances of Objects Through Active Touch (2005)
Elvis: Situated conversational interface for a lighting system (2004)
Attention glasses (2004)
BISHOP|BLENDER: Spatially Grounded Language Understanding in 3D Modeling Software (2003/2004)
Physical Situation-Aware Language Generation
Bishop: Understanding complex spatial language (2003/2004)
TLC: Semantic representations for language translation
Fuse: Semantically primed speech recognition using vision (2003)
See and Tell: Translating visual scenes to verbal descriptions (2002)
Describer: Trainable scene description (2001)
Newt: Visually-grounded language understanding (2002)
Learning spatial semantics (2001)
Doodle: Visual natural language (2002)
Assistive communication aids (2002)
JFig: Adaptive multimodal interface (2001)
Toco: Visually-grounded word learning (1999)