Publication

Integration of Speech and Vision using Mutual Information

June 5, 2000

People

Deb Roy

Professor of Media Arts and Sciences


Abstract

We are developing a system that learns words from co-occurring spoken and visual input. The goal is to automatically segment continuous speech at word boundaries without a lexicon, and to form visual categories that correspond to spoken words. Mutual information is used to integrate acoustic and visual distance metrics to extract an audio-visual lexicon from raw input. We report results of experiments with a corpus of infant-directed speech and images.
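The abstract's central idea is to use mutual information to tie an acoustic match to a visual match. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the system described in the paper. It assumes that each candidate word/object pairing is evaluated over a set of co-occurring speech and image events, and that an acoustic distance and a visual distance are already available for every event. It also assumes that each distance is converted into a binary "match" indicator using a hypothetical threshold. The names `binary_mutual_information` and `score_candidate`, and both thresholds, are illustrative only.

```python
import math
from typing import Sequence


def binary_mutual_information(a: Sequence[bool], v: Sequence[bool]) -> float:
    """Mutual information (in bits) between two binary indicator sequences."""
    n = len(a)
    if n == 0 or n != len(v):
        raise ValueError("sequences must be non-empty and of equal length")
    # Joint counts over the four (acoustic match, visual match) outcomes.
    counts = {(x, y): 0 for x in (False, True) for y in (False, True)}
    for x, y in zip(a, v):
        counts[(bool(x), bool(y))] += 1
    # Marginal probabilities of each indicator.
    p_a = {x: (counts[(x, False)] + counts[(x, True)]) / n for x in (False, True)}
    p_v = {y: (counts[(False, y)] + counts[(True, y)]) / n for y in (False, True)}
    mi = 0.0
    for (x, y), c in counts.items():
        if c == 0:
            continue  # 0 * log(0) is taken as 0
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / (p_a[x] * p_v[y]))
    return mi


def score_candidate(
    acoustic_dists: Sequence[float],
    visual_dists: Sequence[float],
    acoustic_threshold: float,  # hypothetical parameter
    visual_threshold: float,    # hypothetical parameter
) -> float:
    """Score a candidate audio-visual pairing by the MI of its match indicators."""
    a = [d <= acoustic_threshold for d in acoustic_dists]
    v = [d <= visual_threshold for d in visual_dists]
    return binary_mutual_information(a, v)


if __name__ == "__main__":
    # Acoustic and visual matches co-occur perfectly here, giving 1 bit.
    print(score_candidate([0.1, 0.2, 0.9, 0.8], [0.3, 0.1, 0.7, 0.95], 0.5, 0.5))
```

Under this framing, a pairing whose acoustic matches and visual matches tend to occur together scores high, while a pairing whose matches are statistically independent scores near zero. Mutual information therefore acts as a single measure that combines both distance metrics.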

icassp2000.pdf

Spontaneous Speech Recognition Using Visual Context-Aware Language Models

Parameter Search in an Agent-Based Model of Pedestrian Movement in Retail Environments

Big problems, big data solutions

A Matter of Facts podcast: MIT's Deb Roy on Twitter