Michael Fleischman, Deb Roy
Jan. 1, 2008
Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpora of unlabeled video and applied to the task of automatic speech recognition of sports video. Results show that grounded language models improve perplexity and word error rate over text-based language models and, further, support video information retrieval better than human-generated speech transcriptions.
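
The two metrics named in the abstract, perplexity and word error rate, can be illustrated with a minimal sketch. This uses the standard textbook definitions, not the paper's evaluation code, which is not described here. The function names `word_error_rate` and `perplexity` and the example sentences are hypothetical and chosen only for illustration.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def perplexity(word_probs: list[float]) -> float:
    """Exponential of the average negative log probability per word.

    word_probs are the probabilities a language model assigned to each
    word of a held-out text (assumed to be strictly positive).
    """
    if not word_probs:
        raise ValueError("word_probs must not be empty")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


if __name__ == "__main__":
    # Made-up example: one substituted word out of five reference words.
    print(word_error_rate("the pitcher throws the ball",
                          "the pitcher throws a ball"))
    # Made-up example: a model that gives every word probability 0.25.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values are better for both metrics: a lower word error rate means the recognizer's transcript is closer to the reference, and a lower perplexity means the language model assigns higher probability to the held-out words.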