Filtering and browsing applications for speech audio have traditionally relied on human annotation (such as closed-captioning) to characterize the semantic content of the audio stream. Recent advances in the performance of automatic speech recognition (ASR) systems now make it possible to derive semantic content automatically from raw speech audio. However, current commercially available ASR systems were designed for dictation, and they perform poorly in the domain of broadcast news, producing transcripts with many misrecognized words and omissions. Using the transcript alone to define story boundaries and to create semantic representations for each story is therefore difficult.
This project uses recent text news stories to help compensate for the poor ASR performance. We collect and cluster electronic text news to help define story boundaries within the audio, and to help create a semantically based vector representation of each story. We can then use a vector similarity measure to determine whether two stories are semantically close to each other. The objective is a news personalization system that lets a user filter and browse news based solely on the audio itself.
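The comparison step described above can be illustrated with a small sketch. This is not the project's actual implementation; it assumes a simple bag-of-words representation (raw term counts) and uses cosine similarity as the vector similarity measure, with hypothetical example headlines as input.

```python
import math
from collections import Counter

def story_vector(text):
    # Hypothetical representation: lowercase word counts as a sparse term vector.
    # A real system would likely use stemming, stop-word removal, and TF-IDF weights.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Dot product over shared terms, normalized by the two vector magnitudes.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Illustrative stories (invented text, not project data):
story1 = story_vector("senate votes on budget bill amid budget debate")
story2 = story_vector("budget bill passes senate after long debate")
story3 = story_vector("local team wins championship game")

# Two stories about the same event share vocabulary and score higher
# than an unrelated story, so a threshold on this score can group them.
print(cosine_similarity(story1, story2) > cosine_similarity(story1, story3))
```

With a threshold on this score, text stories that cluster together can be matched against transcript segments to anchor story boundaries in the audio.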
Synthetic News Radio web site.