Acquiring Verbs from Speech and Video

We are interested in building systems that can learn to connect natural language to objects and actions in video. In our current experiment, we have created videos of a person moving objects around on a tabletop. Naive subjects were asked to verbally describe the video sequences. Our goal is to create a learning system that can "ground" the meaning of words and phrases in observations extracted from the video. Applications of this work include verbal control of robots and natural-language access to video archives.