Towards Surveillance Video Search by Natural Language Query

Stefanie Tellex, Deb Roy


Spatial language video retrieval is an important real-world problem and a natural test bed for evaluating semantic structures for natural language descriptions of motion on naturalistic data. This paper describes first steps towards a system that grounds the meaning of spatial prepositions in geometric features. The system can be used to search a corpus of surveillance video for clips that match spatial language queries such as “along the hallway” and “across the kitchen.” We present experiments characterizing the performance of models for the prepositions “across” and “along,” and outline a methodology for modeling other spatial prepositions.
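
To make the idea of grounding a preposition in geometric features concrete, the following is a minimal sketch, not the feature set or model evaluated in this paper. It assumes a tracked path given as a list of 2D points and a landmark (such as a hallway) summarized by a hypothetical major axis from `axis_start` to `axis_end`. The function name `along_features` and the two features it returns (`coverage` and `mean_offset`) are illustrative assumptions.

```python
import math


def along_features(path, axis_start, axis_end):
    """Illustrative geometric features for "along <landmark>".

    Hypothetical sketch: `path` is a list of (x, y) points, and the landmark
    is summarized by the segment axis_start -> axis_end.
    """
    if not path:
        raise ValueError("path must contain at least one point")
    ax, ay = axis_start
    bx, by = axis_end
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("axis endpoints must differ")
    ux, uy = dx / length, dy / length  # unit vector along the axis

    # Position of each path point along the axis (scalar projection).
    along = [(px - ax) * ux + (py - ay) * uy for px, py in path]
    # Perpendicular distance of each path point from the axis line.
    offset = [abs((px - ax) * uy - (py - ay) * ux) for px, py in path]

    # Fraction of the axis spanned by the path, clipped to the axis extent.
    lo = max(0.0, min(along))
    hi = min(length, max(along))
    coverage = max(0.0, hi - lo) / length

    return {"coverage": coverage, "mean_offset": sum(offset) / len(offset)}


# Assumed toy data: a hallway axis 10 units long and two candidate paths.
hallway = ((0.0, 0.0), (10.0, 0.0))
walks_along = [(0.0, 1.0), (5.0, 1.0), (10.0, 1.0)]
cuts_across = [(5.0, -2.0), (5.0, 2.0)]

along_result = along_features(walks_along, *hallway)
across_result = along_features(cuts_across, *hallway)
```

Under these assumptions, a path that runs the length of the hallway yields a high `coverage`, while a path that cuts across it yields a `coverage` near zero. Features of this kind could be fed to a classifier or used directly to rank clips against a query.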

