Understanding Situated Language in Computer Games

Speech and language do not occur in a vacuum - much of the meaning of an utterance comes from its context, including when and where it was uttered, what the person saying it was doing at the time, who was there to listen to it, and why the speaker decided to speak in the first place. Modern computer games provide a platform for capturing some of this context without requiring solutions for difficult sensory and action problems such as those posed by our work with robots. We have been concentrating on context provided by players' shared plans and abilities, such as those that need to be taken into account when understanding an utterance like "Can you help me with this". This work has resulted in several systems and characters that understand situated speech and language, as well as a new theory of concepts based on perceived affordances.

Sample Videos

These videos exemplify some of the goals of our work. While they use the technology presented in the papers below, they were constructed for demonstration purposes and are not drawn directly from our study data.

This video shows a sample disambiguation of situated speech, where the same utterance is interpreted differently based on context: "Attack this barrel" vs. "Attack this bear".

This video shows three different interpretations of the utterance "Can you help me with this" based on recognizing the player's plans.

Affordance-Based Communicative Characters

We are currently transferring the theories and frameworks we have developed for understanding situated speech in computer games to a new platform (the Torque engine) and are constructing an autonomous character that understands as well as produces situated language based on the current state of the game environment and the plans it is jointly engaged in with the player.

Peter Gorniak, Jeff Orkin, Deb Roy

The Affordance-Based Concept Theory

The Affordance-Based Concept (ABC) theory casts concepts as sets of possible interactions an agent perceives in its current situation. The corresponding implementation employs a probabilistic hierarchical plan recognizer to produce such affordances and understands language by filtering the set of all affordances down to those applicable given the words spoken. Employed to understand commands spontaneously produced by two players playing the game Neverwinter Nights, this implementation correctly predicts the listener's next action in most cases.
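To make the filtering idea concrete, the following is an illustrative sketch (not the actual ABC implementation): affordances are modeled as candidate interactions, each weighted by a plan recognizer, and an utterance narrows the set down to the compatible ones. The data structure, probabilities, and word-matching rule are invented for illustration.

```python
# Hypothetical sketch of affordance filtering; the real system uses a
# probabilistic hierarchical plan recognizer and richer linguistic analysis.
from dataclasses import dataclass

@dataclass
class Affordance:
    action: str        # e.g. "attack"
    target: str        # e.g. "barrel"
    prob: float        # weight assigned by the (simulated) plan recognizer

def interpret(utterance: str, affordances: list) -> Affordance:
    """Keep only affordances whose action and target words appear in the
    utterance, then return the most probable remaining one."""
    words = set(utterance.lower().split())
    compatible = [a for a in affordances
                  if a.action in words and a.target in words]
    if not compatible:
        raise ValueError("no affordance matches the utterance")
    return max(compatible, key=lambda a: a.prob)

# The same verb resolves to different interactions depending on which
# affordances the current situation offers.
situation = [
    Affordance("attack", "barrel", 0.3),
    Affordance("attack", "bear", 0.6),
    Affordance("open", "door", 0.1),
]
print(interpret("attack this bear", situation).target)     # -> bear
print(interpret("attack this barrel", situation).target)   # -> barrel
```

The point of the sketch is the direction of inference: the situation proposes interactions first, and language selects among them, rather than language being parsed in isolation.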

Related papers:

Peter Gorniak. (2005) The Affordance-Based Concept. Ph.D. Thesis. pdf (5.8M)

Combining Recognition of Speech, Plans and Referents

We have developed a framework that coherently integrates a rich representation of ambiguous speech with recognized hierarchical plans and simple linguistic referents. Referent resolution in the framework thus takes into account the speaker's speech as well as the speaker's and listener's joint intentions to disambiguate situated utterances such as "Can you pull that again?" in a situation with multiple levers that can be pulled.
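As a rough sketch of the integration idea, one can combine per-referent evidence from ambiguous speech with per-referent evidence from the recognized joint plan and score candidate referents jointly. The referent names and probabilities below are invented for illustration and do not reflect the framework's actual representations.

```python
# Hypothetical sketch: fuse speech-based and plan-based evidence over
# candidate referents into a single distribution.

def score_referents(speech_probs, plan_probs):
    """Multiply per-referent evidence from speech recognition and from the
    recognized joint plan, then renormalize over all referents."""
    referents = set(speech_probs) | set(plan_probs)
    joint = {r: speech_probs.get(r, 0.0) * plan_probs.get(r, 0.0)
             for r in referents}
    total = sum(joint.values())
    return {r: p / total for r, p in joint.items()} if total else joint

# "Can you pull that again?" -- speech alone cannot choose among three
# levers, but the recognized plan strongly favors one of them.
speech = {"lever_1": 0.33, "lever_2": 0.34, "lever_3": 0.33}
plan   = {"lever_1": 0.05, "lever_2": 0.90, "lever_3": 0.05}
scores = score_referents(speech, plan)
print(max(scores, key=scores.get))   # -> lever_2
```

The design choice the sketch mirrors is that neither source of evidence dominates by fiat: a nearly uniform speech signal lets the plan decide, while unambiguous speech would override a weak plan preference.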

Related papers:

Peter Gorniak and Deb Roy (2005). Probabilistic Grounding of Situated Speech using Plan Recognition and Reference Resolution. Seventh International Conference on Multimodal Interfaces (ICMI 2005). Best Paper Award. pdf (312K)

Peter Gorniak and Deb Roy (2005). Speaking with your Sidekick: Understanding Situated Speech in Computer Role Playing Games. Proceedings of Artificial Intelligence and Interactive Digital Entertainment, 2005. pdf (624K)