Grounding the Meaning of Verbs through Structured Motor Control Representations

The meanings of simple verbs such as "pick up" convey more information than meets the eye. In order to pick up an object, a person must reach toward it, touch it, grasp it, and then lift it. To know what "pick up" means is to know all of these things. From this kind of word meaning comes common-sense knowledge such as "you can't pick something up without touching it." Machines today lack this depth of understanding of verbs.

In this project, we are exploring new ways to represent the semantics of verbs so that machines can process and understand sensory-grounded meanings of natural language in human-like ways. Our basic approach is to connect verbs to non-linguistic representations based on the sensory and motor systems of physical robots. Using Ripley, our 7-degree-of-freedom manipulator robot, we have designed a system that learns to recognize gestures and the relations between gestures. Data is recorded by allowing a human operator to move the compliant robot through motions such as "pick up" and "move toward." A Hidden Markov Model learning algorithm is then trained on the data generated by these gestures, yielding a structured sequential representation of each motion. Each sequential component can then be related to components of other gestures, enabling the system to acquire relations such as "[pick up] is composed of [move toward], [close gripper], and [retract]."

We are thus able to derive relations between word meanings from relations between the underlying sensory-motor structures. We believe that this kind of connection between language and non-linguistic knowledge is an essential step toward intelligent language processing and understanding by machines.
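As a rough illustration of the final step described above, suppose each trained gesture model has already been reduced (e.g. by decoding its HMM state sequence) to an ordered list of discrete motion components. A compositional relation such as "[pick up] is composed of [move toward], [close gripper], and [retract]" can then be detected by checking whether the parts' component sequences occur in order within the whole. The component names and function below are invented for this sketch, not taken from the actual system:

```python
def is_composed_of(whole, parts):
    """Return True if the component sequences in `parts` occur in order
    (possibly with gaps) inside the component sequence `whole`."""
    i = 0
    for part in parts:
        for symbol in part:
            # scan forward in `whole` for the next occurrence of this component
            while i < len(whole) and whole[i] != symbol:
                i += 1
            if i == len(whole):
                return False  # component missing or out of order
            i += 1
    return True

# Hypothetical component sequences recovered from trained gesture models
gestures = {
    "move toward":   ["reach"],
    "close gripper": ["grip"],
    "retract":       ["lift"],
    "pick up":       ["reach", "grip", "lift"],
}

parts = [gestures["move toward"], gestures["close gripper"], gestures["retract"]]
print(is_composed_of(gestures["pick up"], parts))  # True: "pick up" contains all three
```

In the actual system the components would be HMM states or sub-gestures learned from the robot's recorded trajectories rather than hand-named symbols, but the relational inference has this same shape: composition between verbs falls out of structural containment between their motor representations.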