We approach the problem of how machines (and humans) can learn words that describe actions. We propose that such words are grounded not in the sensory-motor aspects of an action, but rather in the intentions of the person performing the action. We therefore pose the problem of action-word learning in two stages: intention recognition and linguistic mapping. The first stage is cast as a plan-recognition problem in which state-action sequences are parsed using a probabilistic online chart parser. The second stage casts mapping in a Bayesian framework, employing algorithms used in speech recognition and machine translation.
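The two-stage pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: the plan library, the in-order matching score standing in for probabilistic chart parsing, and the co-occurrence counts standing in for the Bayesian mapping model are all illustrative assumptions.

```python
# Hypothetical sketch of the two-stage pipeline: (1) recognize an intention
# from an observed state-action sequence, (2) map the intention to a word.
# All plan names, actions, and counts below are invented for illustration.
from collections import defaultdict

# Stage 1 (stand-in for plan recognition via probabilistic chart parsing):
# score each candidate intention by the fraction of its expected actions
# that appear, in order, in the observed sequence.
PLANS = {
    "make_tea":    ["boil", "pour", "steep"],
    "make_coffee": ["grind", "boil", "pour"],
}

def recognize_intention(actions):
    def score(plan):
        i = 0
        for a in actions:
            if i < len(plan) and a == plan[i]:
                i += 1  # matched the next expected action in the plan
        return i / len(plan)
    return max(PLANS, key=lambda p: score(PLANS[p]))

# Stage 2 (stand-in for the Bayesian linguistic mapping): pick the word
# most frequently paired with the recognized intention in toy training data.
counts = defaultdict(lambda: defaultdict(int))
for intention, word in [("make_tea", "brew"), ("make_tea", "brew"),
                        ("make_tea", "pour"), ("make_coffee", "brew")]:
    counts[intention][word] += 1

def map_word(intention):
    c = counts[intention]
    return max(c, key=lambda w: c[w])

sequence = ["boil", "pour", "steep"]
intention = recognize_intention(sequence)
print(intention, map_word(intention))  # → make_tea brew
```

The key design point mirrored here is the decoupling: the word learner never sees raw actions, only the recognized intention, so the same mapping machinery applies regardless of how the intention was inferred.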