Theory and Practice in Discourse and Dialogue for Interactive Systems
 
Assignment 3
Due: October 19
Theme: Generating intonation and/or gesture from information structure

Background
The purpose of this assignment is to give you the tools (both analytical and computational) for questioning and developing an understanding of the role of prosody and gesture in spoken language. The principal task is to annotate discourse samples with the discourse information categories we discussed in class (given/new, theme/rheme, etc.), map those annotations onto prosodic or gestural annotations, and use them to produce contextually appropriate synthesized speech (using FlexTalk) or animation (using the REA animator).

Note that in this assignment you can choose whether you work on intonation or gesture.   You are of course free to try both, but we only expect you to hand in results from one of the two options. As before, you are welcome (even encouraged) to work in pairs. If you have access to facial animation software, you may also try producing appropriate facial display as a third option.

Data
 Use the following famous American quotations as your data samples (you may also want to try out some sentences collected for Assignment 1):

Ask not what your country can do for you; ask what you can do for your country.
John F. Kennedy, Inaugural Address

Error of opinion may be tolerated when reason is left free to combat it.
Thomas Jefferson, First Inaugural Address

They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
Benjamin Franklin, Historical Review of Pennsylvania 1759

... that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.
Abraham Lincoln, Gettysburg Address 1863

For the gesture version, use 4 sentences from the data you collected for Assignment 1.

Option 1: Intonation
Remember that the relationship between prosody and the types of discourse information we have discussed is far from straightforward. Determining prosody from text and meaning is currently an active area of computational linguistics research. So, don't expect to come up with the one right answer. In fact, one purpose of the assignment is to demonstrate that annotations in terms of different theoretical distinctions produce different intonational contours. 

Running FlexTalk
Some familiarity with the Pierrehumbert system of intonational notation will be necessary to do this assignment. Refer to Pierrehumbert and Hirschberg 1990 ("The Meaning of Intonational Contours in the Interpretation of Discourse") for a brief introduction to the system. You will also need to refer to the excerpt from the FlexTalk help file passed out in class.

To run FlexTalk, you need access to a Windows machine with Watson2.1 installed (most of the PCs in the Pond should have it). Launch the SpeakPad application from the Programs/Watson2.1/ folder on the task bar, and make sure you select the "Keith Bell" voice (under the Speech/Select Mode menu). Then open your text file (or create it) in SpeakPad and press the play button to hear it. Note that FlexTalk will try to add its own intonation to your text, so as an initial pass you should precede all words with the "\!-" tag to de-accent them.

All of the intonation commands you need can be found in this help file.

Now follow this procedure:

     
  • Without trying to adjust prosody or pronunciation, run each of the quotations above through the FlexTalk synthesizer. 
  • Analyze each of these quotations in terms of the given/new and theme/rheme distinctions. 
  • Some computational approaches to determining intonational parameters work by applying the following technique: 
    1. Assign accents to open-class words (nouns, verbs, adjectives, adverbs). 
    2. De-accent closed-class words (prepositions, determiners, etc.) 
    3. Remove accents from any words that occurred previously in the discourse.
    Using this approach (or your own consistent variant) and your result from the previous step, assign accents to each quotation and create the corresponding input file for FlexTalk. (For the sake of simplicity, you may want to start with the assumption that all types of stress are realized by Pierrehumbert's H* accent.) You'll need to consult chapter 4 of the FlexTalk reference manual to specify intonation in the FlexTalk input.
  • Run the annotated example from the previous step through FlexTalk. Based on the results, revise your intonational annotations and try again. You may need to do this several times and you may need to add information concerning phrasing. You should turn in both your original annotations and your final version. 
  • Read the quotations aloud several times and compare your intonation with your best FlexTalk output. (Non-native speakers of English should work together with native speakers to get the prosody right.) How close did FlexTalk come to "getting it right"? Discuss your results, specifically addressing the questions in the Questions section below.
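The three-step accent heuristic in the procedure above can be sketched in a few lines of Python. This is only an illustration, not part of FlexTalk: the function name assign_accents and the small CLOSED_CLASS word list are assumptions made for this example, and the output is an abstract list of (word, accent) pairs that you would still have to translate into FlexTalk markup by hand using the help file.

```python
import re

# Hypothetical, deliberately small closed-class word list (an assumption for
# this sketch); extend it to cover your own data.
CLOSED_CLASS = {
    "a", "an", "the", "and", "or", "nor", "but", "not", "neither",
    "that", "this", "of", "by", "for", "from", "to", "under", "up",
    "when", "what", "you", "your", "they", "it", "is", "may", "be",
    "can", "shall",
}


def assign_accents(text, accent="H*"):
    """Return (word, accent-or-None) pairs using the three-step heuristic."""
    seen = set()  # words already mentioned earlier in the discourse
    result = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        if key in CLOSED_CLASS:
            label = None    # step 2: de-accent closed-class words
        elif key in seen:
            label = None    # step 3: de-accent previously mentioned words
        else:
            label = accent  # step 1: accent open-class words
        seen.add(key)
        result.append((word, label))
    return result


if __name__ == "__main__":
    quote = ("Ask not what your country can do for you; "
             "ask what you can do for your country.")
    for word, label in assign_accents(quote):
        print(f"{word:10s} {label or '-'}")
```

On the Kennedy quotation this sketch accents only the first "Ask", "country" and "do", and leaves every word in the second clause unaccented. Comparing that result with how you would read the sentence aloud is a useful starting point for revising your annotations. A word list also cannot tell an auxiliary from a main verb, so treat the output as a first pass.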
Option 2: Gesture
This part follows the same procedure as Option 1, except that you will be producing an animation to go along with the spoken utterances. In this case you will not be able to affect the intonation. You will produce the animation using the Animator module of REA, one of our group's Conversational Humanoids.

Running the Animator
The Animator (PantomimeServer) runs as a TCP/IP server on the "polong" computer in the puddle (E15-314) and an associated Text-To-Speech program runs on the "boa" computer in the same room (see Tim for access to the puddle). You will be writing scripts to send to the animator and running a TCP/IP client on boa. An example script is:
 

(tell :content (tell :recipient "REA" :content 
    (script :id 1 :content [
        (step :starttime 0.0 :content (eyebrows))
        (step :starttime 0.0 :content (rbeat :msg "start"))
        (step :starttime 0.2 :content (speak :id 1 :text "Hey baby, what's shaking?"))
        (step :event "word_3" :content (rbeat))
      ])))
This script contains four commands. The first causes REA to raise her eyebrows immediately, and the second starts the right-hand beat driver at the same time. The third, 0.2 seconds after the start of the script, has REA begin to utter "Hey baby, what's shaking?" (the interface with the TTS and the lip-sync are taken care of for you). The fourth makes REA execute a beat gesture with her right arm as she utters the third word ("what's"). A complete list of animator commands can be found here. The current list of the keyframes available can be found here, and the current list of keyframemotions can be found here. (While it is possible to add to the list of keyframes and keyframemotions, you probably shouldn't need to do this for the lab.)

To execute a script from boa, open a DOS window and type:
 

java AMRun polong 5678 <scriptfilename>
where <scriptfilename> is the name of the text file containing the animator command you want executed.

To assign gestures to the sample data, follow the same procedure as in Option 1, using the method described there for locating accents to locate where gestures are likely to occur. However, you will need to experiment with the type of gesture you produce.
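If you end up with a list of word positions where gestures should fall, a short helper can write the script text for you. The Python sketch below copies the layout of the example script above and places a right-hand beat on each chosen word. The function name build_script is hypothetical, and the assumption that word events are numbered from 1 is only inferred from the "word_3" example, so check it against the animator command list before relying on it.

```python
def build_script(text, beat_words, script_id=1, speech_start=0.2):
    """Build an animator script that speaks `text` and beats on given words.

    beat_words holds 1-based word positions, mirroring the "word_3" event in
    the example script (an inference from that example, not a documented rule).
    """
    if '"' in text:
        # Quote-escaping rules are not known here, so refuse rather than guess.
        raise ValueError("text must not contain double quotes")
    steps = [
        '(step :starttime 0.0 :content (rbeat :msg "start"))',
        f'(step :starttime {speech_start} '
        f':content (speak :id {script_id} :text "{text}"))',
    ]
    # One beat gesture per selected word, triggered by that word's event.
    for n in sorted(beat_words):
        steps.append(f'(step :event "word_{n}" :content (rbeat))')
    body = "\n".join("        " + s for s in steps)
    return (
        '(tell :content (tell :recipient "REA" :content\n'
        f"    (script :id {script_id} :content [\n"
        f"{body}\n"
        "      ])))\n"
    )


if __name__ == "__main__":
    print(build_script("Ask not what your country can do for you", [1, 5]))
```

Save the printed text to a file and pass that file name to AMRun as described above. Only rbeat is generated here, since that is the one gesture command the example script shows; substitute other commands from the animator command list as you experiment with gesture types.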

Option 3: Facial Display
If you have access to facial animation software, you can try producing appropriate facial display to accompany the four sentences above. You should follow the observations in (Chovil, 1991) to see how well these work for production.

Questions
How well did the given/new and theme/rheme distinctions predict the placement of accents or gestures? Is it enough to analyze only at the level of discourse entities? What other phenomena seem to contribute to the appropriate intonational or gestural pattern? For intonation, what are the similarities between the Kennedy and Lincoln quotes? What sort of model might predict the patterns used by Kennedy and Lincoln? In what ways might the monologic discourse in these examples differ intonationally from dialogue? For your gesture examples, what rules did you use to decide on the type of gesture? What sort of difficulties might you encounter if you tried to write an algorithm to automatically annotate text with the appropriate intonational or gestural markings?

What to turn in
In addition to your discussion of the questions above, turn in the files you used to generate the output, both as hardcopy and as the files themselves (as email attachments).