Due: October 19
Theme: Generating intonation and/or gesture from information structure
Background
The purpose of this assignment is to provide you with the tools (both
analytical and computational) for questioning and developing an understanding
of the role of prosody and gesture in spoken language. The principal task
is to annotate discourse samples with the types of discourse information
categories we discussed in class (given/new, theme/rheme, etc.), map those
annotations onto prosodic or gestural annotations, and use them to produce
contextually appropriate synthesized speech (using FlexTalk) or animation
(using the REA animator).
Note that in this assignment you can choose whether you work on intonation
or gesture. You are of course free to try both, but we only
expect you to hand in results from one of the two options. As before, you
are welcome (even encouraged) to work in pairs. If you have access to facial
animation software, you may also try producing appropriate facial display
as a third option.
Data
Use the following famous American quotations as your data samples
(you may also want to try out some sentences collected for Assignment 1):
Ask not what your country can do for you; ask
what you can do for your country.
John F. Kennedy, Inaugural Address
Error of opinion may be tolerated when reason is left
free to combat it.
Thomas Jefferson, First Inaugural Address
They that can give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety.
Benjamin Franklin, Historical Review of Pennsylvania, 1759
... that this nation, under God, shall have a new birth
of freedom, and that government of the people, by the people, for the people,
shall not perish from the earth.
Abraham Lincoln, Gettysburg Address, 1863
For the gesture version, use 4 sentences from the data you collected for
Assignment 1.
Option 1: Intonation
Remember that the relationship between prosody and the types of discourse
information we have discussed is far from straightforward. Determining
prosody from text and meaning is currently an active area of computational
linguistics research. So, don't expect to come up with the one right
answer. In fact, one purpose of the assignment is to demonstrate that annotations
in terms of different theoretical distinctions produce different intonational
contours.
Running FlexTalk
Some familiarity with the Pierrehumbert system of intonational notation
will be necessary to do this assignment. Refer to Pierrehumbert and Hirschberg
1990 ("The Meaning of Intonational Contours in the Interpretation of Discourse")
for a brief introduction to the system. You will also need to refer to
the excerpt from the FlexTalk help file passed out in class.
To run FlexTalk, you have to have access to a Windows machine with Watson2.1
installed on it (most of the PCs in the Pond should have this installed).
Launch the SpeakPad application from the Programs/Watson2.1/ folder on
the task bar. You should make sure you select the "Keith Bell" voice (under
the Speech/Select Mode menu). Simply open your text file (or create
it) in SpeakPad and press the play button to hear it. Note that FlexTalk
will try to add its own intonation to your text, so as an initial pass
you should precede all words with the "\!-" tag to de-accent them.
All of the intonation commands you need can be found in this help
file.
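The initial de-accenting pass described above can be scripted rather than typed by hand. The following is a minimal Python sketch, not part of the FlexTalk tools. It assumes that the "\!-" tag attaches directly to the front of each whitespace-separated word, punctuation included; check the help file excerpt for the exact placement before relying on it. The function name `deaccent_all` is hypothetical.

```python
# The de-accent tag named in the assignment text.
DEACCENT_TAG = "\\!-"


def deaccent_all(text: str) -> str:
    """Prefix every whitespace-separated word with the de-accent tag.

    Assumption: the tag is written immediately before the word with no
    intervening space. Adjust if the FlexTalk help file says otherwise.
    """
    return " ".join(DEACCENT_TAG + word for word in text.split())


if __name__ == "__main__":
    print(deaccent_all("Error of opinion may be tolerated"))
```

The output can be pasted into SpeakPad as a starting point, after which individual tags are replaced with whatever accents you decide on.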
Now follow this procedure:
- Without trying to adjust prosody or pronunciation, run each of the quotations
  above through the FlexTalk synthesizer.
- Analyze each of these quotations in terms of the given/new and theme/rheme
  distinctions.
- Some computational approaches to determining intonational parameters work
  by applying the following technique:
  - Assign accents to open-class words (nouns, verbs, adjectives, adverbs).
  - De-accent closed-class words (prepositions, determiners, etc.).
  - Remove accents from any words that occurred previously in the discourse.
  Using this approach (or your own consistent variant) and your result from
  the previous step, assign accents to each quotation and create the corresponding
  input file for FlexTalk. (For the sake of simplicity, you may want to start
  with the assumption that all types of stress are realized by Pierrehumbert's
  H* accent.) You'll need to consult chapter 4 of the FlexTalk reference
  manual to specify intonation in the FlexTalk input.
- Run the annotated example from the previous step through FlexTalk. Based
  on the results, revise your intonational annotations and try again. You
  may need to do this several times and you may need to add information concerning
  phrasing. You should turn in both your original annotations and your final
  version.
- Read the quotations aloud several times and compare your intonation with
  your best FlexTalk output. (Non-native speakers of English should work
  together with native speakers to get the prosody right.) How close
  did FlexTalk come to "getting it right"? Discuss your results, specifically
  addressing the questions listed under Questions below.
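The three-rule accent heuristic in the procedure above (accent open-class words, de-accent closed-class words, de-accent anything already mentioned) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the closed-class word list is a small hand-picked sample rather than a real lexicon, "previously mentioned" is approximated by exact word-form repetition, and the function name `assign_accents` is hypothetical. The sketch reports which words would carry an accent and deliberately emits no FlexTalk markup, since the accent syntax is defined in the reference manual.

```python
import re

# Illustrative closed-class sample only (an assumption, not a full lexicon).
CLOSED_CLASS = {
    "a", "an", "the", "of", "to", "for", "from", "by", "in", "on", "under",
    "when", "and", "or", "nor", "not", "neither", "that", "this", "what",
    "it", "is", "be", "can", "may", "shall", "have", "you", "your", "they",
}


def assign_accents(text, history=None):
    """Return (token, accented) pairs for a quotation.

    A token is accented if it is not in CLOSED_CLASS and its lowercased
    form has not already occurred, either in `history` or earlier in `text`.
    """
    seen = set(history or ())
    result = []
    for token in text.split():
        # Normalize: lowercase and strip punctuation other than apostrophes.
        key = re.sub(r"[^a-z']", "", token.lower())
        accented = bool(key) and key not in CLOSED_CLASS and key not in seen
        if key:
            seen.add(key)
        result.append((token, accented))
    return result


if __name__ == "__main__":
    kennedy = ("Ask not what your country can do for you; "
               "ask what you can do for your country.")
    print([word for word, accented in assign_accents(kennedy) if accented])
```

On the Kennedy quotation this sketch accents only "Ask", "country", and "do", and leaves the entire second clause unaccented because every word in it is either closed-class or repeated. That is a useful baseline to compare against your own reading of the sentence.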
Option 2: Gesture
This part follows the same procedure as Option 1, except you
will be producing an animation to go along with the spoken utterances.
In this case you will not be able to
affect the intonation. You will accomplish the animation
using the Animator module of REA, our group's Conversational Humanoids.
Running the Animator
The Animator (PantomimeServer) runs as a TCP/IP server on the "polong"
computer in the puddle (E15-314) and an associated Text-To-Speech program
runs on the "boa" computer in the same room (see Tim for access to the
puddle). You will be writing scripts to send to the animator and running
a TCP/IP client on boa. An example script is:
(tell :content (tell :recipient "REA" :content
(script :id 1 :content [
(step :starttime 0.0 :content (eyebrows))
(step :starttime 0.0 :content (rbeat :msg "start"))
(step :starttime 0.2 :content (speak :id 1 :text "Hey baby, what's shaking?"))
(step :event "word_3" :content (rbeat))
])))
This script has four commands in it, executed in order. The first causes
REA to raise her eyebrows immediately; the second starts the right hand beat driver.
At 0.2 seconds after the start of the script REA starts to utter "Hey baby,
what's shaking?" (interface with the TTS and lip-sync are taken care of
for you). As REA utters the third word ("what's"), she will execute a beat
gesture with her right arm. A complete list of animator commands can be
found here. The current list of
the keyframes available can be found here,
and the current list of keyframemotions can be found here.
(While it is possible to add to the list of keyframes and keyframemotions,
you probably shouldn't need to do this for the lab.)
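Once you have decided which words should carry a beat, writing one script per sentence by hand gets repetitive. The Python sketch below generates a script in the same shape as the example above. It uses only the commands that appear in that example (`rbeat`, `speak`, and the "word_N" event). It assumes that "word_N" counts whitespace-separated words starting from 1, which matches "what's" being word 3 in the example but may not match how the TTS tokenizes every sentence. The function name `build_script` and its parameters are hypothetical.

```python
def build_script(text, beat_words, script_id=1, speak_id=1, speak_delay=0.2):
    """Build an animator script that speaks `text` with right-hand beats.

    beat_words: 1-based positions of the words that should carry a beat.
    Assumption: the animator raises a "word_N" event for the N-th
    whitespace-separated word, as in the assignment's example script.
    """
    if '"' in text:
        # The escaping rules for quotes inside :text are not documented here.
        raise ValueError("text must not contain double quotes")
    word_count = len(text.split())
    lines = [
        '(tell :content (tell :recipient "REA" :content',
        f'(script :id {script_id} :content [',
        '(step :starttime 0.0 :content (rbeat :msg "start"))',
        f'(step :starttime {speak_delay} :content '
        f'(speak :id {speak_id} :text "{text}"))',
    ]
    for n in sorted(set(beat_words)):
        if not 1 <= n <= word_count:
            raise ValueError(f"word index {n} is outside 1..{word_count}")
        lines.append(f'(step :event "word_{n}" :content (rbeat))')
    lines.append('])))')
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_script("Hey baby, what's shaking?", [3]))
```

Save the printed text to a file and run it through the animator as usual. Other gesture types can be substituted for `(rbeat)` by looking up the command names in the animator command list.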
To execute a script from boa, open a DOS window and type:
java AMRun polong 5678 <scriptfilename>
where <scriptfilename> is the name of the text file containing the
animator commands you want executed.
To assign gestures to the sample data, follow the same procedure as described
in Option 1 of this assignment, locating where gestures are likely to occur
with the same method that was described for locating accents. However, you
will need to experiment with the type of gesture you produce.
Option 3: Facial Display
If you have access to facial animation software, you can try producing
appropriate facial display to accompany the four sentences above. You should
follow the observations in Chovil (1991) and see how well they work for
production.
Questions
How well did the given/new and theme/rheme distinctions predict the
placements of accents or gestures? Is it enough to analyze only at the
level of discourse entities? What other phenomena seem to contribute to
the appropriate intonational or gestural pattern? For intonation, what
are the similarities between the Kennedy and Lincoln quotes? What sort
of model might predict the patterns used by Kennedy and Lincoln? In what
ways might the monologic discourse in these examples differ intonationally
from dialogue? For your gesture examples, what rules did you use
to decide on type of gesture? What sort of difficulties might you encounter
if you tried to write an algorithm to automatically annotate text with
the appropriate intonational or gestural markings?
What to turn in
In addition to your discussion of the questions above, you should turn in
the files you used to generate the output, both as hardcopy and as the files
themselves (as email attachments).