******* Language, Cognition, and Computation Lecture Series *******

Title: Two Questions in Statistical Natural Language Processing

Speaker: Michael Collins

This talk will focus on two areas of research in statistical or
machine learning approaches to natural language processing.

In the first part of the talk I'll describe research on statistical
approaches to natural language parsing. I will first review work on
generative, history-based models, and describe how these methods give
a probabilistic model for a number of syntactic phenomena. The talk
will then cover more recent discriminative or non-parametric
approaches to the parsing problem. A key feature of these methods is
their flexibility in terms of the parse tree features that can
be incorporated.

In the second part of the talk I'll discuss research on unsupervised,
or partially supervised, approaches to natural language problems. Much
of the work in statistical NLP has considered supervised training: a
human annotator marks examples (for example part-of-speech tag
sequences, parse tree structures, or named entities in text), which are
then used to train a model that recovers similar structures on test
data. Unfortunately, manual labeling of data can be laborious, and may
simply not be feasible in some domains. I will discuss
statistical models, and experimental results, showing that in some
cases unlabeled examples can drastically reduce the need for
supervised training examples.

Bio:

Michael Collins did his undergraduate studies in Electrical
Engineering at Cambridge University, and went on to do a Master's in
Speech and Language Processing, also at Cambridge. He received his PhD
from the University of Pennsylvania in 1998. In his dissertation, Mike
worked on statistical methods for natural language parsing, which led
to one of the highest-performing parsers in the field. After his PhD,
he was a researcher at AT&T Labs-Research from January 1999 until
November 2002. Since January 2003 he has been an Assistant Professor
in the EECS department at MIT, and in the MIT AI Lab. His research
interests are in natural language processing and machine learning.

*******************************************************************