Thesis

The Birth of a Word

Roy, B. "The Birth of a Word"

Abstract

A hallmark of a child's first two years of life is their entry into language, from first productive word use around 12 months of age to the emergence of combinatorial speech in their second year. What is the nature of early language development and how is it shaped by everyday experience?

This work builds from the ground up to study early word learning, characterizing vocabulary growth and its relation to the child's environment. Our study is guided by the idea that the natural activities and social structures of daily life provide helpful learning constraints. We study this through analysis of the largest-ever corpus of one child's everyday experience at home. Through the Human Speechome Project, the home of a family with a young child was outfitted with a custom audio-video recording system, capturing more than 200,000 hours of audio and video of daily life from birth to age three. The annotated subset of this data spans the child's 9-24 month age range and contains more than 8 million words of transcribed speech, constituting a detailed record of both the child's input and linguistic development.

Such a comprehensive, naturalistic dataset presents new research opportunities but also requires new analysis approaches: questions must be operationalized to leverage the full scale of the data. We begin with the task of speech transcription, then identify "word births": the child's first use of each word in his vocabulary. Vocabulary growth accelerates and then shows a surprising deceleration that coincides with an increase in combinatorial speech. The vocabulary growth timeline provides a means to assess the environmental contributions to word learning, beginning with aspects of caregiver input speech. But language is tied to everyday activity, and we investigate how spatial and activity contexts relate to word learning. Activity contexts, such as "mealtime", are identified manually and with probabilistic methods that can scale to large datasets. These new nonlinguistic variables are predictive of when words are learned and are complementary to more traditionally studied linguistic measures. Characterizing word learning and assessing natural input variables can lead to new insights on fundamental learning mechanisms.
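As a rough illustration of the "word birth" idea, the sketch below takes timestamped transcript records and finds the age at which the child first produces each word, then builds a cumulative vocabulary curve from those dates. It is not the procedure used in the thesis, which the abstract does not detail. The record format, the sample data, and the function names are all hypothetical, and a real analysis would need to handle transcription errors and decide what counts as a productive use.

```python
from collections import Counter

# Hypothetical record format: (child_age_in_days, speaker, transcript_text).
# The sample data is invented for illustration only.
SAMPLE = [
    (300, "caregiver", "do you want the ball"),
    (310, "child", "ball"),
    (325, "child", "ball mama"),
    (340, "child", "more ball"),
    (340, "caregiver", "more water"),
    (355, "child", "water"),
]


def word_births(utterances, child="child"):
    """Map each word to the age (in days) of the child's first transcribed use."""
    births = {}
    # Process utterances in chronological order so the first use wins.
    for age_days, speaker, text in sorted(utterances, key=lambda u: u[0]):
        if speaker != child:
            continue  # caregiver speech is input, not a word birth
        for word in text.lower().split():
            births.setdefault(word, age_days)
    return births


def cumulative_vocabulary(births):
    """Return sorted (age_days, vocabulary_size) pairs, one per birth date."""
    per_day = Counter(births.values())
    total, curve = 0, []
    for age_days in sorted(per_day):
        total += per_day[age_days]
        curve.append((age_days, total))
    return curve


births = word_births(SAMPLE)
curve = cumulative_vocabulary(births)
```

Differencing successive points on such a curve gives a growth rate, which is one simple way to look for the acceleration and later deceleration described above.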
