The Human Speechome Project

We seek to better understand how children learn the meaning of words through analysis of observational recordings of child-caregiver interactions in natural contexts. Currently available corpora greatly under-sample crucial early stages of child development. As a result, our understanding of language acquisition hinges on surprisingly sparse and incomplete data. Motivated by this basic problem, Roy has begun a pilot project in which he is recording his son's development at home by gathering approximately 10 hours of high-fidelity audio and video on a daily basis from birth to age three. The resulting corpus, which already contains over 100,000 hours of multi-track recordings, constitutes the most comprehensive record of a child's development made to date. These data provide many new opportunities to understand the fine-grained dynamics of language development.

A principal challenge of the project is to efficiently transcribe and annotate the massive corpus. New software algorithms and human-computer interfaces will be developed that enable a small team of researchers to quickly and accurately code the raw data semi-automatically. Using these software tools, we plan to study and computationally model the early words uttered by the child by tracing back to the contexts in which they were used by adults speaking to him.

For most children, language development is steady, progressive, and to a casual observer effortless. But for some children -- those with developmental delays due to biological or environmental causes -- language is a major developmental hurdle. Understanding the regularities in home environments is essential to understanding mechanisms of language acquisition, causes of delay, and ultimately appropriate intervention procedures. We believe this project will shed new light on fundamental aspects of how child-caregiver social interactions shape language acquisition.

Although there are clear limits to what may be concluded from studying a single child, the findings from this project, in the time-honored tradition of longitudinal case studies dating back to Piaget, may guide more extensive follow-on observational and experimental studies. Beyond the Speechome corpus, the development of an effective semi-automated data coding and analysis methodology may enable scientists to leverage high-density audio-visual corpora to address numerous open questions in the behavioral sciences.

Privacy Statement

Audio and video recording of children in their homes is a widely used, well-established method in developmental psychology with mature ethical norms. Our project is distinct because of the unusual sampling density of the recordings. Because of privacy considerations, there is no plan to distribute or publish the complete original recordings, although we will explore ways to work with other researchers by sharing appropriately coded and selected portions of the full corpus.

The birth of a word, 2011 TED talk by Professor Roy.

H2.0 presentation on the Speechome project by Professor Roy.

Sample video image from the kitchen. [JPG, 101K]

Timelapse video of a day of life at home. [QuickTime, 3.5 megs]

Evolution of "water" over several months. [WAV, 3.5 megs]

Video collage of "ball" over several months. [QuickTime, 1.2 megs]

Video visualization of caregiver and child interaction. [high resolution PNG (3M) | low resolution PNG (145K)]

Dynamic generation of video visualization. [QuickTime, 3.6 megs]

Photo/video credit: MIT Media Lab


