
Dissertation Defense

WHAT:
Brian Whitman:
"Learning the Meaning of Music"

WHEN:
Thursday, April 14, 2005, 10:00 AM EDT

WHERE:
Bartos Theatre, MIT Media Lab (E15)

DISSERTATION COMMITTEE:
Barry L. Vercoe
Professor of Media Arts and Sciences
Massachusetts Institute of Technology

Daniel P.W. Ellis
Assistant Professor of Electrical Engineering
Columbia University

Deb K. Roy
Associate Professor of Media Arts and Sciences
Massachusetts Institute of Technology

ABSTRACT:
Expression as complex and personal as music is not adequately represented by the signal alone. For every artist and song there is a significant culture of meaning connecting perception to interpretation. This thesis aims to computationally model the meaning of music by taking advantage of community usage and description, using self-selected and natural similarity clusters, opinions, and usage patterns as labels and ground truth to inform on-line and unsupervised 'music acquisition' systems. We present a framework for capturing community metadata from free-text sources; audio representations robust enough to handle event and meaning relationships yet general enough to work across domains of music; and a machine-learning framework for learning the relationship between music signals and community reaction iteratively and at large scale.
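
As an illustration of the community-metadata idea described above, the following Python sketch turns free-text descriptions of artists into weighted term labels using a TF-IDF-style salience score. The toy corpus, function names, and weighting scheme are illustrative assumptions rather than the thesis's actual pipeline; a real system would mine reviews, forums, and web pages at scale.

    # Sketch: turn free-text descriptions of artists into weighted term
    # labels ("community metadata"). The corpus below is a toy stand-in;
    # in practice the text would be mined from the web at large scale.
    import math
    import re
    from collections import Counter

    def terms(text):
        """Lowercase word tokens; a fuller system would also keep n-grams."""
        return re.findall(r"[a-z']+", text.lower())

    def community_metadata(docs_by_artist):
        """Return {artist: {term: salience}} with a TF-IDF-style weighting,
        so terms that describe one artist more than others score highest."""
        doc_freq = Counter()
        term_counts = {}
        for artist, docs in docs_by_artist.items():
            counts = Counter(t for doc in docs for t in terms(doc))
            term_counts[artist] = counts
            doc_freq.update(counts.keys())
        n = len(docs_by_artist)
        salience = {}
        for artist, counts in term_counts.items():
            total = sum(counts.values()) or 1
            salience[artist] = {
                t: (c / total) * math.log(n / doc_freq[t])
                for t, c in counts.items()
            }
        return salience

    if __name__ == "__main__":
        corpus = {  # illustrative stand-in for web-mined text
            "artist_a": ["loud aggressive guitar riffs", "aggressive fast drums"],
            "artist_b": ["quiet melancholy piano", "slow melancholy strings"],
        }
        for artist, weights in community_metadata(corpus).items():
            top = sorted(weights.items(), key=lambda kv: -kv[1])[:3]
            print(artist, top)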

Our work is evaluated and applied as semantic basis functions—meaning classifiers that are used to minimize the non-semantic attachment of a perceptual signal. This process improves upon statistical methods of rank reduction as it aims to model a community's reaction to perception. We show increased accuracy of common music-retrieval tasks with audio projected through semantic basis functions. We also evaluate our models in a 'query-by-description' task for music, where we predict description and community interpretation of a held-out labeled test set of audio. We conclude by considering the more general case of 'perceptual data mining,' linking any perceptible data (image, video, sound) with community-derived meaning for better understanding accuracy and more natural interfaces. These unbiased meaning-based learning approaches show superior accuracy in music and multimedia intelligence tasks such as similarity, classification, and recommendation.
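
To make the notion of semantic basis functions more concrete, here is a hedged Python sketch: one linear classifier is trained per descriptive term over audio features, and each track is then represented by the vector of classifier outputs, which supports both query-by-description and similarity in the semantic space. The synthetic features and labels, the choice of scikit-learn's LinearSVC, and the nearest-neighbour retrieval step are stand-in assumptions, not the method actually used in the dissertation.

    # Sketch of "semantic basis functions": train one classifier per
    # descriptive term over audio features, then represent each track by
    # the vector of classifier outputs and retrieve in that semantic space.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    n_tracks, n_dims = 200, 40
    term_list = ["loud", "melancholy", "funky"]

    X = rng.normal(size=(n_tracks, n_dims))             # stand-in audio features
    Y = rng.random((n_tracks, len(term_list))) < 0.3    # stand-in term labels

    # One linear "meaning classifier" per term forms the semantic basis.
    basis = [LinearSVC(C=1.0, max_iter=5000).fit(X, Y[:, k])
             for k in range(len(term_list))]

    def semantic_projection(features):
        """Project audio features through the per-term classifiers."""
        return np.column_stack([clf.decision_function(features) for clf in basis])

    S = semantic_projection(X)

    # Query-by-description: rank tracks by one term's classifier output.
    query_term = "melancholy"
    ranking = np.argsort(-S[:, term_list.index(query_term)])
    print("tracks most strongly predicted as", query_term, ":", ranking[:5])

    # Similarity measured in the semantic space instead of raw features.
    def most_similar(track_idx, k=5):
        d = np.linalg.norm(S - S[track_idx], axis=1)
        return np.argsort(d)[1:k + 1]

    print("nearest neighbours of track 0:", most_similar(0))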

