Research Group Projects and Descriptions

Music, Mind and Machine
Principal Investigator: Barry Vercoe

The Music, Mind and Machine group is working towards bridging the gap between the current generation of audio technologies and those that will be needed for future interactive media applications.


Classification of Killer Whale Sounds with GMM and HMM Judith Brown

The automatic classification of marine mammal sounds is attractive as a means of assessing massive quantities of recorded data: it frees human listeners from the task and offers rigorous, consistent output. Calculations on a set of vocalizations of Northern Resident killer whales using dynamic time warping have been reported recently. Because that method requires time-consuming pre-processing to measure frequency contours, we have explored the use of Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM), which can be applied directly to time-frequency decompositions of the recorded signals. Calculations have been made on a set of 75 calls previously classified perceptually into 7 call types. Preliminary results give an agreement of roughly 85% with the perceptual classification for GMM and over 90% for HMM.
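The GMM side of this approach can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each call has already been reduced to a list of short feature vectors (for example, frames of a time-frequency decomposition), fits one diagonal-covariance GMM per call type with a plain EM loop, and assigns a new call to the type whose model gives the highest average log-likelihood. The two-dimensional toy frames and the labels "call_A" and "call_B" are made up for illustration; the HMM variant is not shown.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def fit_gmm(frames, k, iters=20, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM to feature frames with EM."""
    rng = random.Random(seed)
    dim = len(frames[0])
    means = [list(f) for f in rng.sample(frames, k)]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(frames)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                        for d in range(dim)]
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, frames)) / nj)
                            for d in range(dim)]
    return weights, means, variances

def avg_log_likelihood(frames, model):
    weights, means, variances = model
    total = sum(logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                           for w, m, v in zip(weights, means, variances)])
                for x in frames)
    return total / len(frames)

def classify(frames, models):
    """Return the label whose GMM explains the frames best."""
    return max(models, key=lambda label: avg_log_likelihood(frames, models[label]))

def toy_frames(center, n, rng):
    # Synthetic stand-in for real time-frequency frames (an assumption).
    return [[rng.gauss(c, 0.3) for c in center] for _ in range(n)]

rng = random.Random(1)
models = {
    "call_A": fit_gmm(toy_frames([0.0, 0.0], 40, rng) + toy_frames([1.0, 0.0], 40, rng), k=2),
    "call_B": fit_gmm(toy_frames([4.0, 4.0], 40, rng) + toy_frames([5.0, 4.0], 40, rng), k=2),
}
predicted = classify(toy_frames([0.5, 0.0], 20, rng), models)
print(predicted)
```

A GMM treats the frames of a call as an unordered set, whereas an HMM also models their order in time.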

Alumni Contributor(s): Paris Smaragdis

Cross-Cultural Melodic Transformation Barry Vercoe and Cheng Zhi Huang

It takes us years to learn our own musical tradition. It is therefore rare to find people who attempt to become musically multilingual. However, by learning to compose in different cultural styles, we can expand our compositional palette and communicate more effectively across cultural boundaries. We are designing a computer-assisted compositional tool that helps composers begin composing melodies in other cultural styles by dynamically analyzing the musical context and presenting melodic materials from various cultures as musical analogies. These melodic patterns address questions such as: In a target cultural style, how does one develop a musical idea? What are the idiomatic melodic progressions? How does one establish a structural pitch? What are the possible continuations of an unfinished melody, and how does one cadence? This tool makes it easier for composers to transform and render their musical ideas in other musical languages.

Interactive Graphic Control of Networked Audio Barry Vercoe and Jeremy Flores

The advent of Python-based networks creates new opportunities for mobile interaction. A compelling application would be interactive graphic control of audio on one device by deft gestures on a remote device. We have developed a stand-alone frequency modulation synthesizer using a Csound audio engine and Python, with PyGTK and Cairo for graphic control, and will extend this to other audio processing algorithms. This development, currently in a Linux environment, can be ported to networked platforms such as mobile phones, handheld devices, and the $100 laptop.
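For readers unfamiliar with frequency modulation synthesis, the following is a minimal sketch of the underlying signal in plain Python. It does not use Csound, PyGTK, or Cairo and is not the project's code; the function name `fm_tone` and its parameters are assumptions chosen for illustration. A carrier sine wave has its phase varied by a second (modulator) sine wave, and the modulation index sets how strongly.

```python
import math

def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=44100):
    """Return samples in [-1, 1] of a simple carrier/modulator FM tone."""
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # The modulator shifts the carrier's phase; `index` sets the depth.
        modulation = index * math.sin(2 * math.pi * modulator_hz * t)
        samples.append(math.sin(2 * math.pi * carrier_hz * t + modulation))
    return samples

# 10 ms of a 440 Hz carrier modulated at 110 Hz.
tone = fm_tone(carrier_hz=440.0, modulator_hz=110.0, index=2.0, duration_s=0.01)
print(len(tone), "samples")
```

In a graphic controller of the kind described, gestures on the remote device would plausibly map to parameters such as the carrier frequency or the modulation index; with an index of zero the output reduces to a plain sine tone.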

Musicpainter Barry Vercoe and Wu-Hsi Li

This project explores how the design of a composing environment can encourage sharing and collaboration between learners through the composing process. Learners should be able to share not only completed music compositions, but also tiny catchy tunes or even musical ideas with others. We propose Musicpainter, a graphical composing environment that shapes music composition fragments into playful bricks. Users create and share the bricks, and piece together compositions by playing with and combining them. First-time composers will not need to compose every note from scratch, since they have a resourceful composing environment to begin with. And by “hacking” music compositions, they will learn more deeply how composers put musical ideas into their works.

Real-Time Network Music Performance Barry Vercoe and Mihir Sarkar

Real-time interactions over the Internet or ad hoc computer networks have plenty of applications, ranging from collaborative design to tele-medicine to live entertainment. Online music performances are an especially good case study because of the tight synchronization and timing constraints of such highly rhythmic music as Indian tabla duos. While existing systems transmit compressed audio streams to minimize the effects of network latency, our system sends hierarchical symbolic structures that map to musical intention rather than to individual musical events like notes and timbre. The symbols shape predicted rhythmic phrases in a musically meaningful way, preserving the performers' original intent.
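To give a feel for what sending symbols instead of audio might look like, here is a small hypothetical sketch. The message layout and every field name in it are invented for illustration and do not come from the project; the point is only that a nested symbolic description of a phrase is tiny compared with the audio it stands for. The comparison uses uncompressed audio purely for scale, whereas the existing systems mentioned above send compressed streams.

```python
import json

# Hypothetical message layout; none of these field names come from the project.
phrase_message = {
    "performer": "tabla_1",
    "tempo_bpm": 120,
    "cycle": {"name": "tintal", "beats": 16},
    "phrase": {"pattern_id": "theme_a", "variation": 2, "start_beat": 1},
}

def encode(message):
    """Serialize a symbolic phrase description to compact UTF-8 JSON."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

def decode(payload):
    """Recover the symbolic phrase description on the receiving side."""
    return json.loads(payload.decode("utf-8"))

payload = encode(phrase_message)

# For scale only: one second of uncompressed 16-bit mono audio at 44.1 kHz.
pcm_bytes_per_second = 44100 * 2
print(len(payload), "bytes of symbols vs", pcm_bytes_per_second, "bytes of raw audio per second")
```

The receiver would still need its own synthesis and prediction logic to turn such a message back into sound, which this sketch leaves out.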


Social Landscape on Photos and Music Barry Vercoe and Wu-Hsi Li

Social tagging of multimedia content not only provides a mapping between various media and a textual descriptive space, but also presents a collection of diversified viewpoints which, in some ways, reflect who the viewers/taggers are. But how can we visualize such subjectivity in a personal/social landscape that conveys information through the spatial relations between photos, music, and text? In this project, tagging information is collected in part from Flickr.com, and a tag cloud is extracted to represent each photo, concept, and music piece. The personal/social landscape is designed, and the meanings of spacing, scaling, zooming, and transforming in visual and auditory spaces are examined. When personal landscapes effectively reflect who their owners are, we can create virtual social relations by relating similar landscapes.
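One simple way to relate two tag clouds, assumed here rather than taken from the project, is to treat each cloud as a bag of tag counts and compare them with cosine similarity. The tags below are invented examples; a real system would draw them from collected tagging data.

```python
import math
from collections import Counter

def tag_cloud(tags):
    """Build a tag cloud as a mapping from lowercase tag to count."""
    return Counter(tag.lower() for tag in tags)

def cosine_similarity(cloud_a, cloud_b):
    """Cosine similarity between two tag clouds (0.0 if either is empty)."""
    shared = set(cloud_a) & set(cloud_b)
    dot = sum(cloud_a[t] * cloud_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in cloud_a.values()))
    norm_b = math.sqrt(sum(c * c for c in cloud_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented example tags for one photo and one music piece.
photo_cloud = tag_cloud(["Beach", "sunset", "calm"])
music_cloud = tag_cloud(["calm", "ambient", "sunset"])
similarity = cosine_similarity(photo_cloud, music_cloud)
print(round(similarity, 3))
```

Pairwise similarities of this kind could then drive spacing in a landscape, with more similar items placed closer together.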

Sound Design with Everyday Words Barry Vercoe, Mihir Sarkar and Yang Yang

Musicians often describe the quality of musical sounds with words such as "bright" or "warm". Our project investigates the relationship between auditory perception and language in this context: we are interested in finding whether people use a common terminology to describe timbre or whether their choice of words is linked to their musical or cultural background. We deployed an online survey in which over 1,000 participants were asked to find words to describe the sounds they heard. We are now analyzing whether the words they used correlate with timbral features. Our objective is to design an audio processing engine that can automatically tag sounds in a database for retrieval purposes. It could also modify the database according to descriptive words instead of technical parameters.
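As an illustration of what a timbral feature can look like, the sketch below computes the spectral centroid, a measure commonly associated with perceived brightness. Whether the project uses this particular feature is an assumption; the tones, sample rate, and function names are chosen only for the example. The centroid is the magnitude-weighted average frequency of the spectrum, so a tone with stronger upper harmonics scores higher.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Magnitudes of the first half of a naive discrete Fourier transform."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples)))
            for k in range(n // 2)]

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz (0.0 for silence)."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(samples)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

def harmonic_tone(f0, amps, n, sample_rate):
    """Sum of sine harmonics of f0 with the given amplitudes."""
    return [sum(a * math.sin(2 * math.pi * f0 * (h + 1) * i / sample_rate)
                for h, a in enumerate(amps))
            for i in range(n)]

sample_rate, n = 8000, 256
dull = harmonic_tone(250.0, [1.0, 0.1, 0.01], n, sample_rate)
bright = harmonic_tone(250.0, [1.0, 0.8, 0.6, 0.5], n, sample_rate)

dull_centroid = spectral_centroid(dull, sample_rate)
bright_centroid = spectral_centroid(bright, sample_rate)
print(bright_centroid > dull_centroid)
```

A feature like this could be computed for every sound in a survey and compared against how often participants chose a word such as "bright" for it.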


Time-Critical Networks for Interaction Design Barry Vercoe, David P. Reed, Mitchel Resnick and John Maloney

Computer games and learning environments increasingly involve humans relating to computational avatars and robots. Intelligence-modeling software requires real-time interaction between the parts, with channel capacities that must match the best of human-human performance. Many multi-modal activities challenge the real-time communication and comprehension speeds between participants. This project aims to enhance human-machine and machine-machine communication capacities between entities in order to encourage new models of interaction.



MIT Media Laboratory Home Page | Research Main Index