Spencer Russell Dissertation Defense

April 24, 2020
4:00pm – 6:00pm

Dissertation Title: Resynthesizing Volumetric Soundscapes: Low-rank Subspace Methods for Soundfield Estimation and Reconstruction

Participation link (pw: soundscape)

Sound and space are fundamentally intertwined, at both a physical and a perceptual level. Sound radiates from vibrating materials, filling space and creating a continuous field through which a listener moves. Despite a long history of research in spatial audio, the technology to capture these sounds in space is currently limited. Egocentric (binaural or ambisonic) recording can capture sound from all directions, but only from a limited perspective. Recording individual sources and ambience is labor-intensive and requires manual intervention and explicit localization.

In this work I propose and implement a new approach in which a distributed collection of microphones captures sound and space together, and the recordings are resynthesized for a (now-virtual) listener as a rich volumetric soundscape to explore. This approach offers great flexibility to design new auditory experiences, and gives a much more semantically meaningful description of the space. The research is situated at the Tidmarsh Wildlife Sanctuary, a 600-acre former cranberry farm that underwent the largest-ever freshwater restoration in the Northeast. It has been instrumented with a large-scale (300 m × 300 m) distributed array of 12-18 microphones, which has been operating (almost) continuously for several years.

This dissertation details methods for characterizing acoustic propagation in a challenging high-noise environment, and introduces a new method for correcting clock skew between unsynchronized transmitters and receivers. It also describes a localization method capable of locating sound-producing wildlife within the monitored area, with experiments validating the accuracy to within 5 m.

The scale of the array provides an opportunity to investigate classical array processing techniques in a new context, with nonstationary signals and long interchannel delays. We propose and validate a method for location-informed signal enhancement using a rank-1 spatial covariance matrix approximation, achieving 11 dB SDR improvements with no source signal modeling.

These components are brought together in an end-to-end demonstration system that resynthesizes a virtual soundscape from multichannel signals recorded in situ, allowing users to explore the space virtually. Positive feedback is reported in a user survey.

Committee members:

Joseph A. Paradiso, Alexander W. Dreyfoos (1954) Professor of Media Arts and Sciences, MIT Media Lab
Daniel P. W. Ellis, Research Scientist, Google, Inc.
Josh McDermott, Associate Professor, MIT Department of Brain and Cognitive Sciences
