Embedding Synthesis

Bernd Schoner
schoner@media.mit.edu
http://www.media.mit.edu/~schoner/

Introduction

Much effort has been put into modeling musical instruments and their audio output. However, neither of the two more conventional modeling approaches has succeeded in synthesizing an original instrument satisfactorily. We believe there are two main reasons for these failures:

a) The linear approximations usually used are unsuitable to model the highly non-linear behaviour of musical instruments.

b) Artificial interfaces (keyboards) are unsuitable replacements for the interface devices of the original instrument. For example, a keyboard cannot provide the subtle control possibilities of a violin bow.

Given these assumptions, we propose a new concept of musical synthesis. We consider the violin as a physical system that receives a set of input data and produces a set of output data. Input (such as bow position) and output (the audio signal) are given as time series. To build our model of the violin, we first record the relevant input data, captured by sensors, simultaneously with the audio signal. From this data, we try to infer a nonlinear function that maps the input data to the audio output. This function is based exclusively on the collected data. It describes the global physical behaviour of the violin without analyzing the physical mechanisms themselves. Once the model has been trained, we feed the computer the same kind of input data that drives the original instrument and use the model for synthesis.

The Betts Strad

Embedding and Musical Synthesis

Our synthesis model is based on the embedding theorem first formulated by Takens: The state space representation of the internal states of a physical system can be mapped one-to-one and smoothly onto a space constructed from a single observable of the system and its time lags. We can then determine the behaviour of such a system given enough successive samples from a single time series produced by the system.
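The construction of such a lag space can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `delay_embed`, the lag of 5 samples, the dimension of 3 and the sine wave standing in for an audio signal are all assumptions made for the example, not values from the project.

```python
import math

def delay_embed(series, dim, lag=1):
    """Return delay vectors [x[t], x[t-lag], ..., x[t-(dim-1)*lag]].

    Each vector is one point in the reconstructed state space.
    """
    start = (dim - 1) * lag  # first index with a full history
    return [
        [series[t - k * lag] for k in range(dim)]
        for t in range(start, len(series))
    ]

# Toy "audio" signal (assumption: a plain sine wave, not violin data).
signal = [math.sin(0.3 * t) for t in range(200)]

# Three-dimensional embedding: the signal plus two time lags.
vectors = delay_embed(signal, dim=3, lag=5)
```

Plotting the three components of `vectors` against each other would trace out an orbit of the kind shown in Figure 1.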

Given a driven system (a musical instrument) as opposed to an autonomous system, we use an input/output embedding space. In addition to the time lags of the audio signal, we use the input time series and their time lags as separate space dimensions. To capture all possible states of our system, we use twice the number of internal degrees of freedom of the violin as the number of time lags for each time series. However, as the violin is a driven dissipative system, we hope to reduce the space dimension to a computationally reasonable size.
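A minimal sketch of an input/output embedding follows. The helper name `io_embed` and the exact lag layout (past audio samples only, but current and past input samples) are assumptions chosen for illustration; the source does not specify them.

```python
def io_embed(audio, inputs, n_lags):
    """Build (state, target) pairs for a driven system.

    state  = n_lags past audio samples, followed by the current and
             n_lags - 1 past samples of every input series
    target = the audio sample to be predicted
    """
    states, targets = [], []
    for t in range(n_lags, len(audio)):
        state = [audio[t - k] for k in range(1, n_lags + 1)]
        for u in inputs:
            state += [u[t - k] for k in range(n_lags)]
        states.append(state)
        targets.append(audio[t])
    return states, targets

# Toy data (assumption): one audio series and one sensor series.
audio = [0.1 * t for t in range(10)]
bow_position = [10.0 + t for t in range(10)]

states, targets = io_embed(audio, [bow_position], n_lags=2)
```

Each row of `states` is one point in the input/output embedding space, and the matching entry of `targets` is the value a prediction function would be trained to produce.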

State Reconstruction

To approximate our prediction function, we are exploring two different approaches. The first is a functional fit consisting of a polynomial estimate with a regularizing term to avoid overfitting. Although it extrapolates well from the given data set and models the wave envelope well, this approach causes serious stability problems when the system predicts iteratively, that is, when its own predictions are fed back as inputs.
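The idea of a regularized polynomial fit that is then iterated can be sketched as follows. This is a toy under stated assumptions, not the project's implementation: it uses a single time lag, a ridge (squared-weight) penalty as the regularizing term, and a logistic map in place of violin data. All function names are hypothetical.

```python
def poly_features(x, degree):
    """Polynomial basis [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ridge_poly(series, degree, lam):
    """Fit x[t+1] ~ poly(x[t]) by least squares with penalty lam * |w|^2."""
    rows = [poly_features(x, degree) for x in series[:-1]]
    targets = series[1:]
    n = degree + 1
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    return solve(A, b)

def iterate(weights, x0, steps):
    """Feed each prediction back in as the next input."""
    out, x = [], x0
    for _ in range(steps):
        feats = poly_features(x, len(weights) - 1)
        x = sum(w * f for w, f in zip(weights, feats))
        out.append(x)
    return out

# Toy training series (assumption): logistic map x -> 3.7 * x * (1 - x).
series = [0.3]
for _ in range(300):
    series.append(3.7 * series[-1] * (1.0 - series[-1]))

weights = fit_ridge_poly(series, degree=2, lam=1e-9)
predicted = iterate(weights, series[0], 5)
```

Because `iterate` feeds every prediction back into the model, small fitting errors compound from step to step, which is one way to see how iterated prediction can become unstable.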

The second approach is a probabilistic state space reconstruction based on clusters. Assuming that each of our data points represents a state of the system with Gaussian noise, we assign clusters to the states. Our probability estimate of each state is then a weighted sum of Gaussians built over the cluster centers. To find the cluster parameters, we tried different algorithms such as Clustering by Melting and the Expectation-Maximization (EM) algorithm. For the actual prediction, we take the value of the output dimension that maximizes its conditional probability given all the time lags of the output and all the input dimensions. This approach seems stable, but it does not extrapolate well.
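The prediction step of the cluster-based approach can be sketched as below, assuming the cluster parameters have already been found by some clustering algorithm. The isotropic Gaussians, the hand-picked cluster values, the grid search over candidate outputs and the names `gauss` and `predict` are all assumptions made for this illustration.

```python
import math

def gauss(d2, var, dim):
    """Isotropic Gaussian density for squared distance d2 in dim dimensions."""
    return math.exp(-0.5 * d2 / var) / (2.0 * math.pi * var) ** (dim / 2.0)

def predict(x, clusters, y_grid):
    """Return the y in y_grid that maximizes p(y | x) under the mixture.

    Each cluster is (weight, center_x, center_y, variance), where center_x
    lies in the space of lags and inputs and center_y is the output value.
    """
    # How strongly each cluster explains the observed state x.
    resp = []
    for weight, cx, cy, var in clusters:
        d2 = sum((a - b) ** 2 for a, b in zip(x, cx))
        resp.append(weight * gauss(d2, var, len(x)))

    def cond_density(y):
        # Unnormalized p(y | x): the normalizer does not depend on y.
        return sum(r * gauss((y - cy) ** 2, var, 1)
                   for r, (_, _, cy, var) in zip(resp, clusters))

    return max(y_grid, key=cond_density)

# Hypothetical cluster parameters, as if produced by a clustering step.
clusters = [
    (0.5, [0.0, 0.0], -1.0, 0.05),
    (0.5, [1.0, 1.0], 1.0, 0.05),
]
y_grid = [i / 100.0 - 2.0 for i in range(401)]  # candidates in [-2, 2]

y_hat = predict([0.9, 1.1], clusters, y_grid)
```

In this sketch a query far from every cluster center drives all the cluster responsibilities toward zero, so the model has little to say outside the region covered by the training data.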


Figure 1: Violin Orbit (Attractor) (audio signal plus two time lags of the audio signal)

Figure 2: Violin Orbit (audio signal, time lag of the audio signal, bow position)

The Final Goal

Our end goal is to synthesize the violin sound in performance given the online input data of a player. The Embedding Synthesis Project, also called 'The Digital Stradivarius' Project, was begun in collaboration with the Kronos Quartet for a piece written by Tod Machover which will premiere in Singapore in 1997. By then, the four musicians will be playing 'on the same instrument'. Each player will feed input data into the computer, and from this the violin sound will be created. In addition to musical applications, there are many other domains where non-linear function approximation would be very helpful. We hope to apply our experience to these other areas as well.

Have a look at the HTML version of my entire diploma thesis or download the PostScript file.

A Long Pattern Strad

References

1. Casdagli Martin 1992. A Dynamical Systems Approach to Modeling Input-Output Systems.

2. Rose Kenneth, Eitan Gurewitz, Geoffrey Fox 1990. A deterministic approach to clustering. Pattern Recognition Letters 11 (1990) 589-594.

3. Popat Kris, Picard Rosalind W. Media Lab 1993. Novel cluster-based probability model for texture synthesis, classification, and compression. Proc. SPIE Visual Communications and Image Processing '93. Boston 1993.

4. Joseph A. Paradiso and Neil Gershenfeld. Media Lab 1995. Musical Applications of Electric Field Sensing.

5. Wong Yiu-fai 1993. Clustering Data by Melting.

6. Wong Yiu-fai 1993. A New Clustering Algorithm Applicable to Multispectral and Polarimetric SAR Images. IEEE Transactions on Geoscience and Remote Sensing, Vol. 31, No. 3, 1993.

7. Girosi, Jones, Poggio. Regularization Theory and Neural Networks Architectures. Neural Computation 7, 1995.