Taito’s Karaoke-on-Demand Machine
A revolution in music technology has people singing a new tune


Hidehito Kitamura, Taito Corporation
Barry Vercoe, Head, Music, Mind and Machine group, MIT Media Lab


MIT Media Lab, Cambridge, MA; and NTT Comware Corporation, Tokyo, Japan


Taito wanted to bring the flexibility of software-only audio processing into an industry dominated by application-specific devices, and to adopt technologies that would remain flexible and scalable.


The system was developed with two companies that were Media Lab sponsors at the time of the collaboration: Analog Devices, which worked with Vercoe to develop Extended Csound on a high-speed, real-time DSP (digital signal processor) chip, and Denon Electronics, which provided the chip to Taito.


Some 17 years after Vercoe developed the Csound music synthesis language, Csound was finally incorporated into its first commercial application in the summer of 2002: Lavca, a “karaoke” music player from Taito Corporation, a Japanese media integrator. The player receives high-quality instrumental music data and scores via broadband and, thanks to its onboard Extended Csound technology, automatically adjusts pitch and tempo with CD-quality sound. “What makes this system unique,” says Vercoe, “is that it is the first commercial audio product to be based entirely on software synthesis instead of the usual hardware. This makes it flexible and upgradable, with new versions easily installed in a software download. Moreover, the music ‘follows’ the performer, speeding up or slowing down to match the singer’s performance.” In addition, Lavca can replace the singer’s wrong notes with the right ones.


So why did Csound take so long to reach a commercial product? “Because,” says Vercoe, “the computational load of synthesizing 64 voices plus effects, along with MPEG decoding and pitch and tempo variation, could only become economically feasible when the latest generation of cheap, yet high-speed DSPs became available. We have finally crossed the threshold into both flexible and high-quality audio processing.”
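

To illustrate why score-driven software synthesis makes pitch and tempo easy to adjust independently, here is a minimal Python sketch. It is not Lavca's or Extended Csound's actual code: the `Note` type and the `transpose` and `change_tempo` functions are hypothetical names invented for this example, and it assumes equal-tempered tuning and a score expressed as simple note events. Because the music is synthesized from a score rather than played back from a fixed recording, changing the key only rescales note frequencies, and changing the tempo only rescales note timing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    """One score event (hypothetical structure, for illustration only)."""
    onset_s: float     # start time in seconds
    duration_s: float  # length in seconds
    freq_hz: float     # pitch as a frequency in hertz


def transpose(notes: List[Note], semitones: float) -> List[Note]:
    """Shift every note by a number of equal-tempered semitones.

    Timing is untouched: only the frequency is multiplied by 2^(semitones/12).
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [Note(n.onset_s, n.duration_s, n.freq_hz * ratio) for n in notes]


def change_tempo(notes: List[Note], tempo_factor: float) -> List[Note]:
    """Play the score tempo_factor times faster (values below 1 slow it down).

    Pitch is untouched: only onsets and durations are rescaled.
    """
    if tempo_factor <= 0:
        raise ValueError("tempo_factor must be positive")
    return [
        Note(n.onset_s / tempo_factor, n.duration_s / tempo_factor, n.freq_hz)
        for n in notes
    ]


if __name__ == "__main__":
    score = [Note(0.0, 0.5, 440.0), Note(0.5, 0.5, 660.0)]
    # Lower the key by two semitones and slow the song to 90% of its tempo.
    adjusted = change_tempo(transpose(score, -2), 0.9)
    for note in adjusted:
        print(note)
```

A tempo-following system of the kind Vercoe describes would presumably update something like `tempo_factor` continuously from the singer's performance; how Lavca does this is not described here, so the sketch leaves that part out.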