Brian Whitman, Deb Roy, Barry Vercoe
Jan. 1, 2003
The audio bitstream in music encodes a large amount of statistical, acoustic, emotional and cultural information. But music also has an important linguistic accessory: most musical artists are described in great detail in record reviews, fan sites and news items. We highlight current and ongoing research into extracting relevant features from audio while simultaneously learning language features linked to the music. We show results in a “query-by-description” task in which we learn the perceptual meaning of automatically discovered single-term descriptive components, as well as a method of automatically uncovering ‘semantically attached’ terms (terms that have perceptual grounding). We then show recent work in ‘semantic basis functions’: parameter spaces of description (such as fast ... slow or male ... female) that encode the highest descriptive variance in a semantic space.