Creative Text-to-Audio Generation via Synthesizer Programming

Singh*, N., Cherep*, M., & Shand, J. (2023, December). Creative Text-to-Audio Generation via Synthesizer Programming. In NeurIPS Machine Learning for Audio Workshop.

Abstract

Sound designers have long harnessed the power of abstraction to distill and highlight the semantic essence of real-world auditory phenomena, akin to how simple sketches can vividly convey visual concepts. However, current neural audio synthesis methods lean heavily toward capturing acoustic realism. We introduce a novel open-source method centered on meaningful abstraction. Our approach takes a text prompt and iteratively refines the parameters of a virtual modular synthesizer to produce sounds with high semantic alignment, as predicted by a pretrained audio-language model. Our results underscore the distinctiveness of our method compared with both real recordings and state-of-the-art generative models.
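The loop described in the abstract amounts to black-box optimization: score a candidate parameter setting with an audio-language model, then nudge the parameters toward higher alignment. The sketch below is purely illustrative and is not the authors' implementation: the `semantic_alignment` function stands in for a pretrained audio-language model's text-audio similarity score (replaced here with a toy objective), and simple hill climbing stands in for whatever optimizer the paper uses.

```python
import random

def semantic_alignment(params, prompt):
    # Placeholder for a pretrained audio-language model's similarity score.
    # In the real method this would render audio from the synthesizer
    # parameters and compare it to the text prompt; here a toy quadratic
    # objective (hypothetical target setting) stands in for that score.
    target = [0.3, 0.7, 0.5]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def refine(prompt, n_params=3, iters=500, step=0.1, seed=0):
    # Hill-climbing over normalized synth parameters in [0, 1]:
    # propose a small random perturbation, keep it if the score improves.
    rng = random.Random(seed)
    params = [rng.random() for _ in range(n_params)]
    best = semantic_alignment(params, prompt)
    for _ in range(iters):
        candidate = [min(1.0, max(0.0, p + rng.uniform(-step, step)))
                     for p in params]
        score = semantic_alignment(candidate, prompt)
        if score > best:
            params, best = candidate, score
    return params, best
```

Any scoring model exposing a text-audio similarity function could be dropped into `semantic_alignment`; the optimizer only needs scores, not gradients.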