Event

Nikhil Singh Dissertation Defense

Dissertation Title: Bridging the Gap: Generative Machines and Inventive Minds

Abstract: 

Recording technologies, from the phonograph to digital media, have profoundly reshaped the human experience by enabling the capture and reproduction of our sensory world. These technologies allow us to relive experiences through artifacts of remarkable fidelity like photographs and videos, extending the reach of our perception and memory. Of course, we didn't stop at the phonograph; we have built a rich ecosystem of tools for creating, sharing, and exploring recorded media that have had transformative effects on cognition and culture.

Recently, a new and powerful class of tools has emerged: generative models. Unlike recorded media, which reproduces external experiences, generative models can translate our ideas directly into artifacts. Here, ideas refer to abstract mental constructs that seed media creation, externally expressed in text prompts, sketches, vocalizations, or other intuitive representations. Just as recorded media augmented our ability to perceive and remember, generative media promises to expand our ability to imagine and invent by offering a more immediate path from cognition to high-fidelity creation. Creative work often has us operating at our limits, negotiating boundaries between knowledge and novelty, skill and aspiration, from individual exploration to collective understanding. Generative models, in principle, have the potential to scaffold and accelerate how we transcend these limits by increasing the efficiency with which we discover and pursue new ideas.

In this thesis, I suggest that realizing this potential presents a complex set of challenges that span computation and design. I argue that it requires us to develop a rich stack of precision tools for human-AI co-creation, as we have done and continue to do for recorded media. Specifically, I present contributions along two key dimensions of this effort:
1. Computational machinery that supports creative work. I present research on topics including visually driven acoustic simulation, interpretable and controllable sound generation from descriptions, and audiovisual content understanding. Focusing on sound as a case study, I describe systems that effectively represent and manipulate creative knowledge across modalities and levels of abstraction.
2. Interactive systems and studies that investigate the integration of human and machine effort in content creation. This includes work on conceptual integration in AI-assisted story writing, author-in-the-loop description authoring for accessibility of complex scientific figures, and generative constraints for human ideation. In all, this work seeks insights for designing systems that support human creators through exploration, collaboration, and feedback, rather than aiming to replace or constrain human agency and expertise.

To conclude this thesis, I present a discussion on bridging AI and HCI to gain insights into human creative work and develop stable, generalizable design knowledge for augmenting it. I argue for the design of flexible, parametric tools that enable systematic study of creative behavior under different augmentation designs. Based on this, I propose a conceptual framework to seed the development of a more robust science of human-AI co-creation.


Committee members: 

Tod Machover, Muriel R. Cooper Professor of Music and Media, Massachusetts Institute of Technology
Elena Glassman, Assistant Professor of Computer Science, Harvard University
Ramesh Raskar, Associate Professor of Media Arts and Sciences, Massachusetts Institute of Technology
Pattie Maes, Germeshausen Professor of Media Arts and Sciences, Massachusetts Institute of Technology