Trash to Treasure: Using text-to-image models to inform the design of physical artefacts


Does using text-to-image models like Stable Diffusion in the creative process affect what people make in the physical world?

Last fall, Media Lab PhD students Hope Schroeder and Ziv Epstein ran a workshop with Amy Smith, a visiting student in the Viral Communications research group, as part of AI Alchemy Lab. At a trash-themed social event, 30 participants opted into a research study in which they visualized ideas for a sculpture using a generative text-to-image model. This pilot study sought to understand whether using a text-to-image model as part of the ideation process informs the design of objects in the physical world. The paper describing the findings from this pilot study is being presented today, 2/13, in Washington, DC, at AAAI as part of the first Creative AI Across Modalities workshop.

The study found that seeing AI-generated images before making a sculpture did inform what people created: 23 of 30 participants reported that the images they generated informed their design. Here, generated images informed a sculpture of a building:

(Image: Courtesy of the Researchers)

Here, a participant created a bottle robot sculpture after seeing some generated images:

(Image: Courtesy of the Researchers)

We noticed that participants varied in how much conceptual exploration they did while “prompting” the model: some used the images to explore new ideas, while others used them to refine existing ones.

“Refiners” made minor edits to a main idea through prompting:

(Image: Courtesy of the Researchers)

“Rephrasers” had a main concept but reworded it between prompts:

(Image: Courtesy of the Researchers)

“Explorers” gave largely unrelated prompts, showing a high degree of conceptual exploration across their prompting session:

(Image: Courtesy of the Researchers)

We created a computational measure of conceptual exploration over a participant’s prompting journey by taking the average cosine distance between prompt embeddings. The image below shows an example of each of the three styles that emerged:

(Image: Courtesy of the Researchers)
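
For concreteness, here is a minimal sketch of how such a measure could be computed. The embedding model, the choice to compare consecutive prompts, and the example prompts are assumptions for illustration, not details taken from the study.

```python
# Minimal sketch of a conceptual-exploration score: the average cosine distance
# between embeddings of a participant's prompts. The embedding model and the
# decision to compare consecutive prompts are assumptions, not study details.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def conceptual_exploration(prompts: list[str]) -> float:
    """Average cosine distance between consecutive prompt embeddings."""
    if len(prompts) < 2:
        return 0.0
    emb = model.encode(prompts)                               # shape: (n_prompts, dim)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)    # unit-normalize rows
    step_sims = np.sum(emb[:-1] * emb[1:], axis=1)            # cosine similarity per step
    return float(np.mean(1.0 - step_sims))                    # distance = 1 - similarity

# Hypothetical example sessions: a refiner stays close to one idea,
# an explorer travels farther between prompts.
refiner_prompts = ["a robot made of bottles", "a shiny robot made of plastic bottles"]
explorer_prompts = ["a castle made of cardboard", "a jellyfish made of plastic bags"]
print(conceptual_exploration(refiner_prompts))   # small distance
print(conceptual_exploration(explorer_prompts))  # larger distance
```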

The average semantic distance a participant traveled over their prompting journey was lower for participants who had a sculpture idea at the start of the activity than for those who did not. This suggests that participants who began the visualization with an idea in mind used image generation as an opportunity to “exploit,” or refine, that idea, traveling less semantic distance on average than those who were unsure what to build and used the images to explore. To better support creators, text-to-image tools could track the semantic distance a user has traveled over a prompting session and offer hints suited to their current design stage.
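
As a purely hypothetical illustration of that last suggestion (the threshold value and hint wording below are invented for the sketch, not drawn from the study), a tool could switch the kind of hint it offers once the running average distance between a user's prompts crosses a threshold:

```python
# Hypothetical sketch: adapt hints to a user's apparent design stage based on the
# average semantic distance between their prompts so far. The 0.3 threshold and
# the hint wording are invented for illustration.
def suggest_hint(step_distances: list[float], threshold: float = 0.3) -> str:
    """Return a hint based on the running average distance between successive prompts."""
    if not step_distances:
        return "Try describing a first idea for your sculpture."
    avg = sum(step_distances) / len(step_distances)
    if avg < threshold:
        # Low average distance: the user seems to be refining a single concept.
        return "You seem to be refining one idea. Want to see a close variation on it?"
    # High average distance: the user seems to be exploring across concepts.
    return "You are exploring broadly. Want help narrowing down to one concept?"

print(suggest_hint([0.05, 0.10]))   # refiner-like session
print(suggest_hint([0.45, 0.60]))   # explorer-like session
```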

Participants had some great Media Lab fun along the way, using play to interrogate new technologies and gain scientific insight in the process.

This effort was a collaboration between Amy Smith (QMUL/IGGI); Hope Schroeder, Ziv Epstein, and Andy Lippman at the Media Lab; Mike Cook at King’s College London; and Simon Colton at QMUL.
