StoryLine, a collaboration between McKinsey & Company’s Consumer Tech and Media team and the Lab for Social Machines (LSM), explores the intersection of visual storytelling and machine learning, with the aim of helping storytellers understand and improve the impact of their stories on their audiences.
StoryLine was inspired by LSM's groundbreaking Electome project, in which researchers used advanced machine learning to build network maps of engaged election audiences and then track the diffusion of relevant conversation and content through those networks. For Electome, the result was a powerful new way of understanding how audiences form around specific political, social and cultural ideas. For StoryLine, we wondered whether this approach could translate to measuring impact in the storytelling domain, which, outside of marketing optimization, has seen little practical benefit from recent advances in machine learning and artificial intelligence.
Understanding audience impact has arguably never been more important to those involved in the creation, production, distribution and marketing of visual stories—from the largest studios, networks and platforms to the independent creators who publish and promote on their own. More videos are produced by more creators, distributed on more channels and consumed on more devices than ever before. Some of these stories find an audience. Many don’t. And the reason one story hits and another misses often remains a mystery to creators and distributors alike.
What if there were a way to understand, and even predict, the relationship between video story content and audience consumption?
This is the question driving StoryLine researchers and media experts to collaborate on automated methods for linking the structure of a video story (visuals, soundtrack, script, etc.) to how people engage with it (watch, like, comment, share, etc.).
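To make that linkage concrete, it helps to picture the two sides of the data. The sketch below is purely illustrative and not StoryLine's actual schema; every field name is an assumption about the kinds of structural features and engagement signals such a system might track for each video.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative only: field names are assumptions about the kinds of structural
# features and engagement signals a system like StoryLine might record per video.

@dataclass
class StoryStructure:
    video_id: str
    scene_count: int                          # from shot/scene segmentation
    character_screen_time: Dict[str, float]   # character label -> seconds on screen
    emotional_arc: List[float]                # per-scene sentiment/valence scores
    soundtrack_energy: List[float]            # per-scene audio energy
    transcript: str                           # speech-to-text output of the script

@dataclass
class EngagementRecord:
    video_id: str
    views: int
    likes: int
    comments: int
    shares: int
    avg_watch_fraction: float                 # mean portion of the video watched
```

Pairing these two records per video yields the (structure, engagement) dataset that modeling of the kind described next could learn from.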
Automatically linking content structure to audience consumption of video at scale is a major innovation in machine learning. Natural language text and human speech are now increasingly "understood" by machines. Teaching a machine to understand an untagged video story (structural components like overall storyline, character development and emotional arc) is a next-generation problem that demands the integration of computer vision, semantic and audio analysis, and neural network models. Taking the further step of linking content structure to audience engagement across hundreds, thousands or millions of stories requires advanced statistical modeling and high-powered network “diffusion” analysis, plus visualizations that help us make sense of it all.
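As a rough illustration of those two modeling steps, and not of StoryLine's actual pipeline, the sketch below fits a simple off-the-shelf regressor (scikit-learn's GradientBoostingRegressor, standing in for whatever statistical model is used) on synthetic structural features, then runs a toy independent-cascade simulation on a random follower graph to mimic diffusion analysis. All data, feature meanings and sharing probabilities here are made up.

```python
import numpy as np
import networkx as nx
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# --- 1. Link structure to engagement with a statistical model. ---
# Each row is one story; columns stand in for structural features such as
# scene count, mean emotional valence, arc variance, soundtrack energy.
n_stories = 500
X = rng.normal(size=(n_stories, 4))
# Synthetic stand-in for an observed engagement outcome, e.g. log(1 + shares).
y = 1.5 * X[:, 1] - 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n_stories)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))

# --- 2. Trace how a story spreads through an audience network. ---
# A toy independent-cascade simulation on a random follower graph: each share
# gives every follower one chance to re-share with a fixed probability.
G = nx.erdos_renyi_graph(n=2000, p=0.005, seed=0)

def simulate_cascade(graph, seeds, p_share=0.05):
    shared, frontier = set(seeds), list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.neighbors(node):
                if neighbor not in shared and rng.random() < p_share:
                    shared.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return shared

reach = simulate_cascade(G, seeds=[0, 1, 2])
print("audience reached:", len(reach))
```

In practice, the features would come from the vision, audio and language models described above, and the network would be a real audience graph rather than a random one.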