Project

Spoken Opinion Summarization

Shayne O'Brien

Talk radio exerts significant influence on the political and social dynamics of the United States, but labor-intensive data collection and curation have prevented previous work from studying its content at scale. Over the past year, the Laboratory for Social Machines and Cortico have built a talk radio ingest system that records and automatically transcribes audio from more than 160 stations around the country. Using these transcripts, we propose novel compression-based methods for unsupervised summarization of spoken opinion in conversational dialogue. Because the unsupervised framework obviates the need for labeled data, the summarization task becomes largely agnostic to human input beyond necessary decisions about model architecture, input data, and output length. As a result, the trained models can depict opinion more accurately, with less influence from human labeling choices. Using the outputs of our proposed methods, we conduct a case study examining the variability of public opinion across America. In the interest of reproducibility and further research, we open-source all code and data used.