Project

Liquid News

Liquid News aims to help people better understand and interact with the news by providing machine-learning-based analysis and semantic navigational aids. Users parse the news through a semantic-relational model that leverages the latent connections between news segments. The goal is to uncover the relationships between topics covered across multiple news sources and to promote a greater understanding of the news and media around us.

Background

Over the past decades, news and the way we interact with information have evolved significantly. The rise of the internet has brought greater accessibility and connectivity, but it has also strengthened or introduced negative factors such as bias and fake news. As a result, accessing and sharing information is easier than ever, yet truly understanding the news and global events has become harder because bias and fake news are so widespread.

The intended end product of Liquid News is an interface that lets users parse news via a semantic-relational model that leverages the latent connections between news segments, giving them a better understanding of the news at hand. For example, Queen Elizabeth II's death was heavily covered in the news and media. The coverage focused not only on her death but also on a whole category of related topics such as royal succession, British history, and the monarchy's wealth. These topics relate to the Queen's death on a latent semantic-relational level and are essential to understanding its significance. However, they were covered across multiple news media and at varying depths, making these latent connections hard to identify and understand. Liquid News aims to build an interface that uses machine learning to identify the key topics and to parse, group, and relate news segments from many news sources, with the hope of uncovering these latent relationships and promoting a greater understanding of the news and media around us.

System Overview

Transcription

The first stage of the Liquid News pipeline produces accurate transcriptions of the news for use in downstream NLP tasks. We extract the audio track from a given news cycle, run it through the recently released Whisper model, and write the result as SRT-formatted captions for that audio file.
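The caption-writing step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the transcription is available as a list of segments with `start`, `end` (in seconds), and `text` fields, which is the shape the open-source `whisper` package returns under the `segments` key of its `transcribe` result. The function names `format_timestamp` and `segments_to_srt` are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render transcript segments as SRT text.

    Each segment is assumed to be a dict with "start", "end", and "text"
    keys (an assumption about the transcriber's output format).
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # An SRT cue is: sequence number, time range, caption text, blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Good evening."},
        {"start": 2.5, "end": 6.0, "text": " Here are tonight's headlines."},
    ]
    print(segments_to_srt(demo))
```

Keeping the timestamps alongside the text at this stage is what would allow later stages to map text back to positions in the video.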

Tokenization and Embedding

For each transcription, the next stage of the pipeline determines how the transcription should be tokenized. The token size is a hyperparameter of the model, and we will learn its optimal value via conventional ML models. Once the transcription is tokenized, each token is embedded using a large language model such as GPT-3 or RoBERTa.
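To make the role of the token size concrete, here is a small sketch. It assumes that a "token" is a fixed-size window of consecutive words from the transcript; the source does not specify this, so treat the windowing rule, the function name `chunk_transcript`, and the field names as hypothetical. The embedding call is left out because it depends on the chosen model.

```python
def chunk_transcript(words, token_size):
    """Split a list of words into consecutive chunks of at most token_size words.

    Returns a list of dicts that record where each chunk came from, so a
    chunk can later be traced back to its position in the transcript.
    """
    if token_size < 1:
        raise ValueError("token_size must be at least 1")
    chunks = []
    for start in range(0, len(words), token_size):
        piece = words[start:start + token_size]
        chunks.append({
            "start_word": start,             # index of the first word (inclusive)
            "end_word": start + len(piece),  # index past the last word (exclusive)
            "text": " ".join(piece),
        })
    return chunks


if __name__ == "__main__":
    transcript = "the queen died on thursday at balmoral castle".split()
    for chunk in chunk_transcript(transcript, token_size=3):
        print(chunk)
```

A smaller `token_size` gives finer-grained units to embed and cluster, at the cost of less context per unit. That trade-off is presumably why it is treated as a hyperparameter.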

Clustering & Topic Extraction

Once the tokens have embeddings, the system uses k-means clustering to group related tokens. For each cluster, it then concatenates the text of the member tokens into a cluster summary. These cluster summaries are passed back to GPT-3 to generate an associated topic for each cluster.
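The clustering and summary steps might look roughly like the sketch below. It is a plain-Python version of Lloyd's k-means algorithm written for illustration only; a real pipeline would more likely use a library implementation, and the source does not say which one. The function names, the fixed iteration cap, and the random initialization are assumptions. The final GPT-3 topic-generation call is omitted.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Lloyd's algorithm: return one cluster label (0..k-1) per input vector."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_summaries(texts, labels):
    """Concatenate the text of all tokens that share a cluster label."""
    grouped = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(text)
    return {label: " ".join(parts) for label, parts in grouped.items()}


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embeddings have far more dimensions.
    embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    texts = ["queen dies", "succession plans", "market falls", "stocks drop"]
    labels = kmeans(embeddings, k=2)
    print(cluster_summaries(texts, labels))
```

Each value in the returned dictionary plays the role of a cluster summary, which would then be sent to the language model to obtain a topic label.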

Video Segmentation

Given the output of the clustering and topic extraction module, the system backtracks through the hash map generated throughout the pipeline to recover the corresponding video segments. An approximate model of the hash map representing the video(s) information is shown below.
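As a rough illustration only, one hypothetical layout for such a hash map and the backtracking step might look like the following. The keys (`start`, `end`, `text`, `topic`), the video identifiers, and the function name `segments_by_topic` are assumptions made for this sketch, not the project's actual schema.

```python
# Hypothetical index: video id -> list of token records accumulated by the
# earlier stages (timestamps from transcription, topic from clustering).
video_index = {
    "news_a": [
        {"start": 0.0, "end": 12.0, "text": "queen dies", "topic": "monarchy"},
        {"start": 12.0, "end": 30.0, "text": "market falls", "topic": "economy"},
    ],
    "news_b": [
        {"start": 5.0, "end": 20.0, "text": "succession plans", "topic": "monarchy"},
    ],
}


def segments_by_topic(index):
    """Backtrack from topics to video time spans.

    Returns a dict mapping each topic to a sorted list of
    (video_id, start_seconds, end_seconds) tuples.
    """
    by_topic = {}
    for video_id, tokens in index.items():
        for token in tokens:
            span = (video_id, token["start"], token["end"])
            by_topic.setdefault(token["topic"], []).append(span)
    for spans in by_topic.values():
        spans.sort()  # order by video id, then by start time
    return by_topic


if __name__ == "__main__":
    for topic, spans in segments_by_topic(video_index).items():
        print(topic, spans)
```

With a structure like this, selecting a topic in the interface reduces to a dictionary lookup that yields the time ranges to play from each source video.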

Run Liquid News Pipeline

Version 1.0 of the Liquid News backend is available on GitHub. Instructions for running the pipeline are in the README. The entire process has been streamlined to run in a Google Colab notebook.