Project

The Spread of True and False Information Online

Laboratory for Social Machines

Frequently Asked Questions

  1. What’s the background on this research?
  2. What was the methodology you used?
  3. What was the main finding of the research?
  4. Did anything surprise you in the research?
  5. What does this research tell us about human behavior?
  6. How is this different from “fake news”?
  7. How do you determine which tweets contain real news and which are false?
  8. Did you remove bots from your analysis?
  9. Did bots drive these results?
  10. How does this connect to AI?
  11. What does this tell us about Twitter and other social media platforms?
  12. What does Twitter think about this?
  13. How will Twitter leverage your research, if at all?
  14. What is LSM’s relationship with Twitter?
  15. Could Facebook or other social media platforms use this too?
  16. You use the Twitter Firehose to get this data. What is the “Firehose,” how is it different than the public view of Twitter content, and how are you using this data?
  17. What was the rubric for defining an unreliable Twitter source? How do you assess that the content might be false?
  18. Did purposeful manipulation drive these results?
  1. What’s the background on this research?

    We wanted to understand differences in how true and false news spread on social media across a broad range of news topics. This interest was prompted by the Boston Marathon bombing, during which both Soroush Vosoughi and Deb Roy personally experienced the impact of spreading rumors (news that sometimes later turned out to be false) while trying to find the latest updates. Soroush then developed his PhD around a model for detecting and predicting the veracity of rumors as they began to spread, which became the basis for wanting to understand and explain the spread of false news online. Sinan Aral's work, meanwhile, focused on the impact of social media and social influence on the diffusion of information and behavior in online social networks. We collaborated with Sinan to study a wide range of rumors and to understand overall patterns in the spread of true versus false information.

  2. What was the methodology you used?

    We browsed stories across the websites of six fact-checking news organizations and identified stories whose veracity the organizations agreed on. We then searched Twitter for content about these stories, using machine learning to match the text, URLs, and memes from Twitter to them.
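    A highly simplified sketch of that matching step, linking tweet text to fact-checked stories. The study's actual pipeline used machine learning over text, URLs, and memes; this toy version uses only standard-library string similarity, and the story titles, verdicts, and threshold below are invented for illustration.

    ```python
    # Toy sketch: match a tweet to the most similar fact-checked story.
    # Story titles, verdicts, and the 0.5 threshold are invented examples,
    # not data from the study.
    from difflib import SequenceMatcher

    stories = {
        "Shark photographed swimming on a flooded highway": "false",
        "City marathon rescheduled after storm damage": "true",
    }

    def match_story(tweet_text: str, threshold: float = 0.5):
        """Return (story_title, verdict) for the best match, or None."""
        best, best_score = None, 0.0
        for title, verdict in stories.items():
            score = SequenceMatcher(None, tweet_text.lower(), title.lower()).ratio()
            if score > best_score:
                best, best_score = (title, verdict), score
        return best if best_score >= threshold else None

    match = match_story("wow a shark swimming on the flooded highway?!")
    ```

    A tweet matched this way inherits the fact-checkers' verdict for the story, which is how each cascade in the analysis gets its true/false label.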

  3. What was the main finding of the research?

    False information spreads faster, farther, deeper, and more broadly than true information. False information also tends to be more novel than true news. On average, false news is ~70% more likely to be retweeted than true news.

  4. Did anything surprise you in the research?

    Bots weren’t as important in the spread of false information as we thought they might be. We were also surprised by how strongly false news was associated with emotional responses of surprise and disgust.

  5. What does this research tell us about human behavior?

    People are more likely to spread novel and surprising information, which favors the spread of falsity over the truth.

  6. How is this different from “fake news”?

    This work explores both false and true information and their spread. According to a recent Gallup Poll, Americans refer to three distinct meanings when using the phrase “fake news”:

    1.  False news presented as truth
    2.  Opinion presented as fact
    3.  True news that casts a politician or political party in a negative light

    Our study focused on contested news that often ends up being addressed by fact-checking organizations, which corresponds to meanings (1) and (2) but not meaning (3) of “fake news.”

  7. How do you determine which tweets contain real news and which are false?

    The stories contained in the tweets had already been investigated by some or all of the six independent fact-checking organizations. The veracity of each story was extracted from the organizations’ websites and used as ground truth in our analysis.

  8. Did you remove bots from your analysis?

    We used state-of-the-art detectors (developed by other academic labs) to remove bots from our data. Removing the bots did not alter our metrics or the findings of the study. We believe that although bots were present in our data, they were not the driver of the findings. Humans were.

  9. Did bots drive these results?

    Bots accelerated the spread of false and true news at approximately the same rate. This suggests that false news spreads farther, faster, deeper and more broadly than the truth because humans, not robots, are more likely to spread it.

  10. How does this connect to AI?

    Some AI / machine learning methods were used in the process of analyzing and interpreting the data, but the main relationship to AI is in the analysis of bot activity and the role of bots in the spread of false news.

  11. What does this tell us about Twitter and other social media platforms?

    The affordances of Twitter, which let anyone become a broadcaster, combined with the human behavior of millions of users, enable false news to propagate faster than true news. Though our research was done on Twitter, we believe our findings are not specific to it; they are likely to apply to other Internet-based communication platforms on which users can share news with others.

  12. What does Twitter think about this?

    Twitter provided funding and data access to support this research and permitted us to publish the findings. We shared these results with Twitter prior to publication.

  13. How will Twitter leverage your research, if at all?

    We are not in a position to comment on Twitter’s plans.

  14. What is LSM’s relationship with Twitter?

    Twitter has supported the Lab for Social Machines (LSM), based at the MIT Media Lab, for the past four years through funding and access to Twitter data. As Principal Investigator of LSM, Deb Roy sets the lab’s research directions independently of Twitter.

  15. Could Facebook or other social media platforms use this too?

    Since our studies are based on Twitter data, we can only conjecture what might be found if similar studies were repeated on other platforms. We hope that other platforms follow Twitter’s lead in supporting independent research studies. That said, we suspect that the patterns we found are likely to appear on any platform, such as Facebook, on which individual users may rebroadcast news to others on the network.

  16. You use the Twitter Firehose to get this data. What is the “Firehose,” how is it different than the public view of Twitter content, and how are you using this data?

    We used the Twitter historical archives for this study. The archives include all tweets ever posted, going back to the first tweet. This is different from the public view of Twitter content, as only the most recent ~3,200 tweets of an account are publicly viewable.

  17. What was the rubric for defining an unreliable Twitter source? How do you assess that the content might be false?

    We verified the accuracy of the claims through six fact-checking news organizations that exhibited 95-98% agreement on the classifications (snopes.com, politifact.com, factcheck.org, truthorfiction.com, hoax-slayer.com and urbanlegends.about.com). 
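    A minimal sketch of what a pairwise agreement figure like 95-98% measures: the fraction of shared stories on which two fact-checking organizations gave the same verdict. The organization names, verdicts, and counts below are invented toy data, not the study's, and the study may well have computed agreement differently.

    ```python
    # Toy sketch: pairwise percent agreement between fact-checking
    # organizations over a shared set of stories. All data here is invented.
    from itertools import combinations

    verdicts = {
        "org_a": ["false", "false", "true", "true"],
        "org_b": ["false", "false", "true", "false"],
        "org_c": ["false", "true",  "true", "true"],
    }

    def pairwise_agreement(a, b):
        """Fraction of stories on which two organizations agree."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    # Agreement score for every pair of organizations.
    scores = {
        (p, q): pairwise_agreement(verdicts[p], verdicts[q])
        for p, q in combinations(verdicts, 2)
    }
    ```

    High pairwise agreement across all pairs is what justifies treating the organizations' shared verdicts as ground truth.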

  18. Did purposeful manipulation drive these results?

    The current study cannot address this question.