Macro Connections
Transforming data into knowledge.

The way we act, both individually and collectively, depends strongly on the way we see the world. The Macro Connections group focuses on the development of analytical tools that can help improve our understanding of the world's macro structures in all of their complexity. By developing methods to analyze and represent networks, such as the networks connecting countries to the products they export or historical characters to their peers, Macro Connections research aims to put together the pieces that our scientific disciplines have helped to pull apart.

Research Projects

  • Data Visualization: The Pixel Factory

    Cesar A. Hidalgo and the Macro Connections group

    The rise of computational methods has generated a new natural resource: data. While it's unclear whether big data will open up trillion-dollar markets, it is clear that making sense of data isn't easy, and that data visualizations are essential to squeeze meaning out of data. But the capacity to create data visualizations is not widespread. To help develop it, we introduce the Pixel Factory, a new initiative focused on creating data-visualization resources and tools in collaboration with corporate members. We have two goals: to create software resources that facilitate the development of online data-visualization platforms that can work with any type of data, and to use the creation of these resources as a means to learn. The most valuable outcome of this work will not be the software itself, incredible as it could be, but a generation of people with the capacity to build these resources.

  • DIVE

    Cesar A. Hidalgo, Manuel Aristaran and Kevin Zeng Hu

    The Data Integration and Visualization Engine (DIVE) is a platform for semi-automatically generating web-based, interactive visualizations of structured datasets. DIVE will allow users to quickly and efficiently create visualization engines like the Observatory of Economic Complexity, DataViva, and Pantheon. Three components lie at the core of DIVE: inferring the properties and models underlying arbitrary datasets, mapping these properties to visualizations, and programmatically creating scalable, customizable websites that integrate these visualizations.
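
    The first two components (inferring properties of a dataset and mapping them to visualizations) can be illustrated with a minimal Python sketch. The function names `infer_type` and `suggest_chart`, the three column types, and the chart rules are hypothetical assumptions chosen for illustration; they are not DIVE's actual inference model.

```python
from datetime import date

# Hypothetical mapping from (x column type, y column type) to a chart type.
CHART_RULES = {
    ("categorical", "numeric"): "bar",
    ("temporal", "numeric"): "line",
    ("numeric", "numeric"): "scatter",
    ("categorical", "categorical"): "heatmap",
}


def _parses(parser, value):
    """Return True if parser(value) succeeds without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Classify a column of strings as 'numeric', 'temporal', or 'categorical'."""
    if values and all(_parses(float, v) for v in values):
        return "numeric"
    if values and all(_parses(date.fromisoformat, v) for v in values):
        return "temporal"
    return "categorical"


def suggest_chart(x_values, y_values):
    """Pick a chart type for two columns; fall back to a plain table."""
    key = (infer_type(x_values), infer_type(y_values))
    return CHART_RULES.get(key, "table")


print(suggest_chart(["BRA", "CHL", "USA"], ["1.2", "0.8", "2.5"]))  # bar
print(suggest_chart(["2010-01-01", "2011-01-01"], ["3", "4"]))  # line
```

    A real engine would also need to infer relationships between tables and handle missing values; the rule table simply makes the "properties to visualizations" step explicit.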

  • FOLD

    Alexis Hope, Kevin Hu

    Imagine reading about the 2008 housing crisis without knowing what a mortgage is. Jumping into complex news stories is difficult, particularly stories requiring historical or technical context. We hypothesize that the feeling of frustration and inadequacy that comes with not being able to understand the news causes readers to turn away from specific pieces or entire stories. FOLD is an authoring and publishing platform allowing storytellers to structure and contextualize their stories to make their work more accessible. Authors can provide “curated tangents” to readers by integrating contextual information from online sources or by reusing other authors’ context blocks. Readers can progress through a story vertically to read the narrative, and side-to-side to access these context blocks. We believe that FOLD can help readers of all ages and backgrounds confidently engage with complex stories.
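
    The two reading directions suggest a simple grid-like data model. The sketch below is a hypothetical illustration (the classes `ContextBlock`, `Section`, and `Story`, the `view` method, and the example.org URL are assumptions, not FOLD's actual schema): sections are read top to bottom, each section's context blocks sit side by side, and a frozen `ContextBlock` can be shared between stories, which is one way to model reusing another author's context block.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextBlock:
    """A reusable tangent (e.g. a definition); frozen so stories can share it."""
    title: str
    source_url: str


@dataclass
class Section:
    """One step of the narrative plus the context blocks beside it."""
    text: str
    context: list[ContextBlock] = field(default_factory=list)


@dataclass
class Story:
    """A story read top to bottom, one section per row."""
    sections: list[Section] = field(default_factory=list)

    def view(self, row, column=0):
        """Column 0 is the narrative text; column n >= 1 is the n-th context block."""
        section = self.sections[row]
        if column == 0:
            return section.text
        return section.context[column - 1]


# A context block written once and reused by two different stories.
mortgage = ContextBlock("What is a mortgage?", "https://example.org/mortgage")
story_a = Story([Section("Narrative text of story A.", [mortgage])])
story_b = Story([Section("Narrative text of story B.", [mortgage])])
print(story_a.view(0, 1) is story_b.view(0, 1))  # True
```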

  • GIFGIF

    Cesar A. Hidalgo, Andrew Lippman, Kevin Zeng Hu and Travis Rich

    An animated GIF is a magical thing. It can compactly convey emotion, empathy, and context in a subtle way that text or emoticons often miss. GIFGIF is a project to combine that magic with quantitative methods. Our goal is to create a tool that lets people explore the world of GIFs by the emotions they evoke, rather than by manually entered tags. A website with 200,000 users maps the GIFs to an emotion space and lets you peruse them interactively.

  • Immersion

    Deepak Jagdish, Daniel Smilkov and Cesar Hidalgo

    Immersion is a visual data experiment that delivers a fresh perspective on your email inbox. Taking a people-centric approach rather than focusing on the content of the emails, Immersion brings into view an important personal insight: the network of people you are connected to via email, and how it evolves over the course of many years. Because this experiment deals with extremely private data, it is worth noting that when given secure access to your Gmail inbox (which you can revoke at any time), Immersion uses only data from email headers, and not a single word of any email's subject or body.
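
    To make the header-only approach concrete, here is a minimal sketch that builds a contact network from raw messages using Python's standard `email` package. It assumes, for illustration, that two people are linked whenever their addresses appear together in the From, To, or Cc headers of a message; `header_edges` is a hypothetical name, and Immersion's actual network definition may differ.

```python
from collections import Counter
from email.parser import HeaderParser
from email.utils import getaddresses
from itertools import combinations


def header_edges(raw_messages):
    """Count how often each pair of addresses appears on the same message.

    Only the From, To and Cc headers are read. HeaderParser leaves the body
    unparsed, and the Subject header is never accessed.
    """
    parser = HeaderParser()
    edges = Counter()
    for raw in raw_messages:
        headers = parser.parsestr(raw)
        fields = []
        for name in ("From", "To", "Cc"):
            fields.extend(headers.get_all(name, []))
        # Normalize addresses so "Ana@x" and "ana@x" count as one person.
        people = sorted({addr.lower() for _, addr in getaddresses(fields) if addr})
        for a, b in combinations(people, 2):
            edges[(a, b)] += 1
    return edges


raw = "From: Ana <ana@example.com>\nTo: bo@example.com\nSubject: private\n\nbody\n"
print(header_edges([raw]))  # Counter({('ana@example.com', 'bo@example.com'): 1})
```

    Tracking the Date header as well would let the same counts be bucketed by year, which is what showing how the network evolves over time requires.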

  • Opus

    Cesar A. Hidalgo and Miguel Guevara

    Opus is an online tool for exploring the work and trajectory of scholars. Through a suite of interactive visualizations, Opus helps users explore the academic impact of a scholar's publications, discover her network of collaborators, and identify her peers.

  • Pantheon

    Ali Almossawi, Andrew Mao, Defne Gurel, Cesar A. Hidalgo, Kevin Zeng Hu, Deepak Jagdish, Amy Yu, Shahar Ronen and Tiffany Lu

    We were not born with the ability to fly, cure disease, or communicate over long distances, but we were born in a society that endows us with these capacities. These capacities are the result of information that has been generated by humans and that humans have been able to embed in tangible and digital objects. This information is all around us: it's the way in which the atoms in an airplane are arranged or the way in which our cellphones whisper dance instructions to electromagnetic waves. Pantheon is a project celebrating the cultural information that endows our species with these fantastic capacities. To celebrate our global cultural heritage, we are compiling, analyzing, and visualizing datasets that can help us understand the process of global cultural development.

  • Place Pulse

    Phil Salesses, Anthony DeVincenzi, and César A. Hidalgo

    Place Pulse is a website that allows anybody to quickly run a crowdsourced study and interactively visualize the results. It works by taking a complex question, such as “Which place in Boston looks the safest?”, and breaking it down into easier-to-answer binary comparisons. Internet participants are shown two images and asked, “Which place looks safer?” From the responses, directed graphs are generated that can be mined, allowing the experimenter to identify interesting patterns in the data and form new hypotheses based on these observations. It works with any city or question and is highly scalable. With a better understanding of human perception, carefully calculated policy decisions could have a disproportionate impact on public opinion.
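
    The pairwise design can be sketched in a few lines of Python. Each response becomes a weighted directed edge from the preferred image to the other one, and, as an illustrative assumption, each image is scored by the fraction of its comparisons that it won. Place Pulse's actual scoring method is not described here, and `build_vote_graph` and `win_ratios` are hypothetical names.

```python
from collections import defaultdict


def build_vote_graph(votes):
    """votes: iterable of (preferred, other) image ids, one per response.

    Returns graph[preferred][other] = number of times `preferred` was chosen
    over `other`, i.e. a weighted directed graph.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for preferred, other in votes:
        graph[preferred][other] += 1
    return graph


def win_ratios(graph):
    """Score each image by the fraction of its comparisons that it won."""
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for preferred, others in graph.items():
        for other, count in others.items():
            wins[preferred] += count
            comparisons[preferred] += count
            comparisons[other] += count
    return {image: wins[image] / comparisons[image] for image in comparisons}


votes = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
print(win_ratios(build_vote_graph(votes))["b"])  # 0.5
```

    A raw win fraction ignores which images a place was compared against; rankings that account for the strength of each opponent are a natural refinement.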

  • StreetScore

    Nikhil Naik, Jade Philipoom, Ramesh Raskar, Cesar Hidalgo

    StreetScore is a machine learning algorithm that predicts the perceived safety of a streetscape. StreetScore was trained using 2,920 images of streetscapes from New York and Boston and their rankings for perceived safety obtained from a crowdsourced survey. To predict an image's score, StreetScore decomposes the image into features and assigns it a score based on the associations between features and scores learned from the training dataset. We use StreetScore to create a collection of map visualizations of the perceived safety of street views from cities in the United States. StreetScore allows us to scale up the evaluation of streetscapes by several orders of magnitude compared to a crowdsourced survey. By providing high-resolution data on urban perception, StreetScore can empower research groups working on connecting urban perception with social and economic outcomes.
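
    As a rough illustration of learning associations between features and scores, the sketch below uses k-nearest-neighbor regression, a deliberately simple stand-in that is not necessarily the model StreetScore uses: a new image receives the average crowdsourced score of the k training images whose feature vectors are closest to it. Feature extraction is out of scope, so images are assumed to be already represented as numeric feature vectors; `knn_score` and the toy data are hypothetical.

```python
import math


def knn_score(train_features, train_scores, query, k=3):
    """Predict a perceived-safety score for `query` as the mean score of the
    k training images whose feature vectors are nearest (Euclidean distance)."""
    if not 1 <= k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training images")
    nearest = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )[:k]
    return sum(train_scores[i] for i in nearest) / k


# Toy feature vectors and scores, invented purely for illustration.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
scores = [2.0, 4.0, 8.0, 10.0]
print(knn_score(features, scores, (0.0, 0.4), k=2))  # 3.0
```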

  • The Economic Complexity Observatory

    Alex Simoes and César A. Hidalgo

    With more than six billion people and 15 billion products, the world economy is anything but simple. The Economic Complexity Observatory is an online tool that helps people, including decision makers, explore this complexity and understand the connections that exist between countries and the myriad products they produce and/or export. The Economic Complexity Observatory puts at everyone’s fingertips the latest analytical tools developed to visualize and quantify the productive structure of countries and their evolution.
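
    The text does not specify which measures the Observatory computes. As an assumption, the sketch below follows a standard approach from the economic-complexity literature: exports are binarized with Balassa's revealed comparative advantage (RCA >= 1), then each country's diversity (number of products) and each product's ubiquity (number of countries) are computed and refined with the method of reflections, where each step averages the previous values from the other side of the country-product network. The function names are hypothetical.

```python
def rca_matrix(exports):
    """exports: {country: {product: export value}}.

    Returns M[c][p] = 1 if country c has revealed comparative advantage
    (Balassa RCA >= 1) in product p, else 0. Assumes all totals are positive.
    """
    world_total = sum(v for row in exports.values() for v in row.values())
    country_total = {c: sum(row.values()) for c, row in exports.items()}
    product_total = {}
    for row in exports.values():
        for p, v in row.items():
            product_total[p] = product_total.get(p, 0) + v
    return {
        c: {
            p: int((v / country_total[c]) / (product_total[p] / world_total) >= 1)
            for p, v in row.items()
        }
        for c, row in exports.items()
    }


def reflections(M, steps):
    """Method of reflections on a binary country-product matrix M.

    Step 0 gives diversity (products per country) and ubiquity (countries
    per product). Each later step averages the other side's previous values
    over the links of M. Assumes no all-zero rows or columns.
    """
    countries = list(M)
    products = sorted({p for row in M.values() for p in row})
    kc0 = {c: sum(M[c].get(p, 0) for p in products) for c in countries}
    kp0 = {p: sum(M[c].get(p, 0) for c in countries) for p in products}
    kc, kp = dict(kc0), dict(kp0)
    for _ in range(steps):
        # Both updates use the previous step's values (simultaneous assignment).
        kc, kp = (
            {c: sum(M[c].get(p, 0) * kp[p] for p in products) / kc0[c] for c in countries},
            {p: sum(M[c].get(p, 0) * kc[c] for c in countries) / kp0[p] for p in products},
        )
    return kc, kp


M = {"A": {"x": 1, "y": 1, "z": 1}, "B": {"x": 1, "y": 1}, "C": {"x": 1}}
print(reflections(M, 0)[0])  # {'A': 3, 'B': 2, 'C': 1}
print(reflections(M, 1)[0])  # {'A': 2.0, 'B': 2.5, 'C': 3.0}
```

    At step 1, each country's value is the average ubiquity of the products it exports: in this toy matrix, the most diversified country, A, also exports the least ubiquitous products on average.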

  • The Language Group Network

    Shahar Ronen, Kevin Hu, Michael Xu, and César A. Hidalgo

    Most interactions between cultures require overcoming a language barrier, which is why multilingual speakers play an important role in facilitating such interactions. In addition, certain languages (not necessarily the most spoken ones) are more likely than others to serve as intermediaries. We present the Language Group Network, a new approach to studying global language networks using data generated by tens of millions of speakers from all over the world: a billion tweets, Wikipedia edits in all languages, and translations of two million printed books. Our network spans more than eighty languages and can be used to identify the most connected languages and the potential paths through which information diffuses from one culture to another. Applications include promoting cultural interactions, predicting trends, and marketing.
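
    One way to turn multilingual activity into a network, assumed here for illustration, is to link two languages whenever the same user writes in both and to weight the link by the number of such users. The sketch ranks languages by weighted degree and finds a path with the fewest intermediary languages using breadth-first search. The input format and function names are hypothetical, and the project's actual edge weights and connectivity measures may differ.

```python
from collections import defaultdict, deque
from itertools import combinations


def language_network(user_languages):
    """user_languages: {user: set of language codes the user writes in}.

    Returns graph[a][b] = number of users who write in both a and b.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for langs in user_languages.values():
        for a, b in combinations(sorted(langs), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def most_connected(graph):
    """Rank languages by weighted degree (total shared multilingual users)."""
    return sorted(graph, key=lambda lang: sum(graph[lang].values()), reverse=True)


def shortest_path(graph, source, target):
    """Breadth-first search for the path with the fewest intermediary languages."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        lang = queue.popleft()
        if lang == target:
            path = []
            while lang is not None:
                path.append(lang)
                lang = previous[lang]
            return path[::-1]
        for neighbor in graph.get(lang, {}):
            if neighbor not in previous:
                previous[neighbor] = lang
                queue.append(neighbor)
    return None  # target unreachable from source


users = {"u1": {"en", "es"}, "u2": {"en", "es"}, "u3": {"en", "fr"}, "u4": {"fr", "ja"}}
graph = language_network(users)
print(shortest_path(graph, "es", "ja"))  # ['es', 'en', 'fr', 'ja']
```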

  • The Network Impact in Success

    Cesar A. Hidalgo and Miguel Guevara

    Diverse teams of authors are known to generate higher-impact research papers, as measured by their number of citations. But is this because cognitively diverse teams produce higher-quality work, which is more likely to be cited and adopted? Or is it because they possess a larger number of social connections through which to distribute their findings? In this project we are mapping the co-authorship networks and the academic diversity of the authors of a large volume of scientific publications to test whether the adoption of papers is explained by cognitive diversity or by the size of the network associated with each author. This project will help us understand whether the greater adoption of work generated by diverse groups is the result of higher quality or of better connections.

  • The Privacy Bounds of Human Mobility

    Cesar A. Hidalgo and Yves-Alexandre DeMontjoye

    We used 15 months of data from 1.5 million people to show that four points (approximate places and times) are enough to identify 95 percent of individuals in a mobility database. Our work shows that human behavior puts fundamental natural constraints on the privacy of individuals, and that these constraints hold even when the resolution of the dataset is low. These results demonstrate that even coarse datasets provide little anonymity. We further developed a formula to estimate the uniqueness of human mobility traces. These findings have important implications for the design of frameworks and institutions dedicated to protecting the privacy of individuals.
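
    The notion of uniqueness behind the four-point result can be estimated by sampling. In this minimal sketch, which is an illustrative assumption and not the project's formula, each trace is a set of (place, time bin) points; for every user, p points are drawn at random from their own trace, and the user counts as unique if no other trace contains all p points. The returned fraction estimates the share of individuals that p points single out. `uniqueness` is a hypothetical name.

```python
import random


def uniqueness(traces, p, seed=0):
    """Estimate the fraction of users singled out by p points of their own trace.

    traces: {user: set of (place, time_bin) points}. Users with fewer than
    p points are skipped. A fixed seed makes the random draws reproducible.
    """
    rng = random.Random(seed)
    eligible = [u for u, pts in traces.items() if len(pts) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # sorted() gives random.sample the sequence it requires.
        sample = set(rng.sample(sorted(traces[user]), p))
        matches = sum(1 for pts in traces.values() if sample <= pts)
        if matches == 1:  # only the user's own trace contains every sampled point
            unique += 1
    return unique / len(eligible)


print(uniqueness({"a": {(1, 1)}, "b": {(2, 2)}}, p=1))  # 1.0
```

    Coarsening the data (larger places, wider time bins) makes more traces share points, so in this framing lower resolution lowers uniqueness for a given p.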