By Joi Ito
When we look at a stack of blocks or a stack of Oreos, we intuitively have a sense of how stable it is, whether it might fall over, and in what direction it may fall. That’s a fairly sophisticated calculation involving the mass, texture, size, shape, and orientation of the objects in the stack.
Researchers at MIT led by Josh Tenenbaum hypothesize that our brains have what you might call an intuitive physics engine: The information that we are able to gather through our senses is imprecise and noisy, but we nonetheless make an inference about what we think will probably happen, so we can get out of the way or rush to keep a bag of rice from falling over or cover our ears. Such a “noisy Newtonian” system involves probabilistic understandings and can fail. Consider this image of rocks stacked in precarious formations.
Based on most of your experience, your brain tells you that it's not possible for them to remain standing. Yet there they are. (This is very similar to the physics engines inside videogames like Grand Theft Auto that simulate a player’s interactions with objects in their 3-D worlds.)
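The "noisy Newtonian" idea — imprecise sensory estimates fed into an internal physics simulation — can be caricatured in a few lines of code. This is a hypothetical toy sketch, not the researchers' actual model: it judges a stack of blocks by adding perceptual noise to the block positions and checking, over many noisy "percepts," how often the center of mass of the blocks above any level hangs past the edge of the block supporting them.

```python
import random

def topples(offsets, width=1.0, noise=0.05):
    """Judge a stack of unit-width blocks under noisy perception.

    offsets[i] is the horizontal offset of block i relative to the block
    below it. Gaussian noise models imprecise sensory estimates.
    """
    seen = [o + random.gauss(0, noise) for o in offsets]
    # Absolute horizontal position of each block's center.
    centers, x = [], 0.0
    for o in seen:
        x += o
        centers.append(x)
    # The stack topples if, at any level, the center of mass of the
    # blocks above overhangs the edge of the supporting block.
    for i in range(len(centers) - 1):
        above = centers[i + 1:]
        com = sum(above) / len(above)
        if abs(com - centers[i]) > width / 2:
            return True
    return False

def p_fall(offsets, trials=2000):
    """Probabilistic inference: fraction of noisy percepts that topple."""
    return sum(topples(offsets) for _ in range(trials)) / trials

print(p_fall([0.0, 0.05, 0.05]))  # well aligned: falls rarely
print(p_fall([0.0, 0.45, 0.45]))  # badly overhung: falls almost always
```

The output is a probability, not a verdict — which is exactly why such a system can be fooled by a freak arrangement, like the balanced rocks, that sits just inside the stable region.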
For decades, artificial intelligence with common sense has been one of the most difficult research challenges in the field—artificial intelligence that “understands” the function of things in the real world and the relationship between them and is thus able to infer intent, causality, and meaning. AI has made astonishing advances over the years, but the bulk of AI currently deployed is based on statistical machine learning that takes tons of training data, such as images on Google, to build a statistical model. The data are tagged by humans with labels such as “cat” or “dog”, and a machine’s neural network is exposed to all of the images until it is able to guess what the image is as accurately as a human being.
One of the things that such statistical models lack is any understanding of what the objects are—for example that dogs are animals or that they sometimes chase cars. For this reason, these systems require huge amounts of data to build accurate models, because they are doing something more akin to pattern recognition than understanding what’s going on in an image. It’s a brute force approach to “learning” that has become feasible with the faster computers and vast datasets that are now available.
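The label-and-train loop described above can be reduced to a caricature. The sketch below is a toy nearest-neighbor classifier with made-up two-number "images," not a real vision system, but it makes the point: the model matches new inputs to the closest labeled examples by distance alone, with no notion of what a cat or a dog actually is.

```python
# Toy supervised "learning": human-labeled examples, where each pair of
# numbers stands in for image features extracted from a photo.
training_data = [
    ((0.9, 0.1), "cat"),
    ((0.8, 0.2), "cat"),
    ((0.2, 0.9), "dog"),
    ((0.1, 0.8), "dog"),
]

def classify(features):
    """Return the label of the nearest training example."""
    def dist(example):
        (x, y), _ = example
        return (x - features[0]) ** 2 + (y - features[1]) ** 2
    return min(training_data, key=dist)[1]

print(classify((0.85, 0.15)))  # "cat"
print(classify((0.15, 0.85)))  # "dog"
```

Everything the system "knows" lives in the labeled examples; present it with something unlike its training data and it can only guess — which is why real systems of this kind need enormous datasets to perform well.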
It’s also quite different from how children learn. Tenenbaum often shows a video by Felix Warneken, Frances Chen, and Michael Tomasello, of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, of a small child watching an adult walk repeatedly into a closet door, clearly wanting to get inside but failing to open it properly. After just a few attempts, the child pulls the door open, allowing the adult to walk through. What seems cute but obvious for humans to do—to see just a few examples and come up with a solution—is in fact very difficult for a computer to do. The child opening the door for the adult instinctively understands the physics of the situation: There is a door, it has hinges, it can be pulled open, the adult trying to get inside the closet cannot simply walk through it. In addition to the physics the child understands, he is able to guess after a few attempts that the adult has an intention to go through the door but is failing.
This requires an understanding that human beings have plans and intentions and might want or need help to accomplish them. The capacity to learn a complex concept and also learn the specific conditions under which that concept is realized is an area where children exhibit natural, unsupervised mastery.
Infants like my own 9-month-old learn through interacting with the real world, which appears to be training various intuitive engines or simulators inside of her brain. One is a physics engine (to use Tenenbaum's term) that learns to understand—through piling up building blocks, knocking over cups, and falling off of chairs—how gravity, friction, and other Newtonian laws manifest in our lives and set parameters on what we can do.
In addition, infants from birth exhibit a social engine that recognizes faces, tracks gazes, and tries to understand how other social objects in the world think, behave, and interact with them and each other. The “social gating hypothesis,” proposed by Patricia Kuhl, professor of speech and hearing sciences at the University of Washington, argues that our ability to speak is fundamentally linked to the development of social understanding through our social interactions as infants. Elizabeth Spelke, a cognitive psychologist at Harvard University, and her collaborators have been working to show how infants develop an “intuitive psychology” to infer people’s goals from as early as 10 months.
In his book Thinking, Fast and Slow, Daniel Kahneman explains that the intuitive part of our brain is not so good at statistics or math. He offers the following problem: A baseball bat and a ball together cost $1.10. The bat costs $1 more than the ball. How much does the ball cost? Our intuition wants to say 10 cents, but that’s wrong. If the ball were 10 cents and the bat $1 more, the bat would be $1.10, making the total $1.20. The correct answer is that the ball costs 5 cents and the bat $1.05, bringing the total to $1.10. Clearly, our intuition about statistics can be fooled, just as the stacked rocks existing in the natural world confuse our internal physics engine.
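The puzzle is one line of algebra: if the ball costs b, then b + (b + $1.00) = $1.10, so 2b = $0.10 and b = $0.05. A trivial check, worked in cents to avoid floating-point noise:

```python
# Solve ball + (ball + difference) = total for the ball's price, in cents.
total, difference = 110, 100     # $1.10 total; the bat costs $1.00 more
ball = (total - difference) // 2  # 2 * ball = total - difference
bat = ball + difference
print(ball, bat)  # 5 105, i.e. $0.05 and $1.05
```

The deliberate, step-by-step calculation is what Kahneman calls slow thinking — precisely the mode our fast, intuitive engine skips.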
But academics and economists often use such examples as reasons to undervalue the role of intuition in science and academic study, and that’s a huge mistake. The intuitive engines that help us quickly assess physical or social situations are doing extremely complex computations that may not even be explainable; it may be impossible to compute them linearly. For example, an expert skier can’t explain what she does, nor can you learn to ski just by reading instructions. Your brain and your whole body learn to move, synchronize, and operate in a very complex way to enter a state of flow where everything works without linear thinking.
Your brain goes through a tremendous transformation in infancy. Infant brains initially grow twice as many connections between neurons as adult brains have, and as the brain matures, the connections that don’t appear to be important are pruned back. Meanwhile, children develop an intuitive understanding of the complex systems they interact with—stairs, mom, dad, friends, cars, snowy mountains. Some will learn the difference between dozens of types of waves, to help them navigate the seas, or the difference between many types of snow.
While our ability to explain, argue, and understand each other using words is extremely important, it is also important to understand that words are simplified representations and can mean different things to different people. Many ideas or things that we know cannot be reduced to words; when they are, the words do not transmit more than a summary of the actual idea or understanding.
Just as we should not dismiss the expert skier who cannot explain how she skis, we should not dismiss the intuition of the shamans who hear nature telling them that things are out of balance. Our view of many of the sensibilities of indigenous people and their relationships with nature as “primitive”—because they can’t explain them and we can’t understand them—may in fact say more about our own lack of an environmental intuition engine. Our brains may have pruned those neurons because they weren’t needed in our urban worlds. We spend most of our lives with our noses in books and screens, sitting in cubicles, becoming educated so that we understand the world. Does our ability to explain things mathematically or economically really mean that we understand ecological systems better than those who were immersed in a natural environment from infancy and understand them intuitively?
Maybe a big dose of humility and an effort to integrate the nonlinear and intuitive understanding of the minds of people we view as less educated—people who have learned through doing and observing instead of through textbooks—would substantially benefit our understanding of how things work and what we can do about the problems currently unsolvable with our modern tools. It’s also yet another argument for diversity. Reductionist mathematical and economic models are useful from an engineering point of view, but we should be mindful to appreciate our limited ability to describe complex adaptive systems using such models, which don’t really allow for intuition and run the risk of neglecting its role in human experience.
If Tenenbaum and his colleagues are successful in developing machines that can learn intuitive models of the world, it’s possible they will suggest things that either they can’t initially explain or that are so complex we are unable to comprehend them with current theories and tools. Whether we are talking about the push for more explainability in machine learning and AI models or we are trying to fathom how indigenous people interact with nature, we will reach the limits of explainability. It is this space, beyond the explainable, that is the exciting cutting edge of science, where we discover and press beyond our current understanding of the world.