Title: Social Inductive Biases for Decentralized Reinforcement Learning
Committee: Prof. Alex 'Sandy' Pentland, MIT; Prof. Esteban Moro, Universidad Carlos III de Madrid; Dr. Tim Klinger, IBM Thomas J. Watson Research Center; Prof. Neil Lawrence, The University of Sheffield
Abstract: How can we build machines that collaborate and learn more seamlessly with humans, and with each other? How do we create fairer societies? How do we minimize the impact of information manipulation campaigns, and fight back? How do we build machine learning algorithms that are more sample efficient when learning from each other's sparse data, and under time constraints? At the root of these questions is a simple one: how do agents, human or machine, learn from each other, and can we improve that learning and apply it to new domains? There is a growing movement to apply insights from the cognitive and social sciences to improving machine learning, as well as opportunities to use machine learning as a sandbox to test, simulate and expand ideas from the cognitive and social sciences. A less researched yet fertile part of this intersection is the modeling of social learning: past work has focused more on how agents can learn from the 'environment', and there is less work that borrows from both communities to look into how agents learn from each other. This thesis presents novel contributions on the nature and usefulness of social learning as an inductive bias for reinforcement learning. I start by presenting the results from two large-scale online human experiments. First, I observe Dunbar cognitive limits that shape and limit social learning in two different social trading platforms, with the additional contribution that synthetic financial bots that transcend human limitations can obtain higher profits even when using naive trading strategies. Second, I devise a novel online experiment to observe how people, at the individual level, update their beliefs about future financial asset prices (e.g. S&P 500 and oil prices) from social information. I model such social learning using Bayesian models of cognition, and observe that people make strong distributional assumptions on the social data they observe (e.g. assuming that the likelihood data is unimodal).
I was fortunate to collect one round of predictions during the Brexit market instability, and find that social learning leads to higher performance than learning from the underlying price history (the environment) during such volatile times. Having observed the cognitive limits and biases people exhibit when learning from other agents, I present a motivating example of the strength of inductive biases in reinforcement learning: I implement a learning model with a relational inductive bias that pre-processes the environment state into a set of relationships between entities in the world. I observe strong improvements in performance and sample efficiency, and find the learned relationships to be strongly interpretable. Finally, given that most modern deep reinforcement learning algorithms are distributed (in that they have separate learning agents), I investigate the hypothesis that viewing deep reinforcement learning as a social learning distributed search problem could lead to strong improvements. I do so by creating a fully decentralized, sparsely-communicating and scalable learning algorithm, and observe strong learning improvements with lower communication bandwidth usage (between learning agents) when using communication topologies that naturally evolved due to social learning in humans. Additionally, I provide a theoretical upper bound (that agrees with my empirical results) regarding which communication topologies lead to the largest learning performance improvement. Given a future increasingly filled with decentralized autonomous machine learning systems that interact with humans, there is an increasing need to understand social learning to build resilient, scalable and effective learning systems, and this thesis provides insights into how to build such systems.