Causal Influence as Intrinsic Social Motivation for Multi-Agent Reinforcement Learning


Teaching multiple AI agents to coordinate their behavior is a challenging task that can be difficult to achieve without training all agents with a centralized controller or allowing agents to view each other's reward functions. We present a new approach to multi-agent reinforcement learning (MARL) in which agents are given an incentive for causally influencing each other's actions. Causal influence is assessed using counterfactual reasoning. We show that this social influence reward gives rise to more coordinated behavior, better collective outcomes, and even emergent communication. In fact, the influence reward can be used to train agents to use an explicit communication protocol in a meaningful way when they fail to learn to do so without it. Finally, we show that this reward can be computed by training each agent to model the actions of other agents. An agent can then "imagine" counterfactual actions it could have taken and predict how these would have affected other agents' behavior, thus computing its own influence reward. This mechanism allows each agent to be trained independently, representing a significant improvement over prior MARL work.
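To make the counterfactual computation concrete, the sketch below scores agent k's influence at a single timestep. It assumes, as one natural choice, that influence is measured as the KL divergence between another agent j's predicted action distribution conditioned on k's actual action, p(a_j | a_k, s), and the marginal obtained by averaging over k's counterfactual actions, p(a_j | s) = Σ_c p(a_j | c, s) p(c | s), summed over all other agents j. The learned model of other agents is represented here by hypothetical prediction tables, and the function names are illustrative, not the authors' code.

```python
import math
from typing import Sequence

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def influence_reward(
    own_policy: Sequence[float],
    predictions_per_agent: Sequence[Sequence[Sequence[float]]],
    taken_action: int,
) -> float:
    """Hypothetical per-step influence reward for agent k.

    own_policy: p(c | s), agent k's distribution over its own actions c.
        Assumed to give the taken action nonzero probability (it was sampled).
    predictions_per_agent: for each other agent j, a table whose row c is
        agent k's predicted distribution p(a_j | a_k = c, s).
    taken_action: index of the action agent k actually took.
    """
    total = 0.0
    for table in predictions_per_agent:
        n_actions_j = len(table[0])
        # "Imagine" every counterfactual action c and marginalize it out:
        # p(a_j | s) = sum_c p(a_j | c, s) * p(c | s)
        marginal = [
            sum(own_policy[c] * table[c][a] for c in range(len(own_policy)))
            for a in range(n_actions_j)
        ]
        # Divergence between j's behavior given k's real action and
        # j's behavior averaged over what k could have done instead.
        total += kl_divergence(table[taken_action], marginal)
    return total
```

For example, with a uniform policy over two actions, a partner whose predicted action tracks agent k's choice (table `[[0.9, 0.1], [0.1, 0.9]]`) yields a reward of about 0.368 nats, while a partner whose prediction ignores k's action (`[[0.5, 0.5], [0.5, 0.5]]`) yields 0. Because the prediction tables come from agent k's own model rather than from the other agents' internals, each agent can compute this reward on its own.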