Policy gradient is one of the best-known algorithms in reinforcement learning. In this paper, we derive the mean dynamics of the soft-max policy gradient algorithm in multi-agent settings using tools from evolutionary game theory. Studying these dynamics is crucial for understanding the algorithm's weaknesses and for suggesting how to recover from them. Unlike most multi-agent reinforcement learning algorithms, whose mean dynamics are slight variants of the replicator dynamics that do not alter the properties of the original dynamics, the soft-max policy gradient dynamics exhibit a different structure. Nevertheless, they retain a close connection with the replicator dynamics: they are a replicator dynamics applied to a non-linear transformation of the fitness function. We analyze the dynamics separately for the case of learning the best response and for single- and multi-population games. In particular, we show that the soft-max policy gradient dynamics always converge to the best response. However, unlike the replicator dynamics, they always admit a non-empty set of bad initializations from which convergence to the best response is not monotonic. Furthermore, we show that in single- and multi-population games the soft-max policy gradient dynamics satisfy a weaker set of properties than those satisfied by the replicator dynamics.
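To make the connection with the replicator dynamics concrete, the following is a minimal numerical sketch, not taken from the paper: the fitness vector r, the step size dt, the horizon, and the initialization are hypothetical values chosen for illustration. It compares the standard replicator dynamics with the mean dynamics induced by soft-max policy gradient when learning a best response against a fixed fitness vector: with pi = softmax(theta) and expected payoff J = pi · r, gradient ascent on theta follows dtheta_i/dt = pi_i (r_i − pi · r), which induces a replicator-like flow on pi driven by a transformed fitness.

```python
import numpy as np

# Illustrative sketch only: learning a best response against a fixed
# fitness vector r (r, dt, the horizon, and the initialization are
# hypothetical values chosen for this example).
r = np.array([1.0, 2.0, 0.5])   # payoff of each pure action; action 1 is best
dt = 0.01                       # Euler step for the continuous-time dynamics

def replicator_step(x):
    """Replicator dynamics: dx_i/dt = x_i * (r_i - mean fitness)."""
    x = x + dt * x * (r - x @ r)
    return x / x.sum()          # renormalize to stay on the simplex

def softmax_pg_step(theta):
    """Soft-max policy gradient: with pi = softmax(theta) and J = pi . r,
    dJ/dtheta_i = pi_i * (r_i - pi . r)."""
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    return theta + dt * pi * (r - pi @ r)

x = np.array([0.6, 0.1, 0.3])   # initial mixed strategy
theta = np.log(x)               # soft-max parameters inducing the same strategy

for _ in range(50_000):
    x = replicator_step(x)
    theta = softmax_pg_step(theta)

pi = np.exp(theta - theta.max())
pi /= pi.sum()
print("replicator:  ", np.round(x, 3))   # essentially the pure best response
print("soft-max PG: ", np.round(pi, 3))  # approaching the same best response
```

In this instance both flows concentrate on the best action, while the soft-max parameterization approaches the corner of the simplex more slowly, illustrating the kind of behavioral differences from the replicator dynamics discussed above.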
Author Information
Martino Bernasconi (Politecnico di Milano)
Federico Cacciamani (Politecnico di Milano)
Simone Fioravanti (Gran Sasso Science Institute (GSSI))
Nicola Gatti (Politecnico di Milano)
Francesco Trovò (Politecnico di Milano)
More from the Same Authors
- 2021: Public Information Representation for Adversarial Team Games
  Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti
- 2022: Multi-Armed Bandit Problem with Temporally-Partitioned Rewards
  Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli
- 2022: A General Framework for Safe Decision Making: A Convex Duality Approach
  Martino Bernasconi · Federico Cacciamani · Nicola Gatti · Francesco Trovò
- 2022: A Unifying Framework for Online Safe Optimization
  Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti
- 2022 Poster: Sequential Information Design: Learning to Persuade in the Dark
  Martino Bernasconi · Matteo Castiglioni · Alberto Marchesi · Nicola Gatti · Francesco Trovò
- 2022 Poster: A Unifying Framework for Online Optimization with Long-Term Constraints
  Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti
- 2022 Poster: Subgame Solving in Adversarial Team Games
  Brian Zhang · Luca Carminati · Federico Cacciamani · Gabriele Farina · Pierriccardo Olivieri · Nicola Gatti · Tuomas Sandholm
- 2021: Spotlight Talk: Public Information Representation for Adversarial Team Games
  Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti
- 2021 Poster: Exploiting Opponents Under Utility Constraints in Sequential Games
  Martino Bernasconi · Federico Cacciamani · Simone Fioravanti · Nicola Gatti · Alberto Marchesi · Francesco Trovò
- 2020 Poster: Online Bayesian Persuasion
  Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Nicola Gatti
- 2020 Poster: No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
  Andrea Celli · Alberto Marchesi · Gabriele Farina · Nicola Gatti
- 2020 Spotlight: Online Bayesian Persuasion
  Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Nicola Gatti
- 2020 Oral: No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
  Andrea Celli · Alberto Marchesi · Gabriele Farina · Nicola Gatti
- 2019 Poster: Learning to Correlate in Multi-Player General-Sum Sequential Games
  Andrea Celli · Alberto Marchesi · Tommaso Bianchi · Nicola Gatti
- 2018 Poster: Practical exact algorithm for trembling-hand equilibrium refinements in games
  Gabriele Farina · Nicola Gatti · Tuomas Sandholm
- 2018 Poster: Ex ante coordination and collusion in zero-sum multi-player extensive-form games
  Gabriele Farina · Andrea Celli · Nicola Gatti · Tuomas Sandholm