Workshop
Learning and DecisionMaking with Strategic Feedback (StratML)
Yahav Bechavod · Hoda Heidari · Eric Mazumdar · Celestine MendlerDünner · Tijana Zrnic
Classical treatments of machine learning rely on the assumption that the data, after deployment, resembles the data the model was trained on. However, as machine learning models are increasingly used to make consequential decisions about people, individuals often react strategically to the deployed model. These strategic behaviorswhich effectively invalidate the predictive modelshave opened up new avenues of research and added new challenges to the deployment of machine learning algorithms in the real world.
Different aspects of strategic behavior have been studied by several communities both within and outside of machine learning. For example, the growing literature on strategic classification studies algorithms for finding strategyrobust decision rules, as well as the properties of such rules. Behavioral economics aims to understand and model people’s strategic responses. Recent works on learning in games study optimization algorithms for finding meaningful equilibria and solution concepts in competitive environments.
This workshop aims to create a dialogue between these different communities, all studying aspects of decisionmaking and learning with strategic feedback. The goal is to identify common points of interest and open problems in the different subareas, as well as to encourage crossdisciplinary collaboration.
Schedule
Tue 7:00 a.m.  7:10 a.m.

Opening remarks
(
Introduction
)
SlidesLive Video 
🔗 
Tue 7:10 a.m.  7:37 a.m.

Analysis and interventions in large network games: graphon games and graphon contagion
(
Invited Talk
)
SlidesLive Video Many of today’s most promising technological systems involve very large numbers of autonomous agents that influence each other and make strategic decisions within a network structure. Examples include opinion dynamics, targeted marketing in social networks, economic exchange and international trade, product adoption and social contagion. While traditional tools for the analysis of these systems assumed that a social planner has full knowledge of the network of interactions, when we turn to very large networks two issues emerge. First, collecting data about the exact network of interactions becomes very expensive or not at all possible because of privacy concerns. Second, methods for designing optimal interventions that rely on the exact network structure typically do not scale well with the population size. To obviate these issues, in this talk I will present a framework in which the social planner designs interventions based on probabilistic instead of exact information about agent’s interactions. I will introduce the tool of “graphon games” as a way to formally describe strategic interactions in this setting and I will illustrate how this tool can be exploited to design interventions. I will cover two main applications: targeted budget allocation and optimal seeding in contagion processes. I will illustrate how the graphon approach leads to interventions that are asymptotically optimal in terms of the population size and can be computed without requiring exact network data. 
Francesca Parise 🔗 
Tue 7:38 a.m.  8:05 a.m.

Closing the loop in Machine Learning: Learning to optimize with decision dependent data
(
Invited Talk
)
SlidesLive Video Learning algorithms are increasingly being deployed in a variety of real world systems with other autonomous decision processes and human decisionmakers. Importantly, in many settings humans react to the decisions algorithms make. This calls into question the following classically held tenet in supervised machine learning: when it is arduous to model a phenomenon, observations thereof are representative samples from some static or otherwise independent distribution. Without taking such reactions into consideration at the time of design, machine learning algorithms are doomed to result in unintended consequences such as reinforcing institutional bias or incentivizing gaming or collusion. In this talk, we discuss several directions of research along which we have made progress towards closing the loop in ML including robustness to model misspecification in capturing strategic behavior, decisiondependent learning in the presence of competition ('multiplayer performative prediction'), and dynamic decisiondependent learning wherein the data distribution may drift in time. Open questions will be posed towards the end of the talk. 
Lillian Ratliff 🔗 
Tue 8:05 a.m.  8:30 a.m.

Strategic Classification and the Quest for the Holy Grail
(
Invited Talk
)
SlidesLive Video Across a multitude of domains and applications, machine learning has become widespread as a tool for informing decisions about humans, and for humans. But most tools used in practice focus exclusively on mapping inputs to relevant outputs  and take no account of how humans respond to these outputs. This begs the question: how should we design learning systems when we know they will be used in social settings? The goal of this talk is to initiate discussion regarding this question and the paths we can take towards possible answers. Building on strategic classification as an appropriate first step, I will describe some of our work, both recent and current, that aims to extend strategic classification towards more realistic strategic settings that include more elaborate forms of economic modeling. Finally, I will argue for a broader view of how we can approach learning problems that lie just outside the scope of classic supervised learning. 
Nir Rosenfeld 🔗 
Tue 8:30 a.m.  9:00 a.m.

Panel 1
(
Discussion Panel
)
SlidesLive Video 
🔗 
Tue 9:00 a.m.  9:10 a.m.

The Platform Design Problem
(
Contributed talk
)
SlidesLive Video Online firms deploy suites of software platforms, where each platform is designed to interact with users during a certain activity, such as browsing, chatting, socializing, emailing, driving, etc. The economic and incentive structure of this exchange, as well as its algorithmic nature, have not been explored to our knowledge. We model this interaction as a Stackelberg game between a Designer and one or more Agents. We model an Agent as a Markov chain whose states are activities; we assume that the Agent’s utility is a linear function of the steadystate distribution of this chain. The Designer may design a platform for each of these activities/states; if a platform is adopted by the Agent, the transition probabilities of the Markov chain are affected, and so is the objective of the Agent. The Designer’s utility is a linear function of the steady state probabilities of the accessible states (that is, the ones for which the platform has been adopted), minus the development cost of the platforms. The underlying optimization problem of the Agent — that is, how to choose the states for which to adopt the platform — is an MDP. If this MDP has a simple yet plausible structure (the transition probabilities from one state to another only depend on the target state and the recurrent probability of the current state) the Agent’s problem can be solved by a greedy algorithm. The Designer’s optimization problem (designing a custom suite for the Agent so as to optimize, through the Agent’s optimum reaction, the Designer’s revenue), is in general NPhard to approximate within any finite ratio; however, in the special case, while still NPhard, has an FPTAS. These results generalize, under mild additional assumptions, from a single Agent to a distribution of Agents with finite support, as well as to the setting where other Designers have already created platforms, and the Designer must find the best response to the strategies of the other Designers. We discuss other implications of our results and directions of future research. 
Kiran Vodrahalli 🔗 
Tue 9:10 a.m.  9:20 a.m.

Learning Losses for Strategic Classification
(
Contributed talk
)
SlidesLive Video Strategic classification, i.e. classification under possible strategic manipulations of features, has received a lot of attention from both the machine learning and the game theory community. Most works focus on analysing the optimal decision rule under such manipulations. In our work we take a learning theoretic perspective, focusing on the sample complexity needed to learn a good decision rule which is robust to strategic manipulation. We perform this analysis under a strategic manipulation loss that takes into account both the accuracy of the final decision and the vulnerability to manipulation. We analyse the sample complexity for a known graph of possible manipulations in terms of the complexity of the function class and the manipulation graph. Additionally, we address the problem of unknown manipulation capabilities of the involved agents. Using techniques from transfer learning theory, we define a similarity measure for manipulation graphs and show that learning outcomes are robust with respect to small changes in the manipulation graph. Lastly we analyse the (sample complexity of) learning of the manipulation capability of agents with respect to this similarity measures, providing a way to learn strategic classification with respect to an unknown manipulation graph. 
Tosca Lechner 🔗 
Tue 9:20 a.m.  9:30 a.m.

Unfairness Despite Awareness: GroupFair Classification with Strategic Agents
(
Contributed talk
)
SlidesLive Video The use of algorithmic decision making systems in domains which impact the financial, social, and political wellbeing of people has created a demand for these decision making systems to be ``fair'' under some accepted notion of equity. This demand has in turn inspired a large body of work focused on the development of fair learning algorithms which are then used in lieu of their conventional counterparts. Most analysis of such fair algorithms proceeds from the assumption that the people affected by the algorithmic decisions are represented as immutable feature vectors. However, strategic agents may possess both the ability and the incentive to manipulate this observed feature vector in order to attain a more favorable outcome. We explore the impact that strategic agent behavior could have on fair classifiers and derive conditions under which this behavior leads to fair classifiers becoming less fair than their conventional counterparts under the same measure of fairness that the fair classifier takes into account. These conditions are related to the the way in which the fair classifier remedies unfairness on the original unmanipulated data: fair classifiers which remedy unfairness by becoming more selective than their conventional counterparts are the ones that become less fair than their counterparts when agents are strategic. We further demonstrate that both the increased selectiveness of the fair classifier, and consequently the loss of fairness, arises when performing fair learning on domains in which the advantaged group is overrepresented in the region near (and on the beneficial side of) the decision boundary of conventional classifiers. Finally, we observe experimentally, using several datasets and learning methods, that this \emph{fairness reversal} is common, and that our theoretical characterization of the fairness reversal conditions indeed holds in most such cases. 
Andrew Estornell 🔗 
Tue 9:30 a.m.  9:40 a.m.

Testoptional Policies: Overcoming Strategic Behavior and Informational Gaps
(
Contributed talk
)
SlidesLive Video Due to the Covid19 pandemic, more than 500 USbased colleges and universities went ``testoptional'' for admissions and promised that they would not penalize applicants for not submitting test scores, part of a longer trend to rethink the role of testing in college admissions. However, it remains unclear how (and whether) a college can simultaneously use test scores for those who submit them, while not penalizing those who do notand what that promise even means. We formalize these questions, and study how a college can overcome two challenges with optional testing: strategic applicants (when those with low test scores can pretend to not have taken the test), and informational gaps (it has more information on those who submit a test score than those who do not). We find that colleges can indeed do so, if and only if they are able to use information on who has test access and are willing to randomize admissions. 
Zhi Liu 🔗 
Tue 9:40 a.m.  10:40 a.m.

Poster Session ( Poster Session, social & break ) link  🔗 
Tue 10:45 a.m.  11:12 a.m.

Microfoundations of Algorithmic decisions
(
Invited Talk
)
SlidesLive Video When theorizing the causal effects that algorithmic decisions have on a population, an important modeling choice arises. We can model the change to a population in the aggregate, or we can model the response to a decision rule at the individual level. Standard economic microfoundations, for instance, ground the response in the utilitymaximizing behavior of individuals. Providing context from sociological and economic theory, I will argue why this methodological problem is of significant importance to machine learning. I will focus on the relationships and differences between two recent lines of work, called strategic classification and performative prediction. While performative prediction takes a macrolevel perspective on distribution shifts induced by algorithmic predictions, strategic classification builds on standard economic microfoundations. Based on work with Meena Jagadeesan and Celestine MendlerDünner, I will discuss the serious shortcomings of standard microfoundations in the context of machine learning and speculate about the alternatives that we have. 
Moritz Hardt 🔗 
Tue 11:12 a.m.  11:40 a.m.

Improving Information from Manipulable Data
(
Invited Talk
)
SlidesLive Video Databased decisionmaking must account for the manipulation of data by agents who are aware of how decisions are being made and want to affect their allocations. We study a framework in which, due to such manipulation, data becomes less informative when decisions depend more strongly on data. We formalize why and how a decisionmaker should commit to underutilizing data. Doing so attenuates information loss and thereby improves allocation accuracy. 
Navin Kartik 🔗 
Tue 11:40 a.m.  12:10 p.m.

Revisiting Dynamics in Strategic ML
(
Invited Talk
)
SlidesLive Video Strategic classification concerns the problem of training a classifier that will ultimately observe data generated according to strategic agents’ responses. The commonly adopted setting is that the agents are fully rational and can best respond to a classifier, and the classifier is aiming to maximize its robustness to the strategic “manipulations”. This talk revisits a couple of dynamics concepts in the above formulation. The first question we try to revisit is: are all changes considered undesirable? We observe that in many application settings, changes in agents’ profile X can lead to true improvement in their target variable Y [1,2]. This observation requires us to revisit the objective function of the learner, and study the possibility of inducing an improved population from the agents. The second question we revisit is: do agents respond rationally? Inspired by evolutionary game theory, we introduce a dynamical agent response model using replicator dynamics to model agents’ potentially nonfully rational responses to a sequence of classifiers [3]. We characterize the dynamics of this model and offer observations of its fairness implication in such a longterm dynamical environment. References: [1] Linear Classifiers that Encourage Constructive Adaptation, Yatong Chen, Jialu Wang and Yang Liu, 2021. [2] Induced Domain Adaptation, Yang Liu, Yatong Chen, Jiaheng Wei, 2021. [3] Unintended Selection: Persistent Qualification Rate Disparities and Interventions, Reilly Raab and Yang Liu, Neural Information Processing Systems (NeurIPS), 2021 
Yang Liu 🔗 
Tue 12:10 p.m.  12:40 p.m.

Panel 2
(
Discussion Panel
)
SlidesLive Video 
🔗 
Tue 12:40 p.m.  1:08 p.m.

Algorithmic Monoculture and Social Welfare
(
Invited Talk
)
SlidesLive Video As algorithms are increasingly applied to screen applicants for highstakes decisions in employment, education, lending, and other domains, concerns have been raised about the effects of "algorithmic monoculture", in which many decisionmakers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the risk of severe harm from unexpected shocks. We present a set of basic models characterizing the potential risks from algorithmic monoculture, showing that monocultural convergence on a single algorithm by a group of decisionmaking agents, even when the algorithm is more accurate for any one agent in isolation, can reduce the overall quality of the decisions being made by the full collection of agents. Our results rely on minimal assumptions, and involve a combination of gametheoretic arguments about competing decisionmakers with the development of a probabilistic framework for analyzing systems that use multiple noisy estimates of a set of alternatives. The talk is based on joint work with Manish Raghavan. 
Jon Kleinberg 🔗 
Tue 1:08 p.m.  1:31 p.m.

Leveraging strategic interactions for causal discovery
(
Invited Talk
)
SlidesLive Video Machine Learning algorithms often prompt individuals to strategically modify their observable attributes to receive more favorable predictions. As a result, the distribution the predictive model is trained on may differ from the one it operates on in deployment. While such distribution shifts, in general, hinder accurate predictions, our work identifies a unique opportunity associated with shifts due to strategic responses. We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict. More specifically, we study a gametheoretic model in which a principal deploys a sequence of models to predict an outcome of interest (e.g., college GPA) for a sequence of strategic agents (e.g., college applicants). In response, strategic agents invest efforts and modify their features for better predictions. In such settings, unobserved confounding variables can influence both an agent's observable features (e.g., high school records) and outcomes. Therefore, standard regression methods generally produce biased estimators. To address this issue, our work establishes a novel connection between strategic responses to machine learning models and instrumental variable (IV) regression, by observing that the sequence of deployed models can be viewed as an instrument that affects agents' observable features but does not directly influence their outcomes. Therefore, twostage least squares (2SLS) regression can recover the causal relationships between observable features and outcomes. Beyond causal recovery, we can build on our 2SLS method to address two additional relevant optimization objectives: agent outcome maximization and predictive risk minimization. This work is joint with Keegan Harris, Daniel Ngo, Logan Stapleton, and Hoda Heidari. 
Steven Wu 🔗 
Tue 1:31 p.m.  2:00 p.m.

Online intermediation in legacy industries: The effect of reservation platforms on restaurants’ prices and survival
(
Invited Talk
)
SlidesLive Video Across multiple industries, new online platforms are interjecting themselves as digital intermediaries in previously direct businesstoconsumer transactions. A reasonable concern is that these platforms, once they become dominant, can leverage their unique position to extract surplus from both sides participating in the transaction and lead to different welfare outcomes than platform participants expected (and experienced) when they first joined the platform. We study the effects that OpenTable (an online restaurant reservation platform) had on restaurants’ prices and their likelihood of survival in NYC, during a period the platform expanded to cover most restaurants in the city. We develop an analytical model to understand restaurants’ adoption decision, and the effect of adoption on prices and consumer surplus. The model shows how the platform can induce a prisoner’s dilemma where restaurants have incentives to join the platform to poach customers from competitors or to protect its clientele from competitors. However, once all restaurants join, none of them will attract additional customers, and the costs of the platform will be passed down to diners through price increases. As the popularity of the platform grows, the platform can charge a higher fee to restaurants until extracting all the benefits it creates. To test the predictions of the model, we create a dataset containing prices, survival, and OpenTable participation for over 5,000 restaurants in NYC between 2005 and 2016. Our analysis suggests that as the platform became prevalent, the costs of the platform were passed down to consumers through price and restaurants saw no benefits in terms of survival. 
Cristobal Cheyre 🔗 
Tue 2:00 p.m.  2:30 p.m.

Panel 3
(
Discussion Panel
)
SlidesLive Video 
🔗 
Tue 2:30 p.m.  2:30 p.m.

Social
(
Social
)

🔗 


Efficient Competitions and Online Learning with Strategic Forecasters
(
Poster
)
Winnertakeall competitions in forecasting and machinelearning suffer from distorted incentives.Witkowski et al. identified this problem and proposed ELF, a truthful mechanism to select a winner.We show that, from a pool of $n$ forecasters, ELF requires $\Theta(n\log n)$ events or test data points to select a nearoptimal forecaster with high probability.We then show that standard online learning algorithms select an $\epsilon$optimal forecaster using only $O(\log(n) / \epsilon^2)$ events, by way of a strong approximatetruthfulness guarantee.This bound matches the best possible even in the nonstrategic setting.We then apply these mechanisms to obtain the first noregret guarantee for nonmyopic strategic experts.

Anish Thilagar · Rafael Frongillo · Bo Waggoner · Robert Gomez 🔗 


Alternative Microfoundations for Strategic Classification
(
Poster
)
When reasoning about strategic behavior in a machine learning context it is tempting to combine standard microfoundations of rational agents with the statistical decision theory underlying classification. In this work, we argue that a direct combination of these standard ingredients leads to brittle solution concepts of limited descriptive and prescriptive value. First, we show that rational agents with perfect information produce discontinuities in the aggregate response to a decision rule that we often do not observe empirically. Second, when any positive fraction of agents is not perfectly strategic, desirable stable pointswhere the classifier is optimal for the data it entailscease to exist. Third, optimal decision rules under standard microfoundations maximize a measure of negative externality known as social burden within a broad class of possible assumptions about agent behavior.Recognizing these limitations we explore alternatives to standard microfoundations for binary classification. To navigate the space of possible assumptions about how agents respond to a decision rule we specify a set of desiderata our model should satisfy and that help mitigate the limitations of the standard model. We propose noisy response as a promising candidate model. Inspired by smoothed analysis and empirical observations, noisy response incorporates natural imperfection in the agent responses. This model retains analytical tractability, leads to more robust insights about stable points, and imposes a lower social burden at optimality. 
Meena Jagadeesan · Celestine MendlerDünner · Moritz Hardt 🔗 


Estimation of Standard Asymmetric Auction Models
(
Poster
)
We provide efficient methods for learning strategic agents' underlying bid and value distributions by observing only the outcomes of their repeated interaction in a variety of standard auction models. In particular, given a finite set of observationseach only comprising the identity of the winner and the price they paidin a sequence of auctions involving the same set of ex ante asymmetric bidders with independent private values, we provide algorithms for nonparametrically estimating the bid distribution of each bidder to within Wasserstein, Kolmogorov, or total variation distance on the effective support of these distributions. We provide convergence bounds for the attained distance in terms of the number of observations, number of bidders, and other relevant parameters of the problem, which are uniform in that they do not depend on the bid distributions being estimated. For firstprice auctions (where bid distributions and equilibrium value distributions do not coincide) we also show provide finitesample estimation results for agents' value distributions at BayesNash equilibrium. Our estimation guarantees advance a body of work at the intersection of machine learning and econometrics with partial sample observability wherein only identification results have been previously obtained in our setting. 
Yeshwanth Cherapanamjeri · Constantinos Daskalakis · Andrew Ilyas · Emmanouil Zampetakis 🔗 


NearOptimal NoRegret Learning in General Games
(
Poster
)
We show that Optimistic Hedge  a common variant of multiplicativeweightsupdates with recency bias  attains ${\rm poly}(\log T)$ regret in multiplayer generalsum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after $T$ rounds of interaction, each player experiences total regret that is ${\rm poly}(\log T)$. Our bound improves, exponentially, the $O({T}^{1/2})$ regret attainable by standard noregret learners in games, the $O(T^{1/4})$ regret attainable by noregret learners with recency bias (Syrgkanis et al., 2015), and the ${O}(T^{1/6})$ bound that was recently shown for Optimistic Hedge in the special case of twoplayer games (Chen & Peng, 2020). A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of $\tilde{O}\left(\frac 1T\right)$.

Constantinos Daskalakis · Maxwell Fishelson · Noah Golowich 🔗 


Bounded Rationality for MultiAgent Motion Planning and Behavior Learning
(
Poster
)
When a robot shares the same workspace with other intelligent agents (e.g., other robots or humans), the robot must be able to reason the behaviors of its neighboring agents while accomplishing its designated task. We assume other agents do not explicitly share their decisions or reasoning processes with the robot. Thus the robot needs to reason itself for the consequences of all other agents, the process of which demands prohibitive computational resources. We observe that in practice, oftentimes predicting the absolutely optimal agent behaviors is not necessary. Instead, it is more desirable that the solution can be computed with a closetooptimal (suboptimal) but affordable (tractable) computation time. We propose to incorporate the concept of bounded rationality (from an informationtheoretic viewpoint) into the generalsum stochastic game setting. Specifically, the robot computes a policy under the information processing constraint, represented as KLdivergence between the default and optimized stochastic policies. The solution to the boundedoptimal policy can be obtained by an importance sampling approach. We show pilot results that this framework allows the robot to (1) effectively tradeoff between solution optimality and limited computational time; (2) efficiently learn the suboptimal behaviors of other agents. 
Junhong Xu · Kai Yin · Lantao Liu 🔗 


Information Discrepancy in Strategic Learning
(
Poster
)
We study the effects of information discrepancy across subpopulations on their ability to simultaneously improve their features in strategic learning settings. Specifically, we consider a game where a principal deploys a decision rule in an attempt to optimize the whole population's welfare, and agents strategically adapt to it to receive better scores. Inspired by reallife settings, such as loan approvals and college admissions, we remove the typical assumption made in the strategic learning literature that the decision rule is fully known to the agents, and focus on settings where it is inaccessible. In their lack of knowledge, individuals try to infer this rule by learning from their peers (e.g., friends and acquaintances who previously applied for a loan), naturally forming groups in the population, each with possibly different type and level of information about the decision rule. In our equilibrium analysis, we show that the principal's decision rule optimizing the welfare across subgroups} may cause a surprising negative externality; the true quality of some of the subgroups can actually deteriorate. On the positive side, we show that in many natural cases, optimal improvement is guaranteed simultaneously for all subgroups in equilibrium. We also characterize the disparity in improvements across subgroups via a measure of their informational overlap. Finally, we complement our theoretical analysis with experiments on realworld datasets. 
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani 🔗 


When to Call Your Neighbor? Strategic Communication in Cooperative Stochastic Bandits
(
Poster
)
In cooperative bandits, a framework that captures essential features of collective sequential decision making, agents can minimize group regret, and thereby improve performance, by leveraging shared information. However, sharing information can be costly, which motivates developing policies that minimize group regret while also reducing the number of messages communicated by agents. Existing cooperative bandit algorithms obtain optimal performance when agents share information with their neighbors at \textit{every time step}, i.e., full communication. This requires $\Theta(T)$ number of messages, where $T$ is the time horizon of the decision making process. We propose \textit{ComEx}, a novel costeffective communication protocol in which the group achieves the same order of performance as full communication while communicating only $O(\log T)$ number of messages. Our key step is developing a method to identify and only communicate the information crucial to achieving optimal performance. Further we propose novel algorithms for several benchmark cooperative bandit frameworks and show that our algorithms obtain \textit{stateoftheart} performance while consistently incurring a significantly smaller communication cost than existing algorithms.

Udari Madhushani · Naomi Leonard 🔗 


Gaming Helps! Learning from Strategic Interactions in Natural Dynamics
(
Poster
)
We consider an online regression setting in which individuals adapt to the regression model: arriving individuals are aware of the current model, and invest strategically in modifying their own features so as to improve the predicted score that the current model assigns to them. Such feature manipulation has been observed in various scenariosfrom credit assessment to school admissionsposing a challenge for the learner. Surprisingly, we find that such strategic manipulations may in fact help the learner recover the meaningful variablesthat is, the features that, when changed, affect the true label (as opposed to nonmeaningful features that have no effect). We show that even simple behavior on the learner's part allows her to simultaneously i) accurately recover the meaningful features, and ii) incentivize agents to invest in these meaningful features, providing incentives for improvement. 
Yahav Bechavod · Katrina Ligett · Steven Wu · Juba Ziani 🔗 


Interactive Robust Policy Optimization for MultiAgent Reinforcement Learning
(
Poster
)
As machine learning is applied more to realworld problems like robotics, control of autonomous vehicles, drones, and recommendation systems, it becomes essential to consider the notion of agency where multiple agents with local observations start impacting each other and interact to achieve their goals. Multiagent reinforcement learning (MARL) is concerned with developing learning algorithms that can discover effective policies in multiagent environments. In this work, we develop algorithms for addressing two critical challenges in MARL  nonstationarity and robustness. We show that naive independent reinforcement learning does not preserve the strategic gametheoretic interaction between the agents, and we present a way to realize the classical infinite order recursion reasoning in a reinforcement learning setting. We refer to this framework as Interactive Policy Optimization (IPO) and derive four MARL algorithms using centralizedtrainingdecentralizedexecution that generalize the widely used singleagent policy gradient methods to multiagent settings. Finally, we provide a method to estimate opponent's parameters in adversarial settings using maximum likelihood and integrate IPO with an adversarial learning framework to train agents robust to destabilizing disturbances from the environment/adversaries and for better sim2real transfer from simulated multiagent environments to the real world. 
Videh Nema · Balaraman Ravindran 🔗 


Negotiating networks in oligopoly markets for price sensitive products
(
Poster
)
We present a novel framework to learn functions that estimate decisions of sellers and buyers simultaneously in a oligopoly market for a pricesensitive product. In this setting, the aim of the seller network is to come up with a price for a given context such that the expected revenue is maximised by considering the buyer's satisfaction as well. On the other hand, the aim of the buyer network is to assign probability of purchase to the offered price to mimic the real world buyers' responses while also showing price sensitivity through its action. In other words, rejecting the unnecessarily high priced products. Similar to generative adversarial networks, this framework corresponds to a minimax twoplayer game. In our experiments with simulated and realworld transaction data, we compared our framework with the baseline model and demonstrated its potential through proposed evaluation metrics. 
naman shukla 🔗 


Strategic clustering
(
Poster
)
This paper studies the problem of clustering through the lens of utility and welfare. In particular, we ask, how much does the quality of the clustering  typically measured by the conductance, or by the number of edges cut, or the average distance to the centers  deteriorate if the nodes are strategic and can change clusters? And among reasonable utilities for the nodes, which one hurts quality the least? We investigate these questions both theoretically, by studying the equilibria of hedonic games (simplified clustering games with unconstrained number of clusters), and experimentally, by measuring the quality of pure Nash equilibria of more realistic clustering games. We introduce a new utility function for the nodes which we call closeness, and which we believe is an attractive alternative to previously studied node utilities. We study the properties of the closeness utility theoretically and demonstrate experimentally its advantages over other established utilities such as the modified fractional utility. Finally, we present a polynomialtime algorithm which, given a clustering with optimal quality, finds another clustering with better average utility, and in fact the one that maximizes the ratio of the gain in average utility over the loss in quality. This submission and the topics covered are particularly related to the theme of the Strategic ML workshop, especially to welfareaware machine learning and modeling strategic behavior. In the case of clustering, this work aims to cover gaps between the classical ML algorithms for clustering that optimize certain connectivity metrics (such as conductance) and incentives that people have in creating or changing groups. With an increasing demand for algorithmic tools in identifying and facilitating groups, understanding the incentives people may have in group formation become important in order to ensure retention, stability, and success of groups. This work aims to open avenues of research for investigating tradeoffs between robustness and welfare in designing algorithms for clustering. 
AnaAndreea Stoica · Christos Papadimitriou 🔗 


One for One, or All for All: Equilibria and Optimality of Collaboration in Federated Learning
(
Poster
)
Federated learning is highly studied as an approach to collaboration across large populations, however, little has been explored in how to coordinate resources to maintain such data collaborations over time. Inspired by game theoretic notions, this paper introduces a framework for incentiveaware learning and data sharing in federated learning. Our notions of stable and envyfree equilibria formalize notions of fairness and stability for selfinterested agents. For example, in an envyfree equilibrium, no agent would wish to swap their sampling burden with any other agent and in a stable equilibrium, no agent would wish to unilaterally reduce their sampling burden. In addition to formalizing this framework, our contributions include characterizing the structural properties of such equilibria, proving when they exist, and showing how they can be computed. Furthermore, we show that, in the worst case, a $\Omega(\sqrt{k})$ gap exists between the sampleminimizing and equilibrium solutions across realistic learning models.

Richard Phillips · Han Shao · Avrim Blum · Nika Haghtalab 🔗 


Regret, stability, and fairness in matching markets with bandit learners
(
Poster
)
We consider the twosided matching market with bandit learners. In a matching market, there are two sets of agentsusers and providersthat seek to be matched onetoone. Contrary to a core assumption in the literature, users and providers often do not know their true matching preferences a priori and must learn them. To address this assumption, recent works propose to blend the matching and multiarmed bandit problems. One recent work finds that it is possible to assign matchings that are stable (i.e., no pair of agents is incentivized to defect) at every time step while also allowing agents to learn enough so that the system converges to matchings that are stable under the agents' true preferences. However, the authors also uncover an impossibility result: under their setup, one cannot simultaneously guarantee stability and low optimal regret. In this work, we study the twosided matching market with bandit learners under costs and transfers in order to faithfully model competition between agents. We show that, in contrast to previous work, it is possible to simultaneously guarantee four desiderata: (1) stability, (2) low optimal regret, (3) fairness in the distribution of regret, and (4) high social welfare. 
Sarah Cen · Devavrat Shah 🔗 


On classification of strategic agents who can both game and improve
(
Poster
)
In this work, we consider classification of agents who can both game and improve. For example, people wishing to get a loan may be able to take some actions that increase their perceived creditworthiness and others that also increase their true creditworthiness. A decisionmaker would like to define a classification rule with few falsepositives (does not give out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve to become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each.For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partialinformation learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finitepoint version of our linear model. We also show that maximizing the number of true positives subject to no false positive is NPhard in our full linear model. We additionally provide an algorithm that determines whether there exists a linear classifier that classifies all agents accurately and causes all improvable agents to become qualified, and give additional results for lowdimensional data. 
Saba Ahmadi · Hedyeh Beyhaghi · Avrim Blum · Keziah Naggita 🔗 


The Strategic Perceptron
(
Poster
)
The classical Perceptron algorithm provides a simple and elegant procedure for learning a linear classifier. In each step, the algorithm observes the sample’s position and label and updates the current predictor accordingly if it makes a mistake. However, in presence of strategic agents that desire to be classified as positive and that are able to modify their position by a limited amount, the classifier may not be able to observe the true position of agents but rather a position where the agent pretends to be. Unlike the original setting with perfect knowledge of positions, in this situation the Perceptron algorithm fails to achieve its guarantees, and we illustrate examples with the predictor oscillating between two solutions forever, making an unbounded number of mistakes even though a perfect largemargin linear classifier exists. Our main contribution is providing a modified Perceptronstyle algorithm which makes a bounded number of mistakes in presence of strategic agents with both $\ell_2$ and weighted $\ell_1$ manipulation costs. In our baseline model, knowledge of the manipulation costs (i.e., the extent to which an agent may manipulate) is assumed. In our most general model, we relax this assumption and provide an algorithm which learns and refines both the classifier and its cost estimates to achieve good mistake bounds even when manipulation costs are unknown.

Saba Ahmadi · Hedyeh Beyhaghi · Avrim Blum · Keziah Naggita 🔗 


Price Discovery and Efficiency in Waiting Lists: A Connection to Stochastic Gradient Descent
(
Poster
)
Waiting lists allocate items by offering agents a choice among items with associated waiting times. These waiting times serve as prices that are formed by the stochastic arrivals and departures of agents. We study the allocative efficiency under such dynamically adjusting prices by drawing a connection between this price adjustment process and the stochastic gradient descent optimization algorithm. We show that the loss due to price fluctuations is bounded by the granularity of price changes. Additional conditions allow us to identify markets where the loss is close to the bound or exponentially small. Our results show that a simple price adjustment heuristic can perform well, but may be slow to adjust to changes in demand because of a tradeoff between the speed of adaptation and loss from price fluctuations. 
Itai Ashlagi · Jacob Leshno · Pengyu Qian · Amin Saberi 🔗 


Unfairness Despite Awareness: GroupFair Classification with Strategic Agents
(
Poster
)
The use of algorithmic decision making systems in domains which impact the financial, social, and political wellbeing of people has created a demand for these decision making systems to be ``fair'' under some accepted notion of equity. This demand has in turn inspired a large body of work focused on the development of fair learning algorithms which are then used in lieu of their conventional counterparts. Most analysis of such fair algorithms proceeds from the assumption that the people affected by the algorithmic decisions are represented as immutable feature vectors. However, strategic agents may possess both the ability and the incentive to manipulate this observed feature vector in order to attain a more favorable outcome. We explore the impact that strategic agent behavior could have on fair classifiers and derive conditions under which this behavior leads to fair classifiers becoming less fair than their conventional counterparts under the same measure of fairness that the fair classifier takes into account. These conditions are related to the the way in which the fair classifier remedies unfairness on the original unmanipulated data: fair classifiers which remedy unfairness by becoming more selective than their conventional counterparts are the ones that become less fair than their counterparts when agents are strategic. We further demonstrate that both the increased selectiveness of the fair classifier, and consequently the loss of fairness, arises when performing fair learning on domains in which the advantaged group is overrepresented in the region near (and on the beneficial side of) the decision boundary of conventional classifiers. Finally, we observe experimentally, using several datasets and learning methods, that this \emph{fairness reversal} is common, and thatour theoretical characterization of the fairness reversal conditions indeed holds in most such cases. 
Andrew Estornell · Sanmay Das · Yang Liu · Yevgeniy Vorobeychik 🔗 


Learning Losses for Strategic Classification
(
Poster
)
Strategic classification, i.e. classification under possible strategic manipulations of features, has received a lot of attention from both the machine learning and the game theory community. Most works focus on analysing the optimal decision rule under such manipulations. In our work we take a learning theoretic perspective, focusing on the sample complexity needed to learn a good decision rule which is robust to strategic manipulation. We perform this analysis under a strategic manipulation loss that takes into account both the accuracy of the final decision and the vulnerability to manipulation. We analyse the sample complexity for a known graph of possible manipulations in terms of the complexity of the function class and the manipulation graph. Additionally, we address the problem of unknown manipulation capabilities of the involved agents. Using techniques from transfer learning theory, we define a similarity measure for manipulation graphs and show that learning outcomes are robust with respect to small changes in the manipulation graph. Lastly we analyse the (sample complexity of) learning of the manipulation capability of agents with respect to this similarity measures, providing a way to learn strategic classification with respect to an unknown manipulation graph. 
Tosca Lechner · Ruth Urner 🔗 


Normative disagreement as a challenge for Cooperative AI
(
Poster
)
Cooperation in settings where agents have both common and conflicting interests (mixedmotive environments) has recently received considerable attention in multiagent learning. However, the mixedmotive environments typically studied have a single cooperative outcome on which all agents can agree. Many realworld multiagent environments are instead bargaining problems (BPs): they have several Paretooptimal payoff profiles over which agents have conflicting preferences. We argue that typical cooperationinducing learning algorithms fail to cooperate in BPs when there is room for \textit{normative disagreement} resulting in the existence of multiple competing cooperative equilibria, and illustrate this problem empirically. To remedy the issue, we introduce the notion of \textit{normadaptive} policies. Normadaptive policies are capable of behaving according to different norms in different circumstances, creating opportunities for resolving normative disagreement. We develop a class of normadaptive policies and show in experiments that these significantly increase cooperation. However, normadaptiveness cannot address residual bargaining failure arising from a fundamental tradeoff between exploitability and cooperative robustness. 
Julian Stastny · Maxime Riché · Aleksandr Lyzhov · Johannes Treutlein · Allan Dafoe · Jesse Clifton 🔗 


Learning through Recourse under Censoring
(
Poster
)
Machine learning models are often deployed in dynamic systems that involve data collection and model updates. In consumer finance, for example, lenders deploy a model to automate lending decisions, collect data from applicants, and then use it to update the model for future applicants. In such systems, data collection suffers from \emph{selective labeling}, as we only observe outcomes for applicants who are approved. When these systems are initialized with models deny loans to a specific group, they exhibit \emph{censoring} whereby they deny loans to a group of consumers in perpetuity.In this work, we identify conditions when machine learning systems exhibit censoring. We study the ability to resolve censoring by providing \emph{recourse}  i.e., by providing applicants who are denied with actions that guarantee approval. We develop a method to learn linear classifiers with recourse guarantees, which allows model owners to update their models while providing prior applicants with a guarantee of approval. We benchmark our method and other strategies for exploration in their ability to resolve censoring. Our results illustrate the costs of each strategy on key stakeholders, provide insight into failure modes that lead to censoring, and highlight the importance of the feasibility of recourse. 
Jennifer Chien · Berk Ustun · Margaret Roberts 🔗 


Pessimistic Offline Reinforcement Learning with Multiple Agents
(
Poster
)
We study offline multiagent reinforcement learning (RL), which aims to learn an optimal policy based on historical data for multiple agents. Unlike online RL, accidental overestimation errors arising from function approximation can cumulatively arise and affect future iterations in the offline RL. In this paper, we extend the pessimistic value iteration algorithm to the multiagent setting: after obtaining the lower bound of the value function of each agent, we compute the optimistic policy by solving a generalsum matrix game with these lower bounds as the payoff matrices. Instead of finding the Nash Equilibrium of such a generalsum game, our algorithm solves for a Coarse Correlated Equilibrium (CCE) for the sake of computational efficiency. In the end, we prove an informationtheoretic lower bound of the regret for any algorithm on an offline dataset in the twoagent zerosum game. To the best of our knowledge, such a CCE scheme for pessimismbased algorithm has not appeared in the literature and can be of interest in its own right. We hope that our work can shed light on the future analysis of the equilibrium point of a multiagent Markov decision process given an offline dataset. 
Yihang Chen 🔗 


Promoting Resilience of MultiAgent Reinforcement Learning via ConfusionBased Communication
(
Poster
)
Agents operating in realworld settings are often faced with the need to adapt to unexpected changes in their environment. Recent advances in multiagent reinforcement learning (MARL) provide a variety of tools to support the ability of RL agents to deal with the dynamic nature of their environment, which may often be increased by the presence of other agents. In this work, we measure the resilience of a group of agents as the group’s ability to adapt to unexpected perturbations in the environment. To promote resilience, we suggest facilitating collaboration within the group, and offer a novel confusionbased communication protocol that requires an agent to broadcast its local observations that are least aligned with its previous experience. We present empirical evaluation of our approach on a set of simulated multitaxi settings. 
Ofir Abu · Sarah Keren · Matthias Gerstgrasser · Jeffrey S Rosenschein 🔗 


Game Redesign in Noregret Game Playing
(
Poster
)
We study the game redesign problem in which an external designer has the ability to change the payoff function in each round, but incurs a design cost for deviating from the original game. The players apply noregret learning algorithms to repeatedly play the changed games with limited feedback. The goals of the designer are to (i) incentivize all players to take a specific target action profile frequently; and (ii) incur small cumulative design cost. We present game redesign algorithms with the guarantee that the target action profile is played in T−o(T) rounds while incurring only o(T) cumulative design cost. Game redesign describes both positive and negative applications: a benevolent designer who incentivizes players to take a target action profile with better social welfare compared to the solution of the original game, or a malicious attacker whose target action profile benefits themselves but not the players. Simulations on four classic games illustrate our proposed redesign algorithms. 
Yuzhe Ma · Young Wu · Jerry Zhu 🔗 


PseudoCompetitive Games and Algorithmic Pricing
(
Poster
)
We study a new game of price competition amongst firms selling identical goods, whose defining property is that the revenue of a firm at a price is independent of any competing prices that are strictly lower. This game is motivated by a variety of customer choice models that induce this property, prominently those stemming from the wellknown satisficing heuristic for decisionmaking. Under a mild condition, we show that every purestrategy local Nash equilibrium in this game corresponds to a set of prices generated by the firms sequentially setting local bestresponse prices in a fixed order. In other words, despite being simultaneousmove games, equilibria have a sequentialmove equilibrium structure. We thus call these games pseudocompetitive games. We find that these games are often plagued by strictlylocal Nash equilibria, in which the price of a firm is only a local bestresponse to the competitor's price, when a globally optimal response with a potentially unboundedly higher payoff is available. We show that price dynamics resulting from the firms utilizing gradientbased pricing algorithms may often converge to such undesirable outcomes. We propose a new online learning approach to address this concern under certain regularity assumptions on the revenue curves. 
Chamsi Hssaine · Vijay Kamble · Siddhartha Banerjee 🔗 


Learning in Matrix Games can be Arbitrarily Complex
(
Poster
)
A growing number of machine learning architectures, such as Generative Adversarial Networks, rely on the design of games which implement a desired functionality via a Nash equilibrium. In practice these games have an implicit complexity (e.g.~from underlying datasets and the deep networks used) that makes directly computing a Nash equilibrium impractical or impossible. For this reason, numerous learning algorithms have been developed with the goal of iteratively converging to a Nash equilibrium. Unfortunately, the dynamics generated by the learning process can be very intricate and instances of training failure hard to interpret. In this paper we show that, in a strong sense, this dynamic complexity is inherent to games. Specifically, we prove that replicator dynamics, the continuoustime analogue of Multiplicative Weights Update, even when applied in a very restricted class of gamesknown as finite matrix gamesis rich enough to be able to approximate arbitrary dynamical systems. Our results are positive in the sense that they show the nearly boundless dynamic modelling capabilities of current machine learning practices, but also negative in implying that these capabilities may come at the cost of interpretability. As a concrete example, we show how replicator dynamics can effectively reproduce the wellknown strange attractor of Lonrenz dynamics (the "butterfly effect") while achieving no regret. 
Gabriel Andrade · Rafael Frongillo · Georgios Piliouras 🔗 


Testoptional Policies: Overcoming Strategic Behavior and Informational Gaps
(
Poster
)
Due to the Covid19 pandemic, more than 500 USbased colleges and universities went ``testoptional'' for admissions and promised that they would not penalize applicants for not submitting test scores, part of a longer trend to rethink the role of testing in college admissions. However, it remains unclear how (and whether) a college can simultaneously use test scores for those who submit them, while not penalizing those who do notand what that promise even means. We formalize these questions, and study how a college can overcome two challenges with optional testing: \textit{strategic applicants} (when those with low test scores can pretend to not have taken the test), and \textit{informational gaps} (it has more information on those who submit a test score than those who do not). We find that colleges can indeed do so, if and only if they are able to use information on who has test access and are willing to randomize admissions. 
Zhi Liu · Nikhil Garg 🔗 


Models of fairness in federated learning
(
Poster
)
In many realworld situations, data is distributed across multiple locations and can't be combined for training. For example, multiple hospitals might have comparable data related to patient outcomes. Federated learning is a novel distributed learning approach that allows multiple federating agents to jointly learn a model. While this approach might reduce the error each agent experiences, it also raises questions of fairness: to what extent can the error experienced by one agent be significantly lower than the error experienced by another agent? In this work, we consider two notions of fairness that each may be appropriate in different circumstances: \emph{egalitarian fairness} (which aims to bound how dissimilar error rates can be) and \emph{proportional fairness} (which aims to reward players for contributing more data). For egalitarian fairness, we obtain a tight multiplicative bound on how widely error rates can diverge between agents federating together. For proportional fairness, we show that subproportional error (relative to the number of data points contributed) is guaranteed for any individually rational federating coalition. 
Kate Donahue · Jon Kleinberg 🔗 


Improving Robustness of Malware Classifiers using Adversarial Strings Generated from Perturbed Latent Representations
(
Poster
)
In malware behavioral analysis, the list of accessed and created files very often indicates whether the examined file is malicious or benign. However, malware authors are trying to avoid detection by generating random filenames and/or modifying used filenames with new versions of the malware. These changes represent realworld adversarial examples. The goal of this work is to generate realistic adversarial examples and improve the classifier's robustness against these attacks. Our approach learns latent representations of input strings in an unsupervised fashion and uses gradientbased adversarial attack methods in the latent domain to generate adversarial examples in the input domain. We use these examples to improve the classifier's robustness by training on the generated adversarial set of strings. Compared to classifiers trained only on perturbed latent vectors, our approach produces classifiers that are significantly more robust without a large tradeoff in standard accuracy. 
Marek Galovic · Branislav Bosansky · Viliam Lisy 🔗 


Improving Fairness in Credit Lending Models using Subgroup Threshold Optimization
(
Poster
)
In an effort to improve the accuracy of credit lending decisions, many financial intuitions are now using predictions from machine learning models. While such predictions enjoy many advantages, recent research has shown that the predictions have the potential to be biased and unfair towards certain subgroups of the population. To combat this, several techniques have been introduced to help remove the bias and improve the overall fairness of the predictions. We introduce a new fairness technique, called Subgroup Threshold Optimizer (STO), that does not require any alternations to the input training data nor does it require any changes to the underlying machine learning algorithm, and thus can be used with any existing machine learning pipeline. STO works by optimizing the classification thresholds for individual subgroups in order to minimize the overall discrimination score between them. Our experiments on a realworld credit lending dataset show that STO can reduce gender discrimination by over 90%. 
Cecilia Ying · Stephen Thomas 🔗 


The Platform Design Problem
(
Poster
)
Online firms deploy suites of software platforms, where each platform is designed to interact with users during a certain activity, such as browsing, chatting, socializing, emailing, driving, etc. The economic and incentive structure of this exchange, as well as its algorithmic nature, have not been explored to our knowledge. We model this interaction as a Stackelberg game between a Designer and one or more Agents. We model an Agent as a Markov chain whose states are activities; we assume that the Agent’s utility is a linear function of the steadystate distribution of this chain. The Designer may design a platform for each of these activities/states; if a platform is adopted by the Agent, the transition probabilities of the Markov chain are affected, and so is the objective of the Agent. The Designer’s utility is a linear function of the steady state probabilities of the accessible states (that is, the ones for which the platform has been adopted), minus the development cost of the platforms. The underlying optimization problem of the Agent — that is, how to choose the states for which to adopt the platform — is an MDP. If this MDP has a simple yet plausible structure (the transition probabilities from one state to another only depend on the target state and the recurrent probability of the current state) the Agent’s problem can be solved by a greedy algorithm. The Designer’s optimization problem (designing a custom suite for the Agent so as to optimize, through the Agent’s optimum reaction, the Designer’s revenue), is in general NPhard to approximate within any finite ratio; however, in the special case, while still NPhard, has an FPTAS. These results generalize, under mild additional assumptions, from a single Agent to a distribution of Agents with finite support, as well as to the setting where other Designers have already created platforms, and the Designer must find the best response to the strategies of the other Designers. We discuss other implications of our results and directions of future research. 
Christos Papadimitriou · Kiran Vodrahalli · Mihalis Yannakakis 🔗 


Strategic Classification in the Dark
(
Poster
)
Strategic classification studies the interaction between a classification rule and the strategic agents it governs. Under the assumption that the classifier is known, rational agents respond to it by manipulating their features. However, in many reallife scenarios of highstake classification (e.g., credit scoring), the classifier is not revealed to the agents, which leads agents to attempt to learn the classifier and game it too. In this paper we generalize the strategic classification model to such scenarios. We define the ``price of opacity'' as the difference in prediction error between opaque and transparent strategyrobust classifiers, characterize it, and give a sufficient condition for this price to be strictly positive, in which case transparency is the recommended policy. Our experiments show how Hardt et al.’s robust classifier is affected by keeping agents in the dark. 
Ganesh Ghalme · Vineet Nair · Itay Eilat · Inbal TalgamCohen · Nir Rosenfeld 🔗 


Scoring Rules for Performative Binary Prediction
(
Poster
)
We construct a model of expert prediction where predictions can influence the state of the world. Under this model, we show through theoretical and numerical results that proper scoring rules can incentivize experts to manipulate the world with their predictions. We also construct a simple class of scoring rules that avoids this problem. 
Alan Chan 🔗 


Approximating Bayes Nash Equilibria in Auction Games via Gradient Dynamics
(
Poster
)
Auctions are modeled as Bayesian games with continuous type and action spaces.Computing equilibria in games is computationally hard in general and no exactsolution theory is known for auction games. Recent research aimed at learningpure strategy Bayes Nash equilibria in symmetric auction games. In contrast, weintroduce algorithms computing distributional strategies on a discretized versionof the game via convex online optimization. First, existence results for purestrategy Bayes Nash equilibria are more restricted than those for equilibria indistributional strategies. Second, the expected utility of agents is linear indistributional strategies. We show that if regularized convex online optimizationalgorithms converge in such games, then they converge to a distributional εequilibrium of the discretized game. Importantly, we prove that a distributional εequilibrium of the discretized game approximates an equilibrium in the continuousgame. In a large number of experiments, we show that the method approximatesthe analytical (pure) Bayes Nash equilibrium arbitrarily closely in a wide varietyof auction games. The method allows for interdependent valuations and differenttypes of utility functions and provides a foundation for broadly applicableequilibrium solvers. For small environments with only a few players and items,the techniques is much faster, and can solve equilibria in symmetric auction gamesin seconds. 
Maximilian Fichtl · Matthias Oberlechner · Martin Bichler 🔗 


Strategic Classification Made Practical
(
Poster
)
Strategic classification regards the problem of learning in settings where users can strategically modify their features to improve outcomes. This setting applies broadly and has received much recent attention. But despite its practical significance, work in this space has so far been predominantly theoretical. In this paper we present a learning framework for strategic classification that is practical. Our approach directly minimizes the “strategic” empirical risk, achieved by differentiating through the strategic response of users. This provides flexibility that allows us to extend beyond the original problem formulation and towards more realistic learning scenarios. A series of experiments demonstrates the effectiveness of our approach on various learning settings. 
Nir Rosenfeld · Sagi Levanon 🔗 


Nash Convergence of MeanBased Learning Algorithms in First Price Auctions
(
Poster
)
We consider repeated first price auctions where each bidder, having a deterministic type, learns to bid using a meanbased learning algorithm. We completely characterize the Nash convergence property of the bidding dynamics in two senses: (1) timeaverage: the fraction of rounds where bidders play a Nash equilibrium approaches to 1 in the limit; (2) lastiterate: the mixed strategy profile of bidders approaches to a Nash equilibrium in the limit. Specifically, the results depend on the number of bidders with the highest value:if the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in timeaverage and in lastiterate. if the number is two, the bidding dynamics almost surely converges to a Nash equilibrium in timeaverage but not necessarily in lastiterate. if the number is one, the bidding dynamics may not converge to a Nash equilibrium in timeaverage nor in lastiterate. Our discovery opens up new possibilities in the study of convergence dynamics of learning algorithms. 
Xiaotie Deng · Xinyan Hu · Tao Lin · Weiqiang Zheng 🔗 


Bayesian Persuasion for Algorithmic Recourse
(
Poster
)
When faced with (automated) assessment rules, individuals can modify their observable features strategically to obtain better decisions. In many situations, decisionmakers deliberately keep the underlying assessment rule secret to avoid gaming. This forces the decision subjects to rely on incomplete information when making strategic feature modifications. We capture such settings as a game of Bayesian persuasion, in which the decisionmaker sends a signal, i.e., an action recommendation, to the decision subject to incentivize them to take desirable actions. We formulate the principal's problem of finding the optimal Bayesian incentivecompatible signaling policy as an optimization problem and characterize it via a linear program. Through this characterization, we observe that while finding a BIC strategy can be simplified dramatically, the computational complexity of solving this linear program is closely tied to (1) the relative size of the agent's action space, and (2) the number of features utilized by the underlying decision rule. 
Keegan Harris · Valerie Chen · Joon Kim · Ameet S Talwalkar · Hoda Heidari · Steven Wu 🔗 


RewardFree Attacks in MultiAgent Reinforcement Learning
(
Poster
)
We investigate how effective an attacker can be when it only learns from its victim's actions, without access to the victim's reward. In this work, we are motivated by the scenario where the attacker wants to disrupt realworld RL applications, such as autonomous vehicles or delivery drones, without knowing the victim's precise goals or reward function. We argue that one heuristic approach an attacker can use is to strategically maximize the entropy of the victim's policy. The policy is generally not obfuscated, which implies it may be extracted simply by passively observing the victim. We provide such a strategy in the form of a rewardfree exploration algorithm that maximizes the attacker's entropy during the exploration phase, and it then maximizes the victim's empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, implying an attacker might not need access to the victim’s reward to succeed. Hence, even if the victim's reward information is protected, rewardfree attacks, based only on observing behavior, underscore the need to better understand policy obfuscation when preparing to deploy reinforcement learning in real world applications. 
Ted Fujimoto · Tim Doster · Adam Attarian · Jill Brandenberger · Nathan Hodas 🔗 


Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
(
Poster
)
We study blackbox reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms to learn a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimum assumptions on the prior knowledge of the adversary: it has no initial knowledge of the environment or the learner, and neither does it observe the learner's internal mechanism except for its performed actions. We design a novel blackbox attack, U2, that can provably achieve a nearmatching performance to the stateoftheart whitebox attack, demonstrating the feasibility of reward poisoning even in the most challenging blackbox setting. 
Amin Rakhsha · Xuezhou Zhang · Jerry Zhu · Adish Singla 🔗 


Global Convergence of MultiAgent Policy Gradient in Markov Potential Games
(
Poster
)
Potential games are arguably one of the most important and widely studied classes of normal form games. They define the archetypal setting of multiagent coordination as all agent utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted in the setting of Markov Games? What are the similarities and differences between multiagent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPG) that generalizes prior attempts at capturing complex stateful multiagent coordination. Counterintuitively, insights from normalform potential games do not carry over as MPGs can consist of settings where stategames can be zerosum games. In the opposite direction, Markov games where every stategame is a potential game are not necessarily MPGs. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient to Nash policies by adapting recent gradient dominance property arguments developed for single agent MDPs to multiagent learning settings. 
Stefanos Leonardos · Will Overman · Ioannis Panageas · Georgios Piliouras 🔗 


ExplorationExploitation in MultiAgent Competition: Convergence with Bounded Rationality
(
Poster
)
The interplay between exploration and exploitation in competitive multiagent learning is still far from being well understood. Motivated by this, we study smooth Qlearning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Qlearning always converges to the unique quantalresponse equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zerosum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games [15,34], we show that fast convergence of Qlearning in competitive settings obtains regardless of the number of agents and without any need for parameter finetuning. As showcased by our experiments in network zerosum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multiagent settings. 
Stefanos Leonardos · Kelly Spendlove · Georgios Piliouras 🔗 


Exploration and Incentives in Reinforcement Learning
(
Poster
)
How do you incentivize selfinterested agents to $\text{\emph{explore}}$ when they prefer to $\text{\emph{exploit}}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously.

Max Simchowitz · Aleksandrs Slivkins 🔗 


Timing is Money: The Impact of Arrival Order in BetaBernoulli Prediction Markets
(
Poster
)
Prediction markets are incentivebased mechanisms for eliciting and combining the diffused, private beliefs of traders about a future uncertain event such as a political election. Typically prediction markets maintain point estimates of forecast variables; however, exponentialfamily prediction markets define a class of cost functionbased marketmaking algorithms that maintain a complete, collective belief distribution over the underlying generative process of the event of interest (e.g. the probability density of the winner's voteshare). In this paper, we focus on concretizing a special case of this abstract framework we call the betaBernoulli market. We set up a multiagent simulation of the market ecosystem to experimentally investigate the interaction of this marketmaking algorithm with a heterogeneous trading population. We design a Bayesian trader model with explicit characterization of this heterogeneity with respect to two independent attributes: how rich a trader's private information is and how much wealth they initially have at their disposal. We are interested in gauging the interplay of the above attributes with the arrival order of traders, particularly in terms of the net profit accrued by different trader types. Our results strongly suggest that early arrival can dominate both wealth and informativeness as a factor in determining trader compensation under a variety of experimental conditions. 
Blake Martin · Mithun Chakraborty · Sindhu Kutty 🔗 


The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity
(
Poster
)
We consider \emph{incentivized exploration}: a version of multiarmed bandits where the choice of arms is controlled by selfinterested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily large depending on the Bayesian priors, and scale exponentially in the number of arms. A more basic problem of sampling each arm once runs into similar factors.We focus on the \emph{price of incentives}: the loss in performance, broadly construed, incurred for the sake of incentivecompatibility. We prove that Thompson Sampling, a standard bandit algorithm, is incentivecompatible if initialized with sufficiently many data points. The performance loss due to incentives is therefore limited to the initial rounds when these data points are collected.The problem is largely reduced to that of sample complexity: how many rounds are needed? We address this question, providing matching upper and lower bounds and instantiating them in various corollaries. Typically, the optimal sample complexity is polynomial in the number of arms and exponential in the ``strength of beliefs". 
Mark Sellke · Aleksandrs Slivkins 🔗 


Coopetition Against an Amazon
(
Poster
)
This paper studies cooperative datasharing between competitors vying to predict a consumer's tastes. We design optimal datasharing schemes both for when firms compete only with each other, and for when they additionally compete with an Amazona company with more, better data. We derive conditions under which competitors benefit from coopetition, and under which full datasharing is optimal. 
Ronen Gradwohl · Moshe Tennenholtz 🔗 