Timezone: »

Learning in the Presence of Strategic Behavior
Nika Haghtalab · Yishay Mansour · Tim Roughgarden · Vasilis Syrgkanis · Jennifer Wortman Vaughan

Fri Dec 08 08:00 AM -- 06:30 PM (PST) @ 101 A
Event URL: https://www.cs.cmu.edu/~nhaghtal/mlstrat/ »

Machine learning is primarily concerned with the design and analysis of algorithms that learn about an entity. Increasingly more, machine learning is being used to design policies that affect the entity it once learned about. This can cause the entity to react and present a different behavior. Ignoring such interactions could lead to solutions that are ultimately ineffective in practice. For example, to design an effective ad display one has to take into account how a viewer would react to the displayed advertisements, for example by choosing to scroll through or click on them. Additionally, in many environments, multiple learners learn concurrently about one or more related entities. This can bring about a range of interactions between individual learners. For example, multiple firms may compete or collaborate on performing market research. How do the learners and entities interact? How do these interactions change the task at hand? What are some desirable interactions in a learning environment? And what are the mechanisms for bringing about such desirable interactions? These are some of the questions we would like to explore more in this workshop.

Traditionally, learning theory has adopted two extreme views in this respect: First, when learning occurs in isolation from strategic behavior, such as in the classical PAC setting where the data is drawn from a fixed distribution; second, when the learner faces an adversary whose goal is to inhibit the learning process, such as the minimax setting where the data is generated by an adaptive worst-case adversary. While these extreme perspectives have lead to elegant results and concepts, such as VC dimension, Littlestone dimension, regret bounds, and more, many types of problems that we would like to solve involve strategic behaviors that do not fall into these two extremes. Examples of these problems include but are not limited to

1. Learning from data that is produced by agents who have vested interest in the outcome or the learning process. For example, learning a measure of quality of universities by surveying members of the academia who stand to gain or lose from the outcome, or when a GPS routing app has to learn patterns of traffic delay by routing individuals who have no interest in taking slower routes.

2. Learning a model for the strategic behavior of one or more agents by observing their interactions, for example, learning economical demands of buyers by observing their bidding patterns when competing with other buyers.

3. Learning as a model of interactions between agents. Examples of this include applications to swarm robotics, where individual agents have to learn to interact in a multi-agent setting in order to achieve individual or collective goals.

4. Interactions between multiple learners. In many settings, two or more learners learn about the same or multiple related concepts. How do these learners interact? What are the scenarios under which they would share knowledge, information, or data. What are the desirable interactions between learners? As an example, consider multiple competing pharmaceutical firms that are learning about the effectiveness of a certain treatment. In this case, while competing firms would prefer not to share their findings, it is beneficial to the society when such findings are shared. How can we incentivize these learners to perform such desirable interactions?

The main goal of this workshop is to address current challenges and opportunities that arise from the presence of strategic behavior in learning theory. This workshop aims at bringing together members of different communities, including machine learning, economics, theoretical computer science, and social computing, to share recent results, discuss important directions for future research, and foster collaborations.

Fri 9:00 a.m. - 9:45 a.m.

We live in a world where activities and interactions are recorded as data: food consumption, workout activities, buying and selling products, sharing information and experiences, borrowing and lending money, and exchanging excess resources. Scientists use the rich data of these activities to understand human social behavior, generate accurate predictions, and make policy recommendations. Machine learning traditionally take such data as given, often treating them as independent samples from some unknown statistical distribution. However, such data are possessed or generated by potentially strategic people in the context of specific interaction rules. Hence, what data become available depends on the interaction rules. For example, people with sensitive medical conditions may not reveal their medical data in a survey but could be willing to share them when compensated; crowd workers may not put in a good-faith effort in completing a task if they know that the requester cannot verify the quality of their contributions. In this talk, I argue that a holistic view that jointly considers data acquisition and learning is important. I will discuss two projects. The first project considers acquiring data from strategic data holders who have private cost for revealing their data and then learning from the acquired data. We provide a risk bound on learning, analogous to classic risk bounds, for situations when agents’ private costs can correlate with their data in arbitrary ways. The second project leverages techniques in learning to design a mechanism for obtaining high-quality data from strategic data holders. The mechanism has a strong incentive property: it is a dominant strategy for each agent to truthfully reveal their data even if we have no ground truth to directly evaluate their contributions.

This talk is based on joint works with Jacob Abernethy, Chien-Ju Ho, Yang Liu, and Bo Waggoner.

Yiling Chen
Fri 9:45 a.m. - 10:00 a.m.

We study an online linear classification problem, in which the data is generated by strategic agents who manipulate their features in an effort to change the classification outcome. In rounds, the learner deploys a classifier, and an adversarially chosen agent arrives, possibly manipulating her features to optimally respond to the learner. The learner has no knowledge of the agents' utility functions or real'' features, which may vary widely across agents. Instead, the learner is only able to observe theirrevealed preferences'' --- i.e. the actual manipulated feature vectors they provide. For a broad family of agent cost functions, we give a computationally efficient learning algorithm that is able to obtain diminishing ``Stackelberg regret'' --- a form of policy regret that guarantees that the learner is obtaining loss nearly as small as that of the best classifier in hindsight, even allowing for the fact that agents will best-respond differently to the optimal classifier.

Jinshuo Dong, Aaron Roth, Zachary Schutzman, Bo Waggoner and Zhiwei Steven Wu

Jinshuo Dong
Fri 10:00 a.m. - 10:15 a.m.

In online advertising markets, advertisers often purchase ad placements through bidding in repeated auctions based on realized viewer information. We study how budget-constrained advertisers may compete in such sequential auctions in the presence of uncertainty about future bidding opportunities and competition. We formulate this problem as a sequential game of incomplete information, where bidders know neither their own valuation distribution, nor the budgets and valuation distributions of their competitors. We introduce a family of practical bidding strategies we refer to as adaptive pacing strategies, in which advertisers adjust their bids according to the sample path of expenditures they exhibit. Under arbitrary competitors’ bids, we establish through matching lower and upper bounds the asymptotic optimality of this class of strategies as the number of auctions grows large. When all the bidders adopt these strategies, we establish the convergence of the induced dynamics and characterize a regime (well motivated in the context of display advertising markets) under which these strategies constitute an approximate Nash equilibrium in dynamic strategies: The benefit of unilaterally deviating to other strategies, including ones with access to complete information, becomes negligible as the number of auctions and competitors grows large. This establishes a connection between regret minimization and market stability, by which advertisers can essentially follow equilibrium bidding strategies that also ensure the best performance that can be guaranteed off-equilibrium.

Yonatan Gur and Santiago Balseiro.

Yonatan Gur
Fri 10:15 a.m. - 10:30 a.m.
  1. Strategyproof Linear Regression. Yiling Chen, Chara Podimata and Nisarg Shah

  2. Incentive-Aware Learning for Large Markets. Mohammad Mahdian, Vahab Mirrokni and Song Zuo.

  3. Learning to Bid Without Knowing Your Value. Zhe Feng, Chara Podimata and Vasilis Syrgkanis.

  4. Budget Management Strategies in Repeated Auctions. Uniqueness and Stability of System Equilibria. Santiago Balseiro, Anthony Kim, Mohammad Mahdian and Vahab Mirrokni.

  5. Dynamic Second Price Auctions with Low Regret. Vahab Mirrokni, Renato Paes Leme, Rita Ren and Song Zuo.

Chara Podimata, Song Zuo, Zhe Feng, Anthony Kim
Fri 11:00 a.m. - 11:45 a.m.

Learning has been adopted as a general behavioral model for players in repeated games. Learning offers a way that players can adopt to (possibly changing) environment. Learning guarantees high social welfare in many games (including traffic routing as well as online auctions), even when the game or the population of players is dynamically changing. The rate at which the game can change depends on the speed of convergence of the learning algorithm. If players observe all other participants, which such full information feedback classical learning algorithms offer very fast convergence. However, such full information feedback is often not available, and the convergence of classical algorithms with partial feedback is much good. In this talk we develop a black-box approach for learning where the learner observes as feedback only losses of a subset of the actions. The simplicity and black box nature of the approach allows us to use of this faster learning rate as a behavioral assumption in games. Talk based on joint work with Thodoris Lykouris and Karthik Sridharan.

Eva Tardos
Fri 11:45 a.m. - 12:30 p.m.

This talk presents an overview of several recent algorithms for regret minimization against strategic buyers in the context of posted-price auctions, which are crucial for revenue optimization in online advertisement.

Joint work with Andres Munoz Medina.

Mehryar Mohri
Fri 12:30 p.m. - 1:50 p.m.
Lunch Break (Break)
Fri 1:50 p.m. - 2:35 p.m.

We argue that the standard machine learning paradigm is both too weak and too string. First, we show that current systems for image classification and reading comprehension are vulnerable to adversarial attacks, suggesting that existing learning setups are inadequate to produce systems with robust behavior. Second, we show that in an interactive learning setting where incentives are aligned, a system can learn a simple natural language from a user from scratch, suggesting that much more can be learned under a cooperative setting.

Percy Liang
Fri 2:35 p.m. - 3:00 p.m.
  1. Multi-Agent Counterfactual Regret Minimization for Partial-Information Collaborative Games. Matthew Hartley, Stephan Zheng and Yisong Yue.

  2. Inference of Strategic Behavior based on Incomplete Observation Data. Antti Kangasrääsiö and Samuel Kaski.

  3. Inferring agent objectives at different scales of a complex adaptive system. Dieter Hendricks, Adam Cobb, Richard Everett, Jonathan Downing and Stephen Roberts.

  4. A Reinforcement Learning Framework for Eliciting High Quality Information. Zehong Hu, Yang Liu, Yitao Liang and Jie Zhang.

  5. Learning Multi-item Auctions with (or without) Samples. Yang Cai and Constantinos Daskalakis.

  6. Competing Bandits: Learning under Competition. Yishay Mansour, Aleksandrs Slivkins and Zhiwei Steven Wu.

  7. Robust commitments and partial reputation. Vidya Muthukumar and Anant Sahai

  8. Learning with Abandonment. Ramesh Johari and Sven Schmit.

Antti Kangasrääsiö, Richard Everett, Yitao Liang, Yang Cai, Steven Wu, Vidya Muthukumar, Sven Schmit
Fri 3:00 p.m. - 3:30 p.m.
Poster Session & Coffee break (Break)
Fri 3:30 p.m. - 4:15 p.m.

Social dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. Importantly, social dilemmas are ubiquitous in real world interactions. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, begin by cooperating, try to avoid being exploited, and forgiving (try to return to mutual cooperation). Such agents can maintain cooperation in Markov social dilemmas with both perfect and imperfect information. Our construction does not require training methods beyond a modification of self-play, thus if an environment is such that good strategies can be constructed in the zero-sum case (eg. Atari) then we can construct agents that solve social dilemmas in this environment.

Alexander Peysakhovich
Fri 4:15 p.m. - 4:30 p.m.

Consider a buyer participating in a repeated auction in an ad exchange. How does a buyer figure out whether her bids will be used against her in the form of reserve prices? There are many potential A/B testing setups that one can use. However, we will show many natural experimental designs have serious flaws.

For instance, one can use additive or multiplicative perturbation to the bids. We show that additive perturbations to bids can lead to paradoxical results, as reserve prices are not guaranteed to be monotone for non-MHR distributions, and thus higher bids may lead to lower reserve prices!

Similarly, one may be tempted to measure bid influence in reserves by randomly perturbing one's bids. However, unless the perturbations are aligned with the partitions used by the seller to compute optimal reserve prices, the results are guaranteed to be inconclusive.

Finally, in practice additional market considerations play a large role---if the optimal reserve price is further constrained by the seller to satisfy additional business logic, the power of the buyer to detect the effect to which his bids are being used against him is limited.

In this work we develop tests that a buyer can user to measure the impact of current bids on future reserve prices. In addition, we analyze the cost of running such experiments, exposing trade-offs between test accuracy, cost, and underlying market dynamics. We validate our results with experiments on real world data and show that a buyer can detect reserve price optimization done by the seller at a reasonable cost.

Andres Munoz Medina, Sebastien Lahaie, Sergei Vassilvitskii and Balasubramanian Sivan

Andres Munoz
Fri 4:30 p.m. - 4:45 p.m.

Designing an auction that maximizes expected revenue is an intricate task. Despite major efforts, only the single-item case is fully understood. We explore the use of tools from deep learning on this topic. The design objective is revenue optimal, dominant-strategy incentive compatible auctions. For a baseline, we show that multi-layer neural networks can learn almost-optimal auctions for a variety of settings for which there are analytical solutions, and even without encoding characterization results into the design of the network. Looking ahead, deep learning has promise for deriving auctions with high revenue for poorly understood problems.

Paul Duetting, Zhe Feng, Harikrishna Narasimhan, and David Parkes

David Parkes
Fri 4:45 p.m. - 5:00 p.m.

Humans, like all animals, both cooperate and compete with each other. Through these interactions we learn to observe, act, and manipulate to maximize our utility function, and continue doing so as others learn with us. This is a decentralized non-stationary learning problem, where to survive and flourish an agent must adapt to the gradual changes of other agents as they learn, as well as capitalize on sudden shifts in their behavior. To date, a majority of the work in deep multi-agent reinforcement learning has focused on only one of these types of adaptations. In this paper, we introduce the Switching Agent Model (SAM) as a way of dealing with both types of non-stationarity through the combination of opponent modeling and deep multi-agent reinforcement learning.

Richard Everett

Richard Everett

Author Information

Nika Haghtalab (Carnegie Mellon University)
Yishay Mansour (Tel Aviv University)
Tim Roughgarden (Stanford University)
Vasilis Syrgkanis (Microsoft Research)
Jenn Wortman Vaughan (Microsoft Research)

Jenn Wortman Vaughan is a Senior Principal Researcher at Microsoft Research, New York City. Her research background is in machine learning and algorithmic economics. She is especially interested in the interaction between people and AI, and has often studied this interaction in the context of prediction markets and other crowdsourcing systems. In recent years, she has turned her attention to human-centered approaches to transparency, interpretability, and fairness in machine learning as part of MSR's FATE group and co-chair of Microsoft’s Aether Working Group on Transparency. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn's 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers (PECASE), and a handful of best paper awards. In her "spare" time, Jenn is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which has been held each year since 2006.

More from the Same Authors