Safety and Robustness in Decision-making
Mohammad Ghavamzadeh · Shie Mannor · Yisong Yue · Marek Petrik · Yinlam Chow

Fri Dec 13 08:00 AM -- 06:40 PM (PST) @ East Ballroom A
Event URL: https://sites.google.com/view/neurips19-safe-robust-workshop

Interacting with increasingly sophisticated decision-making systems is becoming an ever larger part of daily life. This places an immense responsibility on the designers of these systems to build them so that they guarantee safe interaction with their users and good performance in the presence of noise, changes in the environment, model misspecification, and uncertainty. Progress in this area will be a major step toward using decision-making algorithms in emerging high-stakes applications such as autonomous driving, robotics, power systems, health care, recommendation systems, and finance.

This workshop aims to bring together researchers from academia and industry to discuss the main challenges, describe recent advances, and highlight future research directions in developing safe and robust decision-making systems. We aim to highlight new and emerging theoretical and applied research opportunities for the community that arise from the evolving need for decision-making systems and algorithms that guarantee safe interaction and good performance under a wide range of uncertainties in the environment.

Fri 8:00 a.m. - 8:15 a.m.
Opening Remarks (Opening Presentation)
Fri 8:15 a.m. - 8:55 a.m.

How can we build autonomous robots that operate in unstructured and dynamic environments such as homes or hospitals? This problem has been investigated in several disciplines, including planning (motion planning, task planning, etc.) and reinforcement learning (RL). While both of these fields have witnessed tremendous progress, each has fundamental drawbacks: planning approaches require substantial manual engineering to map perception to a formal planning problem, while RL, which can operate directly on raw percepts, is data hungry, cannot generalize to new tasks, and is ‘black box’ in nature.

Motivated by humans’ remarkable capability to imagine and plan complex manipulations of objects, and by recent advances in generating images with models such as GANs, we present Visual Plan Imagination (VPI), a new computational problem that combines image imagination and planning. In VPI, given off-policy image data from a dynamical system, the task is to ‘imagine’ image sequences that transition the system from start to goal. Thus, VPI focuses on the essence of planning with high-dimensional perception, and abstracts away low-level control and reward engineering. More importantly, VPI provides a safe and interpretable basis for robotic control: before the robot acts, a human inspects the imagined plan the robot will act upon, and can intervene if necessary.

I will describe our approach to VPI based on Causal InfoGAN, a deep generative model that learns features that are compatible with strong planning algorithms. We show that Causal InfoGAN can generate convincing visual plans, and we demonstrate learning to imagine and execute real robot rope manipulation from image data. I will also discuss our VPI simulation benchmarks, and recent efforts in novelty detection, an important component in VPI, and in safe decision making in general.

Aviv Tamar
Fri 8:55 a.m. - 9:35 a.m.

We study stochastic optimization problems where the decision-maker cannot observe the distribution of the exogenous uncertainties but has access to a finite set of independent training samples. In this setting, the goal is to find a procedure that transforms the data to an estimate of the expected cost function under the unknown data-generating distribution, i.e., a predictor, and an optimizer of the estimated cost function that serves as a near-optimal candidate decision, i.e., a prescriptor. As functions of the data, predictors and prescriptors constitute statistical estimators. We propose a meta-optimization problem to find the least conservative predictors and prescriptors subject to constraints on their out-of-sample disappointment. The out-of-sample disappointment quantifies the probability that the actual expected cost of the candidate decision under the unknown true distribution exceeds its predicted cost. Leveraging tools from large deviations theory, we prove that this meta-optimization problem admits a unique solution: The best predictor-prescriptor pair is obtained by solving a distributionally robust optimization problem over all distributions within a given relative entropy distance from the empirical distribution of the data.

Daniel Kuhn
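
As a rough illustration of the distributionally robust predictor mentioned in the abstract above, the following Python sketch evaluates a worst-case expected cost over a relative entropy (KL) ball around the empirical distribution of sampled costs. It is a minimal sketch under assumptions that do not come from the talk: the function name `kl_worst_case_cost`, the direction of the divergence (KL(Q || P_hat)), the uniform empirical weights, and the grid search over the dual variable are all illustrative choices, and the talk's exact formulation may differ.

```python
import numpy as np


def kl_worst_case_cost(costs, radius, n_grid=4001):
    """Upper bound on sup E_Q[cost] over all Q with KL(Q || P_hat) <= radius.

    P_hat is the uniform empirical distribution on the sampled costs
    (an assumption of this sketch). The bound uses the standard dual
        inf_{a > 0}  a * radius + a * log E_{P_hat}[exp(cost / a)],
    minimized approximately on a log-spaced grid of dual variables a.
    """
    c = np.asarray(costs, dtype=float)
    c_max = c.max()
    # Scale the grid to the spread of the costs (guard against zero spread).
    scale = max(c_max - c.min(), 1e-12)
    alphas = scale * np.logspace(-4, 6, n_grid)
    # Shift by c_max so the log-mean-exp is numerically stable.
    shifted = (c[None, :] - c_max) / alphas[:, None]
    log_mean_exp = np.log(np.mean(np.exp(shifted), axis=1))
    dual_values = alphas * radius + c_max + alphas * log_mean_exp
    # Every grid point is a valid upper bound, so the minimum is as well.
    return float(dual_values.min())


if __name__ == "__main__":
    sample_costs = [1.0, 2.0, 4.0, 8.0]  # hypothetical cost samples
    for r in (0.0, 0.1, 0.5):
        print(r, kl_worst_case_cost(sample_costs, r))
```

With radius 0 the bound approaches the empirical mean, and it grows toward the largest sampled cost as the radius increases, which is the sense in which a larger radius gives a more conservative predictor.
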
Fri 9:35 a.m. - 10:30 a.m.
Poster Session (Posters)
Ahana Ghosh, Javad Shafiee, Akhilan Boopathy, Alex Tamkin, Theodore Vasiloudis, Vedant Nanda, Ali Baheri, Paul Fieguth, Andrew Bennett, Guanya Shi, Hao Liu, Arushi Jain, Jacob Tyo, Benjie Wang, Boxiao Chen, Carroll Wainwright, Chandramouli Shama Sastry, Chao Tang, Daniel S. Brown, David Inouye, David Venuto, Dhruv Ramani, Dimitrios Diochnos, Divyam Madaan, Dmitrii Krashenikov, Joel Oren, Doyup Lee, Eleanor Quint, elmira amirloo, Matteo Pirotta, Gavin Hartnett, Geoffroy Dubourg-Felonneau, Gokul Swamy, Pin-Yu Chen, Ilija Bogunovic, Jason Carter, Javier Garcia-Barcos, Jeet Mohapatra, Jesse Zhang, Jian Qian, John Martin, Oliver Richter, Federico Zaiter, Tsui-Wei Weng, Karthik Abinav Sankararaman, Kyriakos Polymenakos, NGOC Lan Hoang, mahdieh abbasi, Marco Gallieri, Mathieu Seurin, Matteo Papini, Matteo Turchetta, Matthew Sotoudeh, Mehrdad Hosseinzadeh, Nathan Fulton, Masatoshi Uehara, Niranjani Prasad, Oana-Maria Camburu, Patrik Kolaric, Philipp Renz, Prateek Jaiswal, Reaz Russel, Riashat Islam, Rishabh Agarwal, Alex Aldrick, Sachin Vernekar, Sahin Lale, Sai Kiran Narayanaswami, Samuel Daulton, Sanjam Garg, Sebastian East, Shun Zhang, Soheil Dsidbari, Justin Goodwin, Victoria Krakovna, Wenhao Luo, Wesley Chung, Yuanyuan Shi, Yuh-Shyang Wang, Hongwei Jin, Ziping Xu
Fri 10:30 a.m. - 11:10 a.m.

Statistical methods for off-policy evaluation and counterfactual reasoning will have fundamental limitations based on what assumptions can be made and what kind of exploration is present in the data (some of which is being presented here by other speakers!). In this talk, I'll discuss some recent directions in our lab regarding ways to integrate human experts into the process of policy evaluation and selection in batch settings. The first deals with statistical limitations by seeking a diverse collection of statistically-indistinguishable (with respect to outcome) policies for humans to eventually decide from. The second involves directly integrating human feedback to eliminate or validate specific sources of sensitivity in an off-policy evaluation to get more robust estimates (or at least better understand the source of their non-robustness). More broadly, I will discuss open directions for moving from purely-statistical (e.g. off-policy evaluation) or purely-human (e.g. interpretability-based) approaches for robust/safe decision-making toward combining the advantages of both.

Finale Doshi-Velez
Fri 11:10 a.m. - 11:50 a.m.

In this talk I will present a decision-making and control stack for human-robot interactions, using autonomous driving as a motivating example. Specifically, I will first discuss a data-driven approach for learning multimodal interaction dynamics between robot-driven and human-driven vehicles based on recent advances in deep generative modeling. Then, I will discuss how to incorporate such a learned interaction model into a real-time, interaction-aware decision-making framework. The framework is designed to be minimally interventional; in particular, by leveraging backward reachability analysis, it ensures safety even when other cars defy the robot's expectations without unduly sacrificing performance. I will present recent results from experiments on a full-scale steer-by-wire platform, validating the framework and providing practical insights. I will conclude the talk by providing an overview of related efforts from my group on infusing safety assurances in robot autonomy stacks equipped with learning-based components, with an emphasis on adding structure within robot learning via control-theoretical and formal methods.

Marco Pavone
Fri 11:50 a.m. - 12:30 p.m.

The presentation addresses some practical facets of applying AI methods to designing driving policies for autonomous vehicles. It discusses the relationship between reinforcement learning (RL) based solutions and the rule-based and model-based techniques used to improve their robustness and safety. One approach to obtaining explainable RL models by learning alternative rule-based representations is proposed. The presentation also elaborates on opportunities for extending AI driving-policy approaches by applying game-theory-inspired methodology to address diverse and unforeseen scenarios and to represent the negotiation aspects of decision making in autonomous driving.

Fri 12:30 p.m. - 2:00 p.m.
Lunch Break (Lunch)
Fri 2:00 p.m. - 2:40 p.m.

Off-policy evaluation (OPE) is crucial for reinforcement learning in domains like medicine with limited exploration, but OPE is also notoriously difficult because the similarity between trajectories generated by any proposed policy and the observed data diminishes exponentially as horizons grow, a phenomenon known as the curse of horizon. To understand precisely when this curse bites, we consider for the first time the semi-parametric efficiency limits of OPE in Markov decision processes (MDPs), establishing the best-possible estimation errors and characterizing the curse as a problem-dependent phenomenon rather than a method-dependent one. Efficiency in OPE is crucial because, without exploration, we must use the available data to its fullest. In finite horizons, this shows that standard doubly-robust (DR) estimators are in fact inefficient for MDPs. In infinite horizons, while the curse renders certain problems fundamentally intractable, OPE may be feasible in ergodic time-invariant MDPs. We develop the first OPE estimator that achieves the efficiency limits in both settings, termed Double Reinforcement Learning (DRL). In both finite and infinite horizons, DRL improves upon existing estimators, which we show are inefficient, and leverages problem structure to its fullest in the face of the curse of horizon. We establish many favorable characteristics for DRL, including efficiency even when nuisances are estimated slowly by blackbox models, finite-sample guarantees, and model double robustness.

Nathan Kallus
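
The curse of horizon mentioned in the abstract above can be made concrete with a small calculation. The Python sketch below is not the DRL estimator from the talk. It only computes how the second moment of a cumulative importance weight grows with the horizon, under the simplifying assumption (made here purely for illustration) that both policies choose actions independently of the state, so the per-step density ratios are i.i.d. The function names and the example policies are hypothetical.

```python
def per_step_second_moment(behavior, target):
    """E_b[rho^2] for one step, where rho(a) = target(a) / behavior(a).

    behavior and target are action-probability lists over the same actions.
    """
    total = 0.0
    for b, t in zip(behavior, target):
        if t > 0 and b <= 0:
            raise ValueError("behavior policy must cover every target action")
        if t > 0:
            total += t * t / b  # b * (t / b) ** 2
    return total


def cumulative_weight_second_moment(behavior, target, horizon):
    """E_b[(rho_1 * ... * rho_H)^2] when the per-step ratios are i.i.d."""
    return per_step_second_moment(behavior, target) ** horizon


if __name__ == "__main__":
    behavior = [0.5, 0.5]  # hypothetical logging policy
    target = [0.8, 0.2]    # hypothetical policy to evaluate
    for horizon in (1, 10, 50):
        m2 = cumulative_weight_second_moment(behavior, target, horizon)
        # The weight has mean 1, so its variance is m2 - 1; 1 / m2 is a
        # common rough proxy for the fraction of effectively useful samples.
        print(horizon, m2 - 1.0, 1.0 / m2)
```

Because the per-step second moment exceeds 1 whenever the two policies differ, the variance of the trajectory-level weight grows geometrically in the horizon, which is the difficulty the abstract refers to.
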
Fri 2:40 p.m. - 3:20 p.m.

In recent years, high-confidence reinforcement learning algorithms have enjoyed success in application areas with high-quality models and plentiful data, but robotics remains a challenging domain for scaling up such approaches. Furthermore, very little work has been done on the even more difficult problem of safe imitation learning, in which the demonstrator's reward function is not known. This talk focuses on three recent developments in this emerging area of research: (1) a theory of safe imitation learning; (2) scalable reward inference in the absence of models; (3) efficient off-policy policy evaluation. The proposed algorithms offer a blend of safety and practicality, making a significant step towards safe robot learning with modest amounts of real-world data.

Scott Niekum
Fri 3:20 p.m. - 4:30 p.m.
Poster Session and Coffee Break (Break)
Fri 4:30 p.m. - 5:10 p.m.

In this talk, we will review some recent advances in the area of multistage decision making under uncertainty, especially in the domain of stochastic and robust optimization. We will present new algorithmic developments that allow for exactly solving huge-scale stochastic programs with integer recourse decisions, as well as algorithms from a dual perspective that can deal with infeasibility in problems. This significantly extends the scope of stochastic dual dynamic programming (SDDP) algorithms from convex or binary state variable cases to general nonconvex problems. We will also present a new analysis of the iteration complexity of the proposed algorithms. This settles some open questions regarding the complexity of SDDP.

Andy Sun
Fri 5:10 p.m. - 5:50 p.m.

Search engines and recommender systems have become the dominant matchmakers for a wide range of human endeavors -- from online retail to finding romantic partners. Consequently, they carry immense power in shaping markets and allocating opportunity to the participants. In this talk, I will discuss how the machine learning algorithms underlying these systems can produce unfair ranking policies for both exogenous and endogenous reasons. Exogenous reasons often manifest themselves as biases in the training data, which then get reflected in the learned ranking policy and lead to rich-get-richer dynamics. But even when trained with unbiased data, reasons endogenous to the algorithms can lead to unfair allocation of opportunity. To overcome these challenges, I will present new machine learning algorithms that directly address both endogenous and exogenous unfairness.

Thorsten Joachims
Fri 5:50 p.m. - 6:00 p.m.
Concluding Remarks (Remarks)

Author Information

Mohammad Ghavamzadeh (Facebook AI Research)
Shie Mannor (Technion)
Yisong Yue (Caltech)
Marek Petrik (University of New Hampshire)
Yinlam Chow (Google Research)
