Timezone: »
The softmax is the standard transformation used in machine learning to map real-valued vectors to categorical distributions. Unfortunately, this transform poses serious drawbacks for gradient descent (ascent) optimization. We reveal this difficulty by establishing two negative results: (1) optimizing any expectation with respect to the softmax must exhibit sensitivity to parameter initialization (softmax gravity well''), and (2) optimizing log-probabilities under the softmax must exhibit slow convergence (
softmax damping''). Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities. To circumvent these shortcomings we investigate an alternative transformation, the \emph{escort} mapping, that demonstrates better optimization properties. The disadvantages of the softmax and the effectiveness of the escort transformation are further explained using the concept of N\L{} coefficient. In addition to proving bounds on convergence rates to firmly establish these results, we also provide experimental evidence for the superiority of the escort transformation.
Author Information
Jincheng Mei (University of Alberta / Google Brain)
Chenjun Xiao (University of Alberta)
Bo Dai (Google Brain)
Lihong Li (Amazon)
Csaba Szepesvari (DeepMind / University of Alberta)
Dale Schuurmans (Google Brain & University of Alberta)
Related Events (a corresponding poster, oral, or spotlight)
-
2020 Oral: Escaping the Gravitational Pull of Softmax »
Tue Dec 8th 02:15 -- 02:30 PM Room Orals & Spotlights: Reinforcement Learning
More from the Same Authors
-
2020 Poster: Off-Policy Imitation Learning from Observations »
Zhuangdi Zhu · Kaixiang Lin · Bo Dai · Jiayu Zhou -
2020 Poster: Model Selection in Contextual Stochastic Bandit Problems »
Aldo Pacchiano · My Phan · Yasin Abbasi Yadkori · Anup Rao · Julian Zimmert · Tor Lattimore · Csaba Szepesvari -
2020 Poster: Differentiable Top-k with Optimal Transport »
Yujia Xie · Hanjun Dai · Minshuo Chen · Bo Dai · Tuo Zhao · Hongyuan Zha · Wei Wei · Tomas Pfister -
2020 Poster: ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool »
Gellert Weisz · András György · Wei-I Lin · Devon Graham · Kevin Leyton-Brown · Csaba Szepesvari · Brendan Lucier -
2020 Poster: Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration »
Hanjun Dai · Rishabh Singh · Bo Dai · Charles Sutton · Dale Schuurmans -
2020 Poster: Differentiable Meta-Learning of Bandit Policies »
Craig Boutilier · Chih-wei Hsu · Branislav Kveton · Martin Mladenov · Csaba Szepesvari · Manzil Zaheer -
2020 Poster: PAC-Bayes Analysis Beyond the Usual Bounds »
Omar Rivasplata · Ilja Kuzborskij · Csaba Szepesvari · John Shawe-Taylor -
2020 Poster: A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs »
Nevena Lazic · Dong Yin · Mehrdad Farajtabar · Nir Levine · Dilan Gorur · Chris Harris · Dale Schuurmans -
2020 Poster: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Poster: Online Algorithm for Unsupervised Sequential Selection with Contextual Information »
Arun Verma · Manjesh Kumar Hanawal · Csaba Szepesvari · Venkatesh Saligrama -
2020 Poster: Efficient Planning in Large MDPs with Weak Linear Function Approximation »
Roshan Shariff · Csaba Szepesvari -
2020 Spotlight: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Poster: Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach »
Luofeng Liao · You-Lin Chen · Zhuoran Yang · Bo Dai · Mladen Kolar · Zhaoran Wang -
2020 Poster: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: Off-Policy Evaluation via the Regularized Lagrangian »
Mengjiao Yang · Ofir Nachum · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Spotlight: CoinDICE: Off-Policy Confidence Interval Estimation »
Bo Dai · Ofir Nachum · Yinlam Chow · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2019 Workshop: The Optimization Foundations of Reinforcement Learning »
Bo Dai · Niao He · Nicolas Le Roux · Lihong Li · Dale Schuurmans · Martha White -
2019 Poster: Maximum Entropy Monte-Carlo Planning »
Chenjun Xiao · Ruitong Huang · Jincheng Mei · Dale Schuurmans · Martin Müller -
2019 Poster: Meta Architecture Search »
Albert Shaw · Wei Wei · Weiyang Liu · Le Song · Bo Dai -
2019 Poster: Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging »
Pooria Joulani · András György · Csaba Szepesvari -
2019 Poster: A Kernel Loss for Solving the Bellman Equation »
Yihao Feng · Lihong Li · Qiang Liu -
2019 Poster: Detecting Overfitting via Adversarial Examples »
Roman Werpachowski · András György · Csaba Szepesvari -
2019 Poster: Exponential Family Estimation via Adversarial Dynamics Embedding »
Bo Dai · Zhen Liu · Hanjun Dai · Niao He · Arthur Gretton · Le Song · Dale Schuurmans -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: Energy-Inspired Models: Learning with Sampler-Induced Distributions »
John Lawson · George Tucker · Bo Dai · Rajesh Ranganath -
2019 Poster: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Spotlight: DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections »
Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li -
2019 Poster: Retrosynthesis Prediction with Conditional Graph Logic Network »
Hanjun Dai · Chengtao Li · Connor Coley · Bo Dai · Le Song -
2018 Poster: TopRank: A practical algorithm for online stochastic ranking »
Tor Lattimore · Branislav Kveton · Shuai Li · Csaba Szepesvari -
2018 Poster: Non-delusional Q-learning and value-iteration »
Tyler Lu · Dale Schuurmans · Craig Boutilier -
2018 Oral: Non-delusional Q-learning and value-iteration »
Tyler Lu · Dale Schuurmans · Craig Boutilier -
2018 Poster: Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification »
Harsh Shrivastava · Eugene Bart · Bob Price · Hanjun Dai · Bo Dai · Srinivas Aluru -
2018 Poster: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation »
Qiang Liu · Lihong Li · Ziyang Tang · Denny Zhou -
2018 Poster: Coupled Variational Bayes via Optimization Embedding »
Bo Dai · Hanjun Dai · Niao He · Weiyang Liu · Zhen Liu · Jianshu Chen · Lin Xiao · Le Song -
2018 Spotlight: Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation »
Qiang Liu · Lihong Li · Ziyang Tang · Denny Zhou -
2018 Poster: PAC-Bayes bounds for stable algorithms with instance-dependent priors »
Omar Rivasplata · Emilio Parrado-Hernandez · John Shawe-Taylor · Shiliang Sun · Csaba Szepesvari -
2018 Poster: Predictive Approximate Bayesian Computation via Saddle Points »
Yingxiang Yang · Bo Dai · Negar Kiyavash · Niao He -
2018 Poster: Adversarial Attacks on Stochastic Bandits »
Kwang-Sung Jun · Lihong Li · Yuzhe Ma · Jerry Zhu -
2018 Poster: Learning towards Minimum Hyperspherical Energy »
Weiyang Liu · Rongmei Lin · Zhen Liu · Lixin Liu · Zhiding Yu · Bo Dai · Le Song -
2017 Workshop: From 'What If?' To 'What Next?' : Causal Inference and Machine Learning for Intelligent Decision Making »
Ricardo Silva · Panagiotis Toulis · John Shawe-Taylor · Alexander Volfovsky · Thorsten Joachims · Lihong Li · Nathan Kallus · Adith Swaminathan -
2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation »
Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari -
2017 Poster: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2017 Spotlight: Deep Hyperspherical Learning »
Weiyang Liu · Yan-Ming Zhang · Xingguo Li · Zhiding Yu · Bo Dai · Tuo Zhao · Le Song -
2017 Poster: Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes »
Jianshu Chen · Chong Wang · Lin Xiao · Ji He · Lihong Li · Li Deng -
2016 Poster: Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities »
Ruitong Huang · Tor Lattimore · András György · Csaba Szepesvari -
2016 Poster: SDP Relaxation with Randomized Rounding for Energy Disaggregation »
Kiarash Shaloudegi · András György · Csaba Szepesvari · Wilsun Xu -
2016 Oral: SDP Relaxation with Randomized Rounding for Energy Disaggregation »
Kiarash Shaloudegi · András György · Csaba Szepesvari · Wilsun Xu -
2016 Poster: Active Learning with Oracle Epiphany »
Tzu-Kuo Huang · Lihong Li · Ara Vartanian · Saleema Amershi · Jerry Zhu -
2015 Poster: Online Learning with Gaussian Payoffs and Side Observations »
Yifan Wu · András György · Csaba Szepesvari -
2015 Poster: Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path »
Daniel Hsu · Aryeh Kontorovich · Csaba Szepesvari -
2015 Poster: Linear Multi-Resource Allocation with Semi-Bandit Feedback »
Tor Lattimore · Yacov Crammer · Csaba Szepesvari -
2015 Poster: Combinatorial Cascading Bandits »
Branislav Kveton · Zheng Wen · Azin Ashkan · Csaba Szepesvari -
2014 Workshop: Novel Trends and Applications in Reinforcement Learning »
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez -
2014 Poster: Universal Option Models »
hengshuai yao · Csaba Szepesvari · Richard Sutton · Joseph Modayil · Shalabh Bhatnagar -
2014 Poster: Scalable Kernel Methods via Doubly Stochastic Gradients »
Bo Dai · Bo Xie · Niao He · Yingyu Liang · Anant Raj · Maria-Florina F Balcan · Le Song -
2013 Poster: Online Learning with Costly Features and Labels »
Navid Zolghadr · Gábor Bartók · Russell Greiner · András György · Csaba Szepesvari -
2013 Poster: Robust Low Rank Kernel Embeddings of Multivariate Distributions »
Le Song · Bo Dai -
2013 Poster: Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions »
Yasin Abbasi Yadkori · Peter Bartlett · Varun Kanade · Yevgeny Seldin · Csaba Szepesvari -
2012 Session: Oral Session 6 »
Csaba Szepesvari -
2012 Poster: Deep Representations and Codes for Image Auto-Annotation »
Jamie Kiros · Csaba Szepesvari -
2011 Poster: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari -
2011 Spotlight: Improved Algorithms for Linear Stochastic Bandits »
Yasin Abbasi Yadkori · David Pal · Csaba Szepesvari -
2011 Poster: An Empirical Evaluation of Thompson Sampling »
Olivier Chapelle · Lihong Li -
2010 Spotlight: Learning from Logged Implicit Exploration Data »
Alex Strehl · Lihong Li · John Langford · Sham M Kakade -
2010 Spotlight: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · András Antos · Csaba Szepesvari -
2010 Poster: Learning from Logged Implicit Exploration Data »
Alexander L Strehl · John Langford · Lihong Li · Sham M Kakade -
2010 Poster: Online Markov Decision Processes under Bandit Feedback »
Gergely Neu · András György · Csaba Szepesvari · András Antos -
2010 Poster: Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs »
David Pal · Barnabas Poczos · Csaba Szepesvari -
2010 Poster: Parametric Bandits: The Generalized Linear Case »
Sarah Filippi · Olivier Cappé · Aurélien Garivier · Csaba Szepesvari -
2010 Poster: Error Propagation for Approximate Policy and Value Iteration »
Amir-massoud Farahmand · Remi Munos · Csaba Szepesvari -
2010 Poster: Parallelized Stochastic Gradient Descent »
Martin A Zinkevich · Markus Weimer · Alexander Smola · Lihong Li -
2009 Poster: Multi-Step Dyna Planning for Policy Evaluation and Control »
Hengshuai Yao · Richard Sutton · Shalabh Bhatnagar · Dongcui Diao · Csaba Szepesvari -
2009 Poster: A General Projection Property for Distribution Families »
Yao-Liang Yu · Yuxi Li · Dale Schuurmans · Csaba Szepesvari -
2009 Poster: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Batnaghar · Doina Precup · David Silver · Richard Sutton -
2009 Spotlight: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation »
Hamid R Maei · Csaba Szepesvari · Shalabh Batnaghar · Doina Precup · David Silver · Richard Sutton -
2008 Poster: Online Optimization in X-Armed Bandits »
Sebastien Bubeck · Remi Munos · Gilles Stoltz · Csaba Szepesvari -
2008 Poster: Sparse Online Learning via Truncated Gradient »
John Langford · Lihong Li · Tong Zhang -
2008 Poster: Regularized Policy Iteration »
Amir-massoud Farahmand · Mohammad Ghavamzadeh · Csaba Szepesvari · Shie Mannor -
2008 Spotlight: Sparse Online Learning via Truncated Gradient »
John Langford · Lihong Li · Tong Zhang -
2008 Poster: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approxi »
Richard Sutton · Csaba Szepesvari · Hamid R Maei -
2007 Poster: Fitted Q-iteration in continuous action-space MDPs »
Remi Munos · András Antos · Csaba Szepesvari