Off-policy deep reinforcement learning algorithms like Soft Actor-Critic (SAC) have achieved state-of-the-art results in several high-dimensional continuous control tasks. Despite their success, they are prone to instability due to the "deadly triad" of off-policy training, function approximation, and bootstrapping. Unstable training of off-policy algorithms leads to sample inefficiency and sub-optimal asymptotic performance, which prevents their real-world deployment. To mitigate these issues, previously proposed solutions have focused on advances such as target networks, which alleviate instability, and twin critics, which address overestimation bias. However, these modifications do not address noisy gradient estimates with excessive variance, which cause instability and slow convergence. Our proposed method, Spectral Normalized Actor Critic (SNAC), regularizes the actor and the critics using spectral normalization to systematically bound the gradient norm. Spectral normalization constrains the magnitudes of the gradients, resulting in smoother actor-critics with robust and sample-efficient performance, which makes them suitable for deployment in stability-critical and compute-constrained applications. We present empirical results on several challenging reinforcement learning benchmarks, together with extensive ablation studies, to demonstrate the effectiveness of our proposed method.
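To make the core operation concrete, here is a minimal sketch of spectral normalization for a single weight matrix. It is not the authors' implementation. It assumes the common formulation in which the largest singular value is estimated by power iteration and the weights are divided by it. The helper names (`matvec`, `transpose`, `normalize`, `spectral_norm`, `spectrally_normalize`) and the iteration count are hypothetical and chosen for illustration only.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def normalize(x, eps=1e-12):
    """Scale x to unit Euclidean norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (norm + eps) for xi in x]


def spectral_norm(W, n_iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration.

    Assumes n_iters >= 1 and that W is not the zero matrix.
    """
    rng = random.Random(seed)
    Wt = transpose(W)
    # Random starting vector with one entry per row of W.
    u = normalize([rng.gauss(0.0, 1.0) for _ in W])
    for _ in range(n_iters):
        v = normalize(matvec(Wt, u))  # approximate right singular vector
        u = normalize(matvec(W, v))   # approximate left singular vector
    # sigma is approximately u^T W v once u and v have converged.
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=100):
    """Rescale W so that its largest singular value is approximately 1."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


# Illustrative weight matrix (made-up values, not from the paper).
W = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
W_sn = spectrally_normalize(W)
```

Dividing each layer's weights by their spectral norm bounds that linear map's Lipschitz constant by roughly 1, which is the mechanism by which gradient magnitudes are constrained. In practice a deep learning framework's built-in utility would normally be used instead of hand-written loops such as these.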
Author Information
Payal Bawa (University of Sydney)
Rafael Oliveira (The University of Sydney)
Fabio Ramos (University of Sydney, NVIDIA)
More from the Same Authors
-
2022 : Learning Successor Feature Representations to Train Robust Policies for Multi-task Learning »
Melissa Mozifian · Dieter Fox · David Meger · Fabio Ramos · Animesh Garg -
2022 Workshop: 5th Robot Learning Workshop: Trustworthy Robotics »
Alex Bewley · Roberto Calandra · Anca Dragan · Igor Gilitschenski · Emily Hannigan · Masha Itkina · Hamidreza Kasaei · Jens Kober · Danica Kragic · Nathan Lambert · Julien PEREZ · Fabio Ramos · Ransalu Senanayake · Jonathan Tompson · Vincent Vanhoucke · Markus Wulfmeier -
2022 Spotlight: Batch Bayesian optimisation via density-ratio estimation with guarantees »
Rafael Oliveira · Louis Tiao · Fabio Ramos -
2022 Poster: Batch Bayesian optimisation via density-ratio estimation with guarantees »
Rafael Oliveira · Louis Tiao · Fabio Ramos -
2020 : Invited Talk - "RL with Sim2Real in the Loop / Online Domain Adaptation for Mapping" »
Fabio Ramos · Anthony Tompkins -
2020 : Discussion Panel »
Pete Florence · Dorsa Sadigh · Carolina Parada · Jeannette Bohg · Roberto Calandra · Peter Stone · Fabio Ramos -
2020 : Bayesian optimization by density ratio estimation »
Louis Tiao · Aaron Klein · Cedric Archambeau · Edwin Bonilla · Matthias W Seeger · Fabio Ramos -
2020 Poster: Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning »
Anthony Tompkins · Rafael Oliveira · Fabio Ramos -
2019 : Poster Session »
Lili Yu · Aleksei Kroshnin · Alex Delalande · Andrew Carr · Anthony Tompkins · Aram-Alexandre Pooladian · Arnaud Robert · Ashok Vardhan Makkuva · Aude Genevay · Bangjie Liu · Bo Zeng · Charlie Frogner · Elsa Cazelles · Esteban G Tabak · Fabio Ramos · François-Pierre PATY · Georgios Balikas · Giulio Trigila · Hao Wang · Hinrich Mahler · Jared Nielsen · Karim Lounici · Kyle Swanson · Mukul Bhutani · Pierre Bréchet · Piotr Indyk · samuel cohen · Stefanie Jegelka · Tao Wu · Thibault Sejourne · Tudor Manole · Wenjun Zhao · Wenlin Wang · Wenqi Wang · Yonatan Dukler · Zihao Wang · Chaosheng Dong -
2018 : Fabio Ramos (Uni. of Sydney): Learning and Planning in Spatial-Temporal Data »
Fabio Ramos -
2018 Workshop: Modeling and decision-making in the spatiotemporal domain »
Ransalu Senanayake · Neal Jean · Fabio Ramos · Girish Chowdhary -
2018 Poster: Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models »
Amir Dezfouli · Richard Morris · Fabio Ramos · Peter Dayan · Bernard Balleine -
2018 Oral: Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models »
Amir Dezfouli · Richard Morris · Fabio Ramos · Peter Dayan · Bernard Balleine -
2016 Poster: Spatio-Temporal Hilbert Maps for Continuous Occupancy Representation in Dynamic Environments »
Ransalu Senanayake · Lionel Ott · Simon O'Callaghan · Fabio Ramos -
2014 Poster: On Integrated Clustering and Outlier Detection »
Lionel Ott · Linsey Pang · Fabio Ramos · Sanjay Chawla