Differentiable MPC for End-to-end Planning and Control
Brandon Amos · Ivan Jimenez · Jacob I Sacks · Byron Boots · J. Zico Kolter

Thu Dec 06 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #163

We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
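The core trick described in the abstract, differentiating through the controller via the KKT conditions of its convex approximation, can be illustrated on a small equality-constrained QP. The sketch below is not the paper's implementation; it is a minimal NumPy illustration of the same implicit-differentiation idea, with toy matrices standing in for the cost and dynamics terms:

```python
import numpy as np

# Toy problem: min_z 1/2 z^T Q z + p^T z  s.t.  A z = b.
# We treat p as a learnable cost parameter and differentiate a downstream
# loss through the optimizer by solving the (symmetric) KKT system, rather
# than unrolling an iterative solver.
rng = np.random.default_rng(0)
n, m = 4, 2
L = rng.standard_normal((n, n))
Q = L @ L.T + n * np.eye(n)        # positive-definite cost Hessian
p = rng.standard_normal(n)         # linear cost term (the "learned" input)
A = rng.standard_normal((m, n))    # equality constraints (e.g. dynamics)
b = rng.standard_normal(m)
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])  # KKT matrix

def solve_qp(p):
    """Return the primal solution z* of the KKT system for parameter p."""
    sol = np.linalg.solve(K, np.concatenate([-p, b]))
    return sol[:n]

z_target = rng.standard_normal(n)  # stand-in for an expert action/trajectory

def loss(p):
    z = solve_qp(p)
    return 0.5 * np.sum((z - z_target) ** 2)

# Implicit gradient: the KKT residual F(z, lam; p) = [Qz + p + A^T lam; Az - b]
# vanishes at the optimum, and dF/dp = [I; 0], so by the implicit function
# theorem  d loss / d p = -(K^{-1} [grad_z loss; 0])  restricted to the z block.
z = solve_qp(p)
g = z - z_target                   # gradient of the loss w.r.t. z*
d = np.linalg.solve(K, np.concatenate([g, np.zeros(m)]))
grad_analytic = -d[:n]

# Finite-difference check that the implicit gradient is correct.
eps = 1e-6
grad_fd = np.array([(loss(p + eps * e) - loss(p - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```

The backward pass costs one extra linear solve against the already-factorized KKT matrix, which is what makes the controller usable as a differentiable policy class inside end-to-end learning.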

Author Information

Brandon Amos (Carnegie Mellon University)
Ivan Jimenez (Georgia Tech)
Jacob I Sacks (Georgia Institute of Technology)

I am a PhD student in the Electrical and Computer Engineering department at Georgia Tech working in the Robot Learning Lab under Dr. Byron Boots. My primary research interests are in machine learning, optimal control theory, imitation/reinforcement learning, and connections between these disciplines. Additionally, I am interested in hardware acceleration for machine learning and robotics and neuro-inspired compute systems. Prior to joining Georgia Tech, I received my BS in Biomedical Engineering from UT Austin.

Byron Boots (Georgia Tech / Google Brain)
J. Zico Kolter (Carnegie Mellon University / Bosch Center for AI)

Zico Kolter is an Assistant Professor in the School of Computer Science at Carnegie Mellon University, and also serves as Chief Scientist of AI Research for the Bosch Center for Artificial Intelligence. His work focuses on the intersection of machine learning and optimization, with a large focus on developing more robust, explainable, and rigorous methods in deep learning. In addition, he has worked on a number of application areas, highlighted by work on sustainability and smart energy systems. He is the recipient of the DARPA Young Faculty Award, and best paper awards at KDD, IJCAI, and PESGM.
