

Poster in Workshop on Distribution Shifts: New Frontiers with Foundation Models

Transfer Learning, Reinforcement Learning for Adaptive Control Optimization under Distribution Shift

Pankaj Rajak · Wojciech Kowalinski · Fei Wang

Keywords: [ Fraud Prevention ] [ Reinforcement Learning ] [ Transfer Learning ]


Abstract:

Many control systems rely on a pipeline of machine learning models and hand-coded rules to make decisions. However, these rules require constant tuning to maintain optimal system performance as the operating environment changes. Reinforcement learning (RL) can automate the online optimization of rules based on incoming data, but it requires extensive training data and exploration, which limits its application to new rules or those with sparse data. Here, we propose a transfer learning approach called Learning from Behavior Prior (LBP) that enables fast, sample-efficient RL optimization by transferring knowledge from an expert controller. We demonstrate this approach by optimizing rule thresholds in a simulated control pipeline across differing operating conditions. Our method converges 5x faster than vanilla RL and is more robust to distribution shift between the expert and target environments. LBP reduces negative impacts during live training, enabling automated optimization even for new controllers.
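The poster does not publish an implementation, but the abstract's core idea (regularizing RL threshold optimization toward a prior fitted to an expert controller's behavior, under a shifted target environment) can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the authors' code: the toy reward, the Gaussian behavior prior, the annealed regularization weight lam, and all hyperparameters are invented for the example.

import numpy as np

rng = np.random.default_rng(0)

def reward(a, optimum):
    """Toy stand-in for live controller performance: peaks at `optimum`."""
    return -(a - optimum) ** 2

# Behavior prior: a Gaussian fitted to thresholds an expert controller
# chose in the *source* environment, where the optimum was 0.40.
expert_a = 0.40 + 0.05 * rng.standard_normal(500)
p_mu, p_sigma = expert_a.mean(), expert_a.std()

def log_prior(a):
    return -0.5 * ((a - p_mu) / p_sigma) ** 2

# Gaussian policy over the threshold, warm-started at the prior mean.
# Distribution shift: the target optimum (0.55) differs from the source.
mu, sigma, lr = p_mu, 0.1, 0.005
target_optimum = 0.55

for step in range(800):
    lam = 0.5 * 0.99 ** step                   # anneal the prior's influence
    a = mu + sigma * rng.standard_normal(64)   # batch of candidate thresholds
    ret = reward(a, target_optimum) + lam * log_prior(a)
    adv = ret - ret.mean()                     # batch-mean baseline
    mu += lr * np.mean(adv * (a - mu)) / sigma ** 2   # REINFORCE on the mean

print(f"learned threshold ~{mu:.3f} (target optimum {target_optimum})")

In this sketch the prior anchors early exploration near the expert's operating point (reducing harmful actions during live training), while the annealed weight lets the policy drift to the shifted target optimum; how LBP actually combines the expert prior with the RL objective is not specified in the abstract.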
