Test-Time Risk Adaptation with a Mixture of Agents
Abstract
In real-world reinforcement learning (RL) applications, agents often encounter unforeseen risks at deployment and must make robust decisions without the luxury of further fine-tuning. While recent risk-aware RL methods incorporate return variance as a surrogate for safety, this captures only a narrow subset of real-world risks. To address this gap, we introduce TRAM (Test-time Risk Adaptation with a Mixture of agents), a framework designed to enhance risk-aware decision-making during inference. TRAM optimizes a weighted combination of predicted returns and a risk metric derived from state-action occupancy measures, enabling the agent to adaptively balance performance and safety in real time. This allows a nuanced representation of diverse risk factors without any additional training, a capability that existing methods do not provide. We establish theoretical sub-optimality bounds that substantiate the efficacy of our method, and empirical evaluations demonstrate that TRAM consistently outperforms existing baselines, delivering safer policies across varying risk conditions in test environments.
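As a minimal sketch of the test-time objective described in the abstract, the action selection can be read as trading off a predicted return against an occupancy-based risk penalty; the symbols below (Q-hat, d-pi, R, lambda) are illustrative notation assumed for this sketch, not definitions taken from the paper:

\[
a^{*}(s) \;=\; \arg\max_{a \in \mathcal{A}} \Big[ \hat{Q}(s, a) \;-\; \lambda \, \mathcal{R}\big( d^{\pi}(s, a) \big) \Big],
\]

where \(\hat{Q}(s, a)\) denotes the predicted return, \(d^{\pi}(s, a)\) the state-action occupancy measure induced by the mixture of agents, \(\mathcal{R}(\cdot)\) a risk metric defined on that measure, and \(\lambda \ge 0\) a weight that can be adjusted at test time to balance performance against safety.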