From Black Box to Bedside: Distilling Reinforcement Learning for Interpretable Sepsis Treatment
Ella Lan · Andrea Yu · Sergio Charles
Abstract
Sepsis is a complex, life-threatening condition requiring individualized, time-sensitive interventions. Reinforcement learning (RL) has shown promise for optimizing sepsis care, but real-world adoption is hindered by the opacity of its decision-making. We propose a novel two-phase framework that couples deep Q-learning with post hoc interpretability via decision tree distillation. Phase I trains deep Q-networks (DQNs) on MIMIC-III ICU trajectories, exploring ensemble methods and behavior cloning (BC) regularization to improve robustness and clinician agreement. Phase II distills the learned policies into shallow, human-readable decision trees using greedy, probabilistic, and Q-regression approaches. In our experiments, clinician agreement rises from 0.231 (baseline) to 0.906 (BC-DQN) without degrading policy value, and the distilled trees retain near-perfect fidelity to the DQN policies ($\geq 0.998$) while substantially improving transparency. This framework can help bridge the trust gap between ``black-box'' medical AI and interpretable clinical technologies.
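To make the two phases concrete, the following is a minimal sketch of the pipeline the abstract describes: a BC-regularized DQN objective (Phase I) and greedy decision-tree distillation with a fidelity check (Phase II). The network architecture, state and action dimensions, BC weight, synthetic data, and the helper names `bc_dqn_loss` and `distill_greedy` are all illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of the two-phase pipeline, assuming a BC-regularized TD loss
# and greedy-action tree distillation. All dimensions, weights, and data here
# are synthetic placeholders; the paper's real setup uses MIMIC-III trajectories.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.tree import DecisionTreeClassifier

STATE_DIM, N_ACTIONS, GAMMA, BC_WEIGHT = 16, 25, 0.99, 0.5  # assumed values

class QNetwork(nn.Module):
    """Small MLP Q-network; the paper's architecture may differ."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS)
        )

    def forward(self, s):
        return self.net(s)

def bc_dqn_loss(q_net, target_net, batch):
    """TD loss plus a behavior-cloning term that pushes the Q-network's greedy
    action toward the clinician's logged action (one plausible form of BC
    regularization; the paper may weight or shape it differently)."""
    s, a, r, s2, done, clin_a = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * target_net(s2).max(1).values
    td_loss = F.smooth_l1_loss(q, target)
    bc_loss = F.cross_entropy(q_net(s), clin_a)  # Q-values treated as logits
    return td_loss + BC_WEIGHT * bc_loss

def distill_greedy(q_net, states, max_depth=5):
    """Phase II, greedy variant: fit a shallow tree to the DQN's argmax
    actions and report fidelity (agreement with the DQN's greedy policy)."""
    with torch.no_grad():
        actions = q_net(torch.as_tensor(states)).argmax(1).numpy()
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(states, actions)
    fidelity = (tree.predict(states) == actions).mean()
    return tree, fidelity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(1000, STATE_DIM)).astype(np.float32)
    q_net, target_net = QNetwork(), QNetwork()

    # Purely synthetic transition batch with random "clinician" actions.
    batch = (
        torch.as_tensor(states[:32]),
        torch.randint(N_ACTIONS, (32,)),
        torch.zeros(32),
        torch.as_tensor(states[32:64]),
        torch.zeros(32),
        torch.randint(N_ACTIONS, (32,)),
    )
    print("BC-DQN loss on synthetic batch:", bc_dqn_loss(q_net, target_net, batch).item())

    tree, fid = distill_greedy(q_net, states)
    print(f"distilled-tree fidelity to DQN greedy policy: {fid:.3f}")
```

The greedy variant shown here imitates the DQN's argmax actions directly; the Q-regression variant named in the abstract would instead fit a regression tree to the Q-values themselves, and the probabilistic variant would fit to a softmax over them.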