Offline reinforcement learning methods have been used to learn policies from observational data for recommending treatment of chronic diseases and interventions in critical care. These formulations assume that resources are plentiful and that patients are independent of one another, so treatments can be recommended for each patient individually without regard to treatment availability. However, in many decision-making problems, such as recommending care in resource-poor settings, the space of available actions is constrained and the policy must take these constraints into account. We consider the problem of learning policies for personalized treatment when resources are limited and the actions taken for one patient affect the actions available to other patients.
One such sequential decision-making problem is hospital bed assignment. Hospitals are complex systems in which not only the medical care but also the physical hospital environment affects patients’ outcomes. For Clostridioides difficile infection (CDI), one of the most common healthcare-acquired infections, the history of a patient’s bed and room can contribute to their risk of infection because C. difficile spores can linger on surfaces. We consider the problem of assigning patients to hospital beds with the objective of reducing the incidence of CDI while taking into account the limited availability of beds. Our algorithm first learns a Q-function for assigning beds to an individual patient, ignoring bed availability. We then use this Q-function to assign patients to beds in order of their risk level, taking the highest-value action among those available for each patient. We test our algorithm on simulated data as well as a real dataset of hospitalizations from a large urban hospital.
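The assignment step described above can be illustrated with a minimal sketch. It assumes the learned Q-function has already been evaluated into a matrix `q_values[i][b]` (higher is better) for patient `i` and bed `b`, that each patient has a scalar `risk` score, that patients are processed from highest to lowest risk, and that each bed holds one patient. The function name `assign_beds` and these data layouts are hypothetical choices for illustration, not details taken from the method itself.

```python
from typing import Dict, Sequence


def assign_beds(
    q_values: Sequence[Sequence[float]],
    risk: Sequence[float],
) -> Dict[int, int]:
    """Greedy bed assignment under a one-patient-per-bed constraint.

    q_values[i][b]: assumed precomputed Q-value of placing patient i in bed b.
    risk[i]: assumed scalar risk score for patient i.
    Returns a mapping from patient index to assigned bed index.
    """
    n_patients = len(q_values)
    n_beds = len(q_values[0]) if n_patients else 0
    available = set(range(n_beds))

    # Assumption: the highest-risk patient chooses first.
    order = sorted(range(n_patients), key=lambda i: risk[i], reverse=True)

    assignment: Dict[int, int] = {}
    for i in order:
        if not available:
            break  # no beds left; remaining patients stay unassigned
        # Highest-value bed among those still free; ties go to the lowest index.
        best = max(sorted(available), key=lambda b: q_values[i][b])
        assignment[i] = best
        available.remove(best)
    return assignment


# Toy example with three patients and three beds (made-up numbers).
q = [
    [0.9, 0.5, 0.1],
    [0.8, 0.7, 0.2],
    [0.6, 0.4, 0.3],
]
risk_scores = [0.2, 0.9, 0.5]
print(assign_beds(q, risk_scores))
```

In this toy example, patient 1 has the highest risk and takes bed 0, so the lower-risk patients 2 and 0 must choose among the remaining beds even though bed 0 would also have been their highest-value option. This is the sense in which an action taken for one patient changes the actions available to others.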