Sepsis is a severe reaction of the human body to infection and is associated with significant morbidity and mortality. Advances in the scale and granularity of electronic health record data offer the opportunity to apply reinforcement learning (RL) to understand clinician diagnostic and treatment policies for this complex condition, and in turn the factors that drive disparities in sepsis care. The fundamental problem in using RL to model sepsis is that the reward function is unknown and involves tradeoffs between competing outcomes. In this work, we develop an inverse reinforcement learning (IRL) model to learn a reward function for patients being treated for sepsis, then leverage offline RL on the state-action pairs in retrospective data to learn the expert (clinician) policy. We will apply this approach to two large, independent datasets: the subset of MIMIC-IV comprising sepsis patients admitted to the ICU, and the clinical data warehouse of the Mass General Brigham healthcare system, which contains detailed data from emergency-room arrival to hospital discharge across 12 hospitals in the New England area from 2015 through the present. Using the learned policies, we will identify whether treatment policies differ across gender and race/ethnicity subgroups, and finally we will attempt to detect changes in recorded physician policies before and after the introduction of national treatment guidelines. We hope this approach will help us understand differential treatment policies across subgroups of sepsis patients.
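The two-stage pipeline described above (IRL to recover a reward, then offline RL under that reward) can be sketched on a toy synthetic problem. This is a minimal illustration, not the proposed clinical method: the three patient states, two actions, transition dynamics, and "expert" demonstrations below are all hypothetical, the IRL step is a simple feature-expectation-matching update with a linear reward, and the offline step is tabular Q-learning on logged transitions.

```python
# Minimal sketch: (1) learn a linear reward from hypothetical expert
# (state, action) demonstrations by matching feature expectations,
# (2) run tabular Q-learning on logged offline transitions under the
# learned reward. All quantities here are toy illustrations.

N_STATES, N_ACTIONS = 3, 2  # toy labels: 0=stable, 1=deteriorating, 2=critical


def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    f = [0.0] * (N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f


# Hypothetical expert demonstrations: treat (a=1) whenever not stable.
expert = [(0, 0), (1, 1), (2, 1), (1, 1), (0, 0), (2, 1)]


def step(s, a):
    """Toy deterministic dynamics: treating improves the state,
    monitoring lets it worsen."""
    return max(s - 1, 0) if a == 1 else min(s + 1, N_STATES - 1)


# Logged offline transitions (s, a, s'), as in retrospective data.
logged = [(s, a, step(s, a)) for s in range(N_STATES)
          for a in range(N_ACTIONS) for _ in range(20)]


# --- Stage 1: IRL by matching expert feature expectations ---
def mean_features(pairs):
    m = [0.0] * (N_STATES * N_ACTIONS)
    for s, a in pairs:
        for i, v in enumerate(phi(s, a)):
            m[i] += v / len(pairs)
    return m


mu_expert = mean_features(expert)
w = [0.0] * (N_STATES * N_ACTIONS)
for _ in range(50):
    # Greedy action per state under the current reward weights.
    greedy = [(s, max(range(N_ACTIONS), key=lambda a: w[s * N_ACTIONS + a]))
              for s in range(N_STATES)]
    mu_pi = mean_features(greedy)
    # Move weights toward the expert's feature expectations.
    w = [wi + 0.1 * (e - p) for wi, e, p in zip(w, mu_expert, mu_pi)]


def reward(s, a):
    return w[s * N_ACTIONS + a]


# --- Stage 2: offline tabular Q-learning under the learned reward ---
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma = 0.1, 0.9
for _ in range(200):
    for s, a, s2 in logged:
        target = reward(s, a) + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

In this toy setting the learned reward ranks the expert's action above the alternative in the non-stable states, and the offline policy reproduces the expert's "treat when deteriorating" behavior; the proposed work replaces each stage with methods suited to high-dimensional EHR trajectories.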