
Machine Learning for Health
Uri Shalit · Marzyeh Ghassemi · Jason Fries · Rajesh Ranganath · Theofanis Karaletsos · David Kale · Peter Schulam · Madalina Fiterau

Thu Dec 08 11:00 PM -- 09:30 AM (PST) @ Room 116
Event URL: http://nipsml4hc.ws/

The last decade has seen unprecedented growth in the availability and size of digital health data, including electronic health records, genetics, and wearable sensors. These rich data sources present opportunities to develop and apply machine learning methods to enable precision medicine. The aim of this workshop is to engender discussion between machine learning and clinical researchers about how statistical learning can enhance both the science and the practice of medicine.

Of particular interest to this year’s workshop is a phrase recently coined by the British Medical Journal, "Big Health Data", where the focus is on modeling and improving health outcomes across large numbers of patients with diverse genetic, phenotypic, and environmental characteristics. The majority of clinical informatics research has focused on narrow populations representing, for example, patients from a single institution or sharing a common disease, and on modeling clinical factors, such as lab test results and treatments. Big Health Data considers large and diverse cohorts, often reaching over 100 million patients in size, as well as environmental factors that are known to impact health outcomes, including socioeconomic status, health care delivery and utilization, and pollution. Big Health Data problems pose a variety of challenges for standard statistical learning, many of them nontraditional. Including a patient’s race and income in statistical analysis, for example, evokes concerns about patient privacy. Novel approaches to differential privacy may help alleviate such concerns. Other examples include modeling biased measurements and non-random missingness, and causal inference in the presence of latent confounders.

In this workshop we will bring together clinicians, health data experts, and machine learning researchers working on healthcare solutions. The goal is to discuss clinical needs and the technical challenges that result from them, including the development of interpretable techniques that can adapt to noisy, dynamic environments and the handling of biases inherent in data generated during routine care.

Part of our workshop includes a clinician pitch, a five-minute presentation of open clinical problems that need data-driven solutions. These presentations will be followed by a discussion between invited clinicians and attending ML researchers to understand how machine learning can play a role in solving the problem presented. Finally, the pitch plays a secondary role of enabling new collaborations between machine learning researchers and clinicians: an important step for machine learning to have a meaningful role in healthcare. A general call for clinician pitches will be disseminated to clinical researchers and major physician organizations, including clinician social networks such as Doximity.

We will invite submission of two-page abstracts (not including references) for poster contributions and short oral presentations describing innovative machine learning research on relevant clinical problems and data. Topics of interest include but are not limited to models for diseases and clinical data, temporal models, Markov decision processes for clinical decision support, multi-scale data integration, modeling with missing or biased data, learning with non-stationary data, uncertainty and uncertainty propagation, non-i.i.d. structure in the data, critique of models, causality, model biases, transfer learning, and incorporation of non-clinical (e.g., socioeconomic) factors.

We are seeking sponsorship to help cover the travel and registration costs for students who are presenting posters or short contributed talks, and for clinicians participating as speakers or presenting problem pitches. Workshop organizers have already discussed sponsorship with the NSF and also plan to approach industry leaders.

Thu 11:15 p.m. - 11:25 p.m.
Thu 11:25 p.m. - 12:10 a.m.

The widespread adoption of electronic medical records has created new opportunities for clinical investigation using big data techniques. The potential for nuanced investigation across a full range of clinical questions is tremendous, contingent on the investment hospitals and health systems can make in big data infrastructure. Secondary analysis of electronic health records will enable the use of real patient data to assist clinical decision-making, with the goal of eventually providing near-real-time support for bedside encounters. Clinicians and patients will derive value from data-driven decision-making, while hospitals and health systems may see returns in quality, patient safety, and satisfaction. For big data analytics to achieve their potential in clinical medicine, issues of data structure, analytics staffing, funding, and data security will have to be addressed, but the future is bright and fertile for the application of big data to medical care.

Leo Anthony Celi
Fri 12:10 a.m. - 12:40 a.m.
Eric Xing (Talk)
Eric Xing
Fri 12:40 a.m. - 1:30 a.m.
Contributed spotlights I (Spotlight session)
Fri 1:30 a.m. - 2:00 a.m.
Coffee break and poster session
Fri 2:00 a.m. - 2:30 a.m.
Award session I: a word from the sponsors, followed by student talks (Award session, sponsors, student talks)
Fri 2:30 a.m. - 3:00 a.m.
Clinician pitches & discussion I (Clinician pitches & discussion)
Fri 4:45 a.m. - 5:30 a.m.

The wealth of data availability presents new opportunities in health but also challenges. In this talk we will focus on challenges for machine learning in health: 1. Paradoxes of the Data Society, 2. Quantifying the Value of Data, 3. Privacy, loss of control, marginalization.

Each of these challenges has particular implications for machine learning. The paradoxes relate to our evolving relationship with data and our changing expectations. Quantifying value is vital for accounting for the influence of data in our new digital economies, and issues of privacy and loss of control are fundamental to how our pre-existing rights evolve as the digital world encroaches more closely on the physical.

One of the goals of the research community should be to provide the technological tooling to address these challenges and ensure that we are empowered to avoid the pitfalls of the data-driven society, allowing us to reap the benefits of machine learning in applications from personalized health to health in the developing world.

Fri 5:30 a.m. - 6:00 a.m.
Award session II: a word from the sponsors, followed by student talks (Award session, sponsors, student talks)
Fri 6:00 a.m. - 6:30 a.m.
Coffee Break
Fri 6:30 a.m. - 7:00 a.m.

Health systems worldwide are under pressure to deliver better care for more people from fewer resources. The global economic crisis has shrunk the resources available for healthcare but the growth in demand for care services continues unabated. "Learning Health Systems" is a novel health informatics paradigm that blends quality improvement methods with data science. The goal is to create an integrated health system which harnesses routinely-collected health data to learn from every patient, and feed the knowledge of “what works best” back to clinicians, public health professionals, patients, and other stakeholders to create cycles of continuous improvement. In this talk we dissect the new paradigm and explore its opportunities and challenges for data scientists.

Fri 7:00 a.m. - 7:30 a.m.

We highlight some common (and costly) reasons for misuse of machine learning in health, illustrated using the potential outcomes framework from econometric work on causal inference. First, the failure to specify the decision which will be influenced by the prediction: the same prediction can lead to valid inferences for certain decisions but highly suspect ones for other decisions. Second, the selective labels problem: the data used to form the prediction is endogenously generated. Third, the conflation of averages with margins. We illustrate these points with two predictors that are commonly misused: readmissions and mortality. We argue that on the one hand, ignoring these problems can lead to highly misleading applications; on the other hand, judicious choice of applications and methods can allow one to circumvent these problems.

Fri 7:30 a.m. - 8:00 a.m.
Clinician pitches & discussion II (Clinician pitches & discussion)
Fri 8:00 a.m. - 8:30 a.m.
Poster session
Fri 8:30 a.m. - 9:00 a.m.
Jenna Wiens (Talk)

Author Information

Uri Shalit (Technion)
Marzyeh Ghassemi (University of Toronto)
Jason Fries (Stanford University)
Rajesh Ranganath (Princeton University)

Rajesh Ranganath is a PhD candidate in computer science at Princeton University. His research interests include approximate inference, model checking, Bayesian nonparametrics, and machine learning for healthcare. Rajesh has made several advances in variational methods, especially in popularising black-box variational inference methods that automate the process of inference by making variational inference easier to use while providing more scalable and accurate posterior approximations. Rajesh works in the SLAP group with David Blei. Before starting his PhD, Rajesh worked as a software engineer for AMA Capital Management. He obtained his BS and MS from Stanford University with Andrew Ng and Dan Jurafsky. Rajesh has won several awards and fellowships including the NDSEG graduate fellowship and the Porter Ogden Jacobus Fellowship, given to the top four doctoral students at Princeton University.

Theofanis Karaletsos (Uber AI Labs)
David Kale (University of Southern California)
Peter Schulam (Johns Hopkins University)

Peter Schulam is a PhD student in computer science at Johns Hopkins University. His research interests include machine learning and its applications to healthcare. Peter has made methodological contributions to advancing the use of electronic health data for individualizing care in chronic diseases. His current work explores applications in autoimmune diseases. He has won the National Science Foundation (NSF) Graduate Research Fellowship and the Whiting School of Engineering Centennial Fellowship. He is working with Prof. Suchi Saria for his PhD. Prior to that, he received his master’s from Carnegie Mellon University and his bachelor’s from Princeton University.

Madalina Fiterau (UMass Amherst)

Madalina Fiterau is an Assistant Professor at the College of Information and Computer Sciences at UMass Amherst, with a focus on AI/ML. Previously, she was a Postdoctoral Fellow in the Computer Science Department at Stanford University, working with Professors Chris Ré and Scott Delp in the Mobilize Center. Madalina obtained a PhD in Machine Learning from Carnegie Mellon University in September 2015, advised by Professor Artur Dubrawski. The focus of her PhD thesis, entitled “Discovering Compact and Informative Structures through Data Partitioning”, is learning interpretable ensembles, with applicability ranging from image classification to a clinical alert prediction system. Madalina is currently expanding her research on interpretable models, in part by applying deep learning to obtain salient representations from biomedical “deep” data, including time series, text and images. Madalina is the recipient of the GE Foundation Scholar Leader Award for Central and Eastern Europe. She is the recipient of the Marr Prize for Best Paper at ICCV 2015 and of the Star Research Award at the Annual Congress of the Society of Critical Care Medicine 2016. She has organized two editions of the Machine Learning for Clinical Data Analysis Workshop at NIPS, in 2013 and 2014.
