Workshop
Machine Learning for the Developing World (ML4D): Achieving sustainable impact
William Herlands · Maria De-Arteaga · Amanda Coston

Sat Dec 08 05:00 AM -- 03:30 PM (PST) @ Room 510 BD
Event URL: https://sites.google.com/view/ml4d-nips-2018/

Global development experts are beginning to employ ML for diverse problems such as helping rescue workers allocate resources during natural disasters, providing intelligent educational and healthcare services in regions with few human experts, and detecting corruption in government contracts. While ML holds tremendous promise for accelerating development and societal change, it is often difficult to ensure that machine learning projects deliver their promised benefits. The challenging reality in developing regions is that pilot projects disappear after a few years or do not have the same effect when expanded beyond the initial test site, and prototypes of novel methodologies are often never deployed.

At the center of this year’s program is the question of how to achieve sustainable impact with Machine Learning for the Developing World (ML4D). This one-day workshop will bring together a diverse set of participants from across the globe to discuss major roadblocks and paths to action. Practitioners and development experts will discuss essential elements for ensuring the successful deployment and maintenance of technology in developing regions. Additionally, the workshop will feature cutting-edge research in areas such as transfer learning, unsupervised learning, and active learning that can help ensure long-term ML system viability. Attendees will learn about the contextual factors that make projects effective, development challenges that can benefit from machine learning solutions, and how these problems can inspire novel machine learning research.

The workshop will include invited and contributed talks, a poster session of accepted papers, panel discussions, and breakout sessions tailored to the workshop theme. We welcome paper submissions focussing on core ML methodology addressing ML4D roadblocks, application papers that showcase successful examples of ML4D, and research that evaluates the societal impact of ML.

Sat 5:45 a.m. - 6:00 a.m.
Introductory remarks (Talk)
Artur Dubrawski
Sat 6:00 a.m. - 6:30 a.m.

In this talk, I will share lessons from our on-the-ground efforts to create AI-for-social-good solutions, spanning cellphone-image-based anthropometry for babies, AI-enabled active case-finding for tuberculosis, and early pest detection in cotton farming. The promise of AI as a powerful aid for achieving global-development goals is bolstered by five current forces: (1) large frontline workforces enabling service delivery and data collection; (2) growing smartphone penetration, providing compute, connectivity, imaging, localization, and interfaces; (3) large tech-enabled development programs with established data pipelines, infrastructure, and processes; (4) rural populations increasingly adopting technology; and (5) strong policy and institutional support for AI in development. We recommend that AI-for-social-good efforts harness these forces by piggybacking on large tech-enabled development programs to achieve impact at scale. I will provide examples of such programs, point to opportunity areas, list criteria that AI-for-social-good innovators can use to assess the likelihood of scaled impact, discuss risks and mitigation strategies, and suggest frontier areas for AI-for-social-good research.

Rahul Panicker, P. Anandan
Sat 6:30 a.m. - 7:00 a.m.

LIRNEasia has been working on leveraging big data for public purposes since 2012. As an organization based in a developing country, we have faced challenges in developing new insights and in informing policy and government processes. When leveraging big data and machine learning for development purposes, developing countries face three main interrelated challenges:
1. Skills: data scientists are in short supply, so developing the skills to make use of these new data sources becomes paramount. How should we build these skills? What should the composition of research teams be?
2. Data: accessing both private-sector and government data can be challenging. In an imperfect, often inconsistent regulatory environment, how can we facilitate responsible data access and use?
3. Policy impact and mainstreaming: except in extreme cases, most policy domains already have established processes for generating evidence and incorporating it into policy planning and implementation. How do we disrupt these ‘sticky’ processes with new forms of data and techniques?
This talk will address these three sets of challenges and our experiences in tackling them.

Sriganesh Lokanathan
Sat 7:00 a.m. - 7:30 a.m.

In the pursuit of public service, governments have to oversee many complex systems. In recent years, data-driven methodologies have been adopted as tools to oversee and enhance service delivery. In this talk, I will discuss the ways in which the government of South Africa and its agencies use data tools, as well as the policies and investments they have put in place, some of which have created a more enabling ecosystem while others have created difficulties and challenges. I will discuss the current data landscape through the lens of Open Data policies, data readiness policies, and human capital development initiatives. This talk summarizes the work we have done over the past four years: our observations, successes, and aspirations, and the challenges we have encountered as we continue toward data-driven governance.

Nyalleng Moorosi
Sat 8:00 a.m. - 8:15 a.m.

Armed conflict has contributed to an unprecedented number of internally displaced persons (IDPs): individuals who are forced out of their homes but remain within their country. IDPs often urgently require shelter, food, and healthcare, yet predicting when flows of IDPs will reach an area remains a major challenge for aid delivery organizations. We sought to develop an approach to forecast IDP migration more accurately, which could help humanitarian aid groups allocate resources more effectively during conflicts. We modeled monthly IDP flows between provinces within Syria and within Yemen using heterogeneous data on food prices, fuel prices, wages, location, time, and conflict reports. We show that our machine learning approach outperforms baseline persistence forecasting methods. Integrating diverse data sources into machine learning models thus appears to improve IDP migration prediction.

Benjamin Huynh
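The persistence baseline referred to above forecasts next month's flow as simply this month's observed flow. A minimal sketch of that evaluation setup follows, assuming toy hand-made data for one hypothetical province and a hypothetical lagged linear regression as the comparison model; it is not the study's model, which drew on food prices, fuel prices, wages, location, time, and conflict reports.

```python
import numpy as np


def persistence_forecast(flows):
    """Persistence baseline: predict flows[t+1] as flows[t]."""
    return flows[:-1]


def lagged_design(flows, conflict):
    """Design matrix for predicting flows[t+1] from month-t features:
    an intercept, the current flow, and the current conflict-event count."""
    n = len(flows) - 1
    return np.column_stack([np.ones(n), flows[:-1], conflict[:-1]])


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy, hand-made data (NOT from the Syria/Yemen study): monthly conflict-event
# counts for a hypothetical province, with IDP arrivals driven by last month's conflict.
conflict = np.array([3, 7, 2, 9, 4, 8, 1, 6, 5, 10, 2, 7], dtype=float)
flows = np.empty_like(conflict)
flows[0] = 50.0
flows[1:] = 50.0 + 20.0 * conflict[:-1]

X = lagged_design(flows, conflict)
y = flows[1:]   # targets: next month's flow
split = 8       # fit on the first 8 transitions, evaluate on the rest

coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
model_mae = mae(y[split:], X[split:] @ coef)
persistence_mae = mae(y[split:], persistence_forecast(flows)[split:])
print(f"persistence MAE: {persistence_mae:.2f}, lagged regression MAE: {model_mae:.2f}")
```

Because the synthetic series is generated by an exact linear rule, the regression fits it perfectly; the example only illustrates how a model is compared against persistence on held-out months, not the size of any real-world gain.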
Sat 8:15 a.m. - 8:30 a.m.

High-quality risk adjustment in health insurance markets weakens insurers' incentives to engage in inefficient behavior to attract lower-cost enrollees. We propose a novel methodology based on Markov Chain Monte Carlo methods to improve risk adjustment by clustering diagnostic codes into risk groups optimal for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from 3.5 million enrollees of the Colombian Healthcare System. Results show that our methodology outperforms common alternatives and suggest that it has the potential to improve access to quality healthcare for the chronically ill.

Simón Ramírez Amaya
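The abstract does not spell out the sampler, so the following is only a sketch of the general idea under stated assumptions: spending is predicted by an ordinary least-squares regression on group indicators, the sampler makes single-code Metropolis moves, and its target is proportional to exp(-MSE / temperature). All function names, the temperature, and the synthetic data are hypothetical illustrations, not the authors' method or the Colombian panel.

```python
import numpy as np


def group_features(code_matrix, assignment, n_groups):
    """Collapse per-code indicators (enrollees x codes) into per-group indicators:
    column g is 1 if the enrollee has any diagnostic code assigned to group g."""
    feats = np.zeros((code_matrix.shape[0], n_groups))
    for g in range(n_groups):
        members = assignment == g
        if members.any():  # empty groups stay all-zero
            feats[:, g] = code_matrix[:, members].max(axis=1)
    return feats


def prediction_mse(code_matrix, spend, assignment, n_groups):
    """In-sample MSE of an OLS fit of spending on an intercept plus group indicators."""
    X = np.column_stack([np.ones(len(spend)),
                         group_features(code_matrix, assignment, n_groups)])
    coef, *_ = np.linalg.lstsq(X, spend, rcond=None)
    return float(np.mean((spend - X @ coef) ** 2))


def mcmc_risk_groups(code_matrix, spend, n_groups, n_iter=500, temperature=1.0, seed=0):
    """Hypothetical Metropolis sampler over code-to-group assignments with target
    proportional to exp(-MSE / temperature); returns the best assignment visited."""
    rng = np.random.default_rng(seed)
    n_codes = code_matrix.shape[1]
    current = rng.integers(n_groups, size=n_codes)  # random initial assignment
    current_mse = prediction_mse(code_matrix, spend, current, n_groups)
    best, best_mse = current.copy(), current_mse
    for _ in range(n_iter):
        proposal = current.copy()
        proposal[rng.integers(n_codes)] = rng.integers(n_groups)  # symmetric single-code move
        proposal_mse = prediction_mse(code_matrix, spend, proposal, n_groups)
        # Always accept improvements; accept worse moves with prob exp(-increase / temperature)
        if np.log(rng.random()) < (current_mse - proposal_mse) / temperature:
            current, current_mse = proposal, proposal_mse
            if current_mse < best_mse:
                best, best_mse = current.copy(), current_mse
    return best, best_mse


# Toy synthetic data (hypothetical): 300 enrollees, 6 codes, codes 3-5 are costly.
rng = np.random.default_rng(42)
codes = (rng.random((300, 6)) < 0.3).astype(float)
spend = (100 + 50 * codes[:, :3].max(axis=1) + 400 * codes[:, 3:].max(axis=1)
         + rng.normal(0, 20, size=300))
best, best_mse = mcmc_risk_groups(codes, spend, n_groups=2, temperature=50.0)
print(best, round(best_mse, 1))
```

The temperature must be scaled to the magnitude of the MSE: too low and the sampler behaves like greedy search that can stall in a local optimum, too high and it wanders without improving.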
Sat 8:30 a.m. - 9:30 a.m.
Poster session: Contributed papers (Poster session)
Milan Cvitkovic, Arijit Patra, Yunpeng Li, Rahman Banya Saff Sanya, Guanghua Chi, Benjamin Huynh, Hamed Alemohammad, Simón Ramírez Amaya, Nazmus Saquib, Jade Abbott, Teo de Campos, Viraj Prabhu, Alvaro Riascos, Hafte Abera, Praney Dubey, Tanushyam Chattopadhyay, Hsiang Hsu, Mayank Jain, Kartikeya Bhardwaj, Gabriel Cadamuro, Bradley Gram-Hansen, Georg Dorffner
Sat 11:00 a.m. - 11:15 a.m.

Machine learning research for developing countries can demonstrate clear sustainable impact by delivering actionable and timely information to in-country government organisations (GOs) and NGOs in response to their critical information requirements. We co-create products with UK and in-country commercial, GO, and NGO partners to ensure that the machine learning algorithms address appropriate user needs, whether for tactical decision-making or evidence-based policy decisions. In one particular case, we developed and deployed a novel algorithm, BCCNet, to quickly process large quantities of unstructured data to help prevent and respond to natural disasters. Crowdsourcing provides an efficient mechanism for generating labels from unstructured data to prime machine learning algorithms for large-scale data analysis. However, these labels are often imperfect, with quality varying among citizen scientists, which precludes their direct use with many state-of-the-art machine learning techniques. We describe BCCNet, a framework that simultaneously aggregates biased and contradictory labels from the crowd and trains an automatic classifier to process new data. Our case studies, mosquito sound detection for malaria prevention and damage detection for disaster response, show the efficacy of our method in the challenging context of developing-world applications.

Olga Isupova
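The abstract describes BCCNet as jointly aggregating biased crowd labels and training a classifier. The sketch below illustrates only the label-aggregation idea, and it swaps in the classic Dawid-Skene EM estimator (one confusion matrix per annotator) as a simpler stand-in; it is not BCCNet. The function name, smoothing constant, and toy labels are assumptions for illustration.

```python
import numpy as np


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate noisy crowd labels with Dawid-Skene EM (a stand-in, not BCCNet).

    labels: int array of shape (n_items, n_workers); -1 marks a missing label.
    Returns (posterior over the true class of each item,
             confusion[w, k, l] = estimated P(worker w answers l | true class k)).
    """
    n_items, n_workers = labels.shape
    # onehot[i, w, l] = 1 if worker w gave item i label l (all zeros if missing)
    onehot = np.zeros((n_items, n_workers, n_classes))
    ii, ww = np.nonzero(labels >= 0)
    onehot[ii, ww, labels[ii, ww]] = 1.0

    # Initialise the posterior with a smoothed majority vote
    post = onehot.sum(axis=1) + smoothing
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices
        prior = post.mean(axis=0)
        confusion = np.einsum("ik,iwl->wkl", post, onehot) + smoothing
        confusion /= confusion.sum(axis=2, keepdims=True)
        # E-step: log P(true class = k | observed labels), up to a constant
        log_post = np.log(prior) + np.einsum("iwl,wkl->ik", onehot, np.log(confusion))
        log_post -= log_post.max(axis=1, keepdims=True)  # avoid overflow in exp
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
    return post, confusion


# Toy example: workers 0 and 1 are reliable, worker 2 systematically flips labels,
# and worker 1 skipped item 0.
truth = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
labels = np.stack([truth, truth, 1 - truth], axis=1)
labels[0, 1] = -1
post, confusion = dawid_skene(labels, n_classes=2)
print(post.argmax(axis=1))
```

Modelling each annotator with a full confusion matrix, rather than a single accuracy score, is what lets a systematically biased labeller (worker 2 here) still contribute useful evidence.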
Sat 11:15 a.m. - 11:30 a.m.

A major challenge in prenatal healthcare delivery is the lack of devices and clinicians in many areas of the developing world. While the advent of portable ultrasound machines and, more recently, handheld probes has brought down capital costs, the shortage of trained personnel remains a serious impediment to reducing maternal and infant mortality. Diagnosis of several key prenatal health indicators from ultrasound can be modelled as an image analysis problem amenable to present-day, state-of-the-art deep-learning-based image and video understanding pipelines. However, deep learning models are typically memory intensive and require significant computational resources, which is a challenging prospect for point-of-care healthcare applications in the developing world. Portable ultrasound systems make it increasingly possible to expand the reach of prenatal diagnosis, but doing so requires lightweight architectures that can perform image analysis tasks without a large memory or computational footprint. We propose a lightweight convolutional architecture for assessing ultrasound videos, suitable both for videos acquired using mobile probes and for those converted from the DICOM standard from portable machines. As an exemplar of the approach, we validated our pipeline for fetal heart assessment (a first step towards identifying congenital heart defects), including viewing plane identification and visibility prediction in fetal echocardiography. Our models use optimised kernel windows and build image representations from salient features at multiple scales, with the relative importance of features at each scale gauged by weighted attention maps at different stages of the convolutional operations. This representation improves model performance while substantially reducing model size, and it has been validated on real-world clinical videos.
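The abstract mentions weighting features from multiple scales with attention maps. As a rough illustration of that general pattern, and not the authors' architecture, the NumPy sketch below upsamples feature maps from several scales to a common grid and fuses them with a per-pixel softmax over scales. The scoring projection `score_weights` is a hypothetical stand-in for learned parameters, and the random arrays stand in for real CNN activations.

```python
import numpy as np


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)


def attention_fuse(feature_maps, score_weights):
    """Fuse multi-scale feature maps with per-pixel softmax attention over scales.

    feature_maps: list of (C, H_s, W_s) arrays with the same channel count C, each a
        downsampled version of the finest grid by an integer factor in both dimensions.
    score_weights: (n_scales, C) array, a hypothetical learned 1x1 projection that
        turns each scale's features into one attention logit per pixel.
    Returns the fused (C, H, W) map and the (n_scales, H, W) attention weights.
    """
    H = max(f.shape[1] for f in feature_maps)
    aligned = np.stack([upsample_nearest(f, H // f.shape[1]) for f in feature_maps])  # (S, C, H, W)
    logits = np.einsum("sc,schw->shw", score_weights, aligned)
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability for exp
    attn = np.exp(logits)
    attn /= attn.sum(axis=0, keepdims=True)      # weights over scales sum to 1 at each pixel
    fused = np.einsum("shw,schw->chw", attn, aligned)
    return fused, attn


# Random stand-ins for feature maps from three stages of a small CNN
rng = np.random.default_rng(0)
maps = [rng.normal(size=(8, 32, 32)), rng.normal(size=(8, 16, 16)), rng.normal(size=(8, 8, 8))]
fused, attn = attention_fuse(maps, rng.normal(size=(3, 8)))
print(fused.shape, attn.shape)  # (8, 32, 32) (3, 32, 32)
```

Fusing scales with a weighted sum keeps the channel count fixed regardless of how many scales are used, which is one way such a design can avoid the parameter growth of concatenating features.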

Sat 11:30 a.m. - 12:00 p.m.

Researchers from across the social and computer sciences are increasingly using machine learning to study and address global development challenges, and an exciting new field of "Machine Learning for the Developing World", or "Machine Learning for Development" (ML4D) is beginning to emerge. In recent work (De Arteaga et al., ACM TMIS, 2018), we synthesize the prominent literature in the field and attempt to answer the key questions, "What is ML4D, and where is the field headed?". Based on the literature, we identify a set of best practices for ensuring that ML4D projects are relevant to the advancement of development objectives. Given the strong alignment between development needs and ML approaches, we lay out a roadmap detailing three technical stages where ML4D can play an essential role and meaningfully contribute to global development. Perhaps the most important aspect of ML4D is that development challenges are treated as research questions, not as roadblocks: we believe that the ML4D field can flourish in the coming years by using the unique challenges of the developing world as opportunities to inspire novel and impactful research across multiple machine learning disciplines. This talk is based on joint work with Maria de Arteaga, William Herlands, and Artur Dubrawski.

Daniel Neill
Sat 12:30 p.m. - 1:00 p.m.
Using ML to locate hidden graves in Mexico (Talk)
Monica Meltis Vejar
Sat 1:00 p.m. - 1:30 p.m.

In wealthy nations, novel sources of data from the internet and social media are enabling new approaches for social science research and public policy. In developing countries, by contrast, fewer sources of such data exist, and researchers and policymakers often rely on data that are unreliable or out of date. Here, we develop a new approach for measuring the dynamic welfare of individuals remotely by analyzing their logs of mobile phone use. We calibrate our approach with an original high-frequency panel survey of 1,200 Afghans, and an experimental protocol that randomized the timing and value of an unconditional cash transfer to each respondent. We show that mobile phone metadata, obtained with the respondent's consent from Afghanistan's largest mobile phone company, can be used to estimate the social and economic well-being of respondents, including the onset of positive and negative shocks. We discuss the potential for such methods to transform current practices of policy monitoring and impact evaluation.

Joshua Blumenstock
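The abstract does not describe the specific features or model used, so the following is only a minimal sketch of the general pipeline, assuming hypothetical `(subscriber, contact, duration_s, hour)` call-record tuples, a handful of illustrative per-subscriber features, and closed-form ridge regression as the predictor. None of these choices come from the study, and the call log and welfare scores are invented for illustration.

```python
from collections import defaultdict

import numpy as np


def cdr_features(records):
    """Per-subscriber features from hypothetical (subscriber, contact, duration_s, hour)
    records: call count, distinct contacts, mean call duration, share of night-time calls."""
    calls = defaultdict(list)
    for subscriber, contact, duration, hour in records:
        calls[subscriber].append((contact, duration, hour))
    subscribers = sorted(calls)
    rows = []
    for s in subscribers:
        user_calls = calls[s]
        night = sum(1 for _, _, h in user_calls if h < 6 or h >= 22)  # 22:00-05:59
        rows.append([len(user_calls),
                     len({c for c, _, _ in user_calls}),
                     float(np.mean([d for _, d, _ in user_calls])),
                     night / len(user_calls)])
    return subscribers, np.array(rows)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on standardised features, unpenalised intercept."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Z = (X - mu) / sd
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y.mean()))
    return lambda X_new: y.mean() + ((X_new - mu) / sd) @ beta


# Hypothetical call log and survey-based welfare scores (illustrative only)
records = [
    ("A", "B", 120, 9), ("A", "C", 60, 23), ("A", "B", 30, 14),
    ("B", "A", 300, 10),
    ("C", "A", 45, 2), ("C", "D", 90, 18),
    ("D", "C", 600, 12), ("D", "E", 20, 21), ("D", "F", 40, 7), ("D", "G", 15, 22),
]
subscribers, X = cdr_features(records)
welfare = np.array([0.2, 0.5, -0.1, 0.8])  # one score per subscriber, in sorted order
predict = fit_ridge(X, welfare, alpha=1.0)
print(subscribers, predict(X).round(2))
```

In practice such a model would be calibrated against survey responses and evaluated on held-out respondents; fitting and predicting on the same four subscribers here only demonstrates the mechanics.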
Sat 1:30 p.m. - 2:30 p.m.
Challenges and Opportunities in ML4D (Discussion Panel)

Author Information

William Herlands (Carnegie Mellon University)
Maria De-Arteaga (Carnegie Mellon University)

Maria is a joint PhD candidate in Machine Learning and Public Policy at Carnegie Mellon University’s Machine Learning Department and the Heinz College of Information Systems and Public Policy. Machine learning (ML) is increasingly being used to support decision-making in critical settings, where predictions have potentially grave implications for human lives. Examples include healthcare, hiring, child welfare, and criminal justice. Maria's research focuses on the risks and opportunities of ML-based predictions to support decision-making in the context of sustainable societies. As part of her work on algorithmic fairness and accountability, she characterizes how societal biases encoded in historical data may be reproduced and amplified by ML models, and develops algorithms to mitigate these risks. Moreover, even if the data do not encode harmful societal biases, many challenges still prevent the effective use of predictions to improve decision-making, such as omitted payoff bias and the selective labels problem. In her research, Maria seeks to understand the limits and risks of using machine learning in these contexts, and to develop human-centered ML that can improve expert decision-making. She holds an M.Sc. in Machine Learning from Carnegie Mellon University (2017) and a B.Sc. in Mathematics from Universidad Nacional de Colombia (2013). She was an intern at Microsoft Research, Redmond, in 2017 and at Microsoft Research, New England, in 2018. Prior to graduate school, she worked as a data science researcher and as an investigative journalist. Her work has received the Best Thematic Paper Award at NAACL’19 and the Innovation Award on Data Science at Data for Policy’16, and has been featured by UN Women and Global Pulse in their report Gender Equality and Big Data: Making Gender Data Visible. She is a co-founder of the NeurIPS Machine Learning for the Developing World (ML4D) Workshop, and a recipient of a 2018 Microsoft Research Dissertation Grant.

Amanda Coston (Carnegie Mellon University)
