Ethical Dataset Collection and Representation in Low-Resource Language Contexts
Abstract
This session will explore the ethical, legal, computational, and social dimensions of collecting low-resource language datasets, particularly in the context of research. Participants will work in small, rotating groups, engaging in facilitated discussions to identify key concerns in an evolving privacy landscape and co-develop guiding principles for responsible data use.
This session will bring together researchers and, where possible, other conference attendees such as data owners, software developers, and community representatives. Together, we will tackle issues of data justice, surveillance, consent, governance, the representation of marginalized groups in benchmarks, and the trade-offs between data utility and harm, while exploring pathways for ethical governance, transparency, and accountability. As an outcome of this session, we will synthesize insights into a vision paper (for journal publication) on the ethical and responsible use of low-resource language data, and participants will be invited to contribute as co-authors.