
Artsheets for Art Datasets
Ramya Srinivasan · Remi Denton · Jordan Famularo · Negar Rostamzadeh · Fernando Diaz · Beth Coleman

Machine learning (ML) techniques are increasingly being employed within a variety of creative domains. For example, ML tools are being used to analyze the authenticity of artworks, to simulate artistic styles, and to augment human creative processes. While this progress has opened up new creative avenues, it has also paved the way for adverse downstream effects such as cultural appropriation (e.g., cultural misrepresentation, offense, and undervaluing) and representational harm. Many of these issues stem from the training data, in ways that diligent evaluation can uncover, prevent, and mitigate. We posit that, when developing an arts-based dataset, it is essential to consider the social factors that influenced its conception and design, and to examine the resulting gaps in order to maximize understanding of the dataset's meaning and future impact. Each decision a dataset creator makes produces opportunities, but also omissions. Moreover, each choice builds on preexisting histories of the data's formation and handling across time by prior actors including, but not limited to, art collectors, galleries, libraries, archives, museums, and digital repositories. To illuminate these aspects, we provide a checklist of questions customized for art datasets, intended to guide assessment of the ways that dataset design may either perpetuate or shift exclusions found in repositories of art data. The checklist is organized to address the dataset creator's motivation together with dataset provenance, composition, collection, pre-processing, cleaning, labeling, use (including data generation), distribution, and maintenance. Two case studies exemplify the value and application of our questionnaire.

Author Information

Ramya Srinivasan (Fujitsu Research)
Remi Denton (Google)

Remi Denton (they/them) is a Staff Research Scientist at Google, within the Technology, AI, Society, and Culture team, where they study the sociocultural impacts of AI technologies and the conditions of AI development. Prior to joining Google, Remi received their PhD in Computer Science from the Courant Institute of Mathematical Sciences at New York University, where they focused on unsupervised learning and generative modeling of images and video. Before that, they received their BSc in Computer Science and Cognitive Science from the University of Toronto. Though formally trained as a computer scientist, Remi draws ideas and methods from multiple disciplines and gravitates toward highly interdisciplinary collaborations in order to examine AI systems from a sociotechnical perspective. Remi’s recent research centers on emerging text- and image-based generative AI, with a focus on data considerations and representational harms. Remi has previously published under the name "Emily Denton".

Jordan Famularo (New York University, 2020 PhD)

I create compelling research on the intersections between design thinking and what it means to be human. I have 15 years of postgraduate academic and professional experience focused on social, community, cultural heritage, and educational missions. PhD (2020), New York University, Institute of Fine Arts.

Negar Rostamzadeh (Google)
Fernando Diaz (Google)

Fernando Diaz is a research scientist at Google Brain Montréal. His research focuses on the design of information access systems, including search engines, music recommendation services, and crisis response platforms. He is particularly interested in understanding and addressing the societal implications of artificial intelligence more generally. Previously, Fernando was the assistant managing director of Microsoft Research Montréal and a director of research at Spotify, where he helped establish its research organization on recommendation, search, and personalization. Fernando’s work has received awards at SIGIR, WSDM, ISCRAM, and ECIR. He is the recipient of the 2017 British Computer Society Karen Spärck Jones Award. Fernando has co-organized workshops and tutorials at SIGIR, WSDM, and WWW. He has also co-organized several NIST TREC initiatives, WSDM (2013), the Strategic Workshop on Information Retrieval (2018), FAT* (2019), SIGIR (2021), and the CIFAR Workshop on Artificial Intelligence and the Curation of Culture (2019).

Beth Coleman (University of Toronto)
