
Workshop: Socially Responsible Language Modelling Research (SoLaR)

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Alan Chan · Benjamin Bucknall · Herbie Bradley · David Krueger


Public release of the weights of pre-trained foundation models, otherwise known as downloadable access (Solaiman, 2023), enables fine-tuning without the prohibitive expense of pre-training. We argue that increasingly accessible fine-tuning of downloadable models will likely increase hazard. First, we highlight research aimed at improving the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that more accessible fine-tuning methods would increase hazard by enabling malicious, non-state actors and by diffusing responsibility for harms. We conclude with a discussion of the limitations of our work, notably that we do not evaluate the potential benefits of more accessible fine-tuning or its effects on vulnerability or exposure.
