

Workshop

Foundation Model Interventions

Pau Rodriguez · Arno Blaas · Desi R Ivanova · Sahra Ghalebikesabi · Yuki M Asano · Katherine Metcalf · Xavier Suau

Meeting 121, 122

Sun 15 Dec, 8:15 a.m. PST

The increasing capabilities of foundation models have raised concerns about their potential to generate undesirable content, perpetuate biases, and promote harmful behaviors. To address these issues, we propose a workshop that focuses on understanding the inner workings of foundation models and identifying actionable mechanisms involved in generation. Recent studies have shown promise in intervening directly on model activations, or on a low-rank subset of the weights, to gain fine-grained control over generation and mitigate harmful and toxic content. This workshop aims to bring together researchers to explore methods for improving the controllability of foundation models and to develop a better understanding of their behavior and potential misuse.
