Workshop
Socially Responsible Language Modelling Research (SoLaR)
Usman Anwar · David Krueger · Samuel Bowman · Jakob Foerster · Su Lin Blodgett · Roberta Raileanu · Alan Chan · Laura Ruis · Robert Kirk · Yawen Duan · Xin Chen · Kawin Ethayarajh
Room R06-R09 (level 2)
Sat 16 Dec, 6:30 a.m. PST
The inaugural Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023 is an interdisciplinary gathering that aims to foster responsible and ethical research in the field of language modeling. Recognizing the significant risks and harms [33-37] associated with the development, deployment, and use of language models, the workshop emphasizes the need for researchers to focus on addressing these risks starting from the early stages of development. The workshop brings together experts and practitioners from various domains and academic fields with a shared commitment to promoting fairness, equity, accountability, transparency, and safety in language modeling research. In addition to technical works on socially responsible language modeling research, we also encourage sociotechnical submissions from other disciplines such as philosophy, law, and policy, in order to foster an interdisciplinary dialogue on the societal impacts of LMs.
Schedule
Sat 6:30 a.m. - 7:10 a.m. | LLM As A Cultural Interlocutor? Rethinking Socially Aware NLP in Practice (Invited Talk) | Diyi Yang
Sat 7:10 a.m. - 7:15 a.m. | Best Paper Talk: Low-Resource Languages Jailbreak GPT-4 (Contributed Talk)
Sat 7:20 a.m. - 8:00 a.m. | Grounded Evaluations for Assessing Real-World Harms (Invited Talk) | Deborah Raji
Sat 8:30 a.m. - 9:30 a.m. | Panel on Socially Responsible Language Modelling Research (Panel)
Sat 9:30 a.m. - 10:10 a.m. | Economic Disruption and Alignment of LLMs (Invited Talk) | Anton Korinek
Sat 11:30 a.m. - 1:00 p.m. | Poster Session (Posters)
Sat 1:00 p.m. - 1:40 p.m. | Can LLMs Keep a Secret and Serve Pluralistic Values? On Privacy and Moral Implications of LLMs (Invited Talk) | Yejin Choi
Sat 2:00 p.m. - 2:40 p.m. | Universal Jailbreaks (Invited Talk) | Andy Zou
Sat 2:40 p.m. - 2:45 p.m. | Oral 1: Social Contract AI: Aligning AI Assistants with Implicit Group Norms (Contributed Talk)
Sat 2:45 p.m. - 2:50 p.m. | Oral 2: Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset (Contributed Talk)
Sat 2:50 p.m. - 3:30 p.m. | Can LLMs Reason without Chain-of-Thought? (Invited Talk) | Owain Evans
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models (Poster) | Thomas Zollo · Todd Morrill · Zhun Deng · Jake Snell · Toniann Pitassi · Richard Zemel
Weakly Supervised Detection of Hallucinations in LLM Activations (Poster) | Miriam Rateike · Celia Cintas · John Wamburu · Tanya Akumu · Skyler D. Speakman
Do Personality Tests Generalize to Large Language Models? (Poster) | Florian E. Dorner · Tom Sühr · Samira Samadi · Augustin Kelava
MoPe: Model Perturbation-based Privacy Attacks on Language Models (Poster) | Jason Wang · Jeffrey Wang · Marvin Li · Seth Neel
Language Model Detectors Are Easily Optimized Against (Poster) | Charlotte Nicks · Eric Mitchell · Rafael Rafailov · Archit Sharma · Christopher D Manning · Chelsea Finn · Stefano Ermon
Jailbreaking Language Models at Scale via Persona Modulation (Poster) | Rusheb Shah · Quentin Feuillade Montixi · Soroush Pour · Arush Tagade · Javier Rando
FlexModel: A Framework for Interpretability of Distributed Large Language Models (Spotlight) | Matthew Choi · Muhammad Adil Asif · John Willes · David B. Emerson
Large Language Model Unlearning (Poster) | Yuanshun (Kevin) Yao · Xiaojun Xu · Yang Liu
FairSISA: Ensemble Post-Processing to Improve Fairness of Unlearning in LLMs (Poster) | Swanand Kadhe · Anisa Halimi · Ambrish Rawat · Nathalie Baracaldo
Efficient Evaluation of Bias in Large Language Models through Prompt Tuning (Poster) | Jacob-Junqi Tian · David B. Emerson · Deval Pandya · Laleh Seyyed-Kalantari · Faiza Khattak
Dissecting Large Language Models (Poster) | Nicky Pochinkov · Nandi Schoots
Comparing Optimization Targets for Contrast-Consistent Search (Poster) | Hugo Fry · Seamus Fallows · Jamie Wright · Ian Fan · Nandi Schoots
AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models (Poster) | Sicheng Zhu · Ruiyi Zhang · Bang An · Gang Wu · Joe Barrow · Zichao Wang · Furong Huang · Ani Nenkova · Tong Sun
Low-Resource Languages Jailbreak GPT-4 (Spotlight) | Yong Zheng-Xin · Cristina Menghini · Stephen Bach
Post-Deployment Regulatory Oversight for General-Purpose Large Language Models (Poster) | Carson Ezell · Abraham Loeb
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment (Poster) | Yang Liu · Yuanshun (Kevin) Yao · Jean-Francois Ton · Xiaoying Zhang · Ruocheng Guo · Hao Cheng · Yegor Klochkov · Muhammad Faaiz Taufiq · Hang Li
Are Large Language Models Really Robust to Word-Level Perturbations? (Poster) | Haoyu Wang · Guozheng Ma · Cong Yu · Gui Ning · Linrui Zhang · Zhiqi Huang · Suwei Ma · Yongzhe Chang · Sen Zhang · Li Shen · Xueqian Wang · Peilin Zhao · Dacheng Tao
Eliciting Language Model Behaviors using Reverse Language Models (Spotlight) | Jacob Pfau · Alex Infanger · Abhay Sheshadri · Ayush Panda · Julian Michael · Curtis Huebner
Controlled Decoding from Language Models (Spotlight) | Sidharth Mudgal · Jong Lee · Harish Ganapathy · YaGuang Li · Tao Wang · Yanping Huang · Zhifeng Chen · Heng-Tze Cheng · Michael Collins · Jilin Chen · Alex Beutel · Ahmad Beirami
The Effect of Group Status on the Variability of Group Representations in LLM-generated Text (Poster) | Messi Lee · Calvin Lai · Jacob Montgomery
Learning Inner Monologue and Its Utilization in Vision-Language Challenges (Poster) | Diji Yang · Kezhen Chen · Jinmeng Rao · Xiaoyuan Guo · Yawen Zhang · Jie Yang · Yi Zhang
Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features (Poster) | Diogo Cruz · Edoardo Pona · Alex Holness-Tofts · Elias Schmied · Víctor Abia Alonso · Charlie J Griffin · Bogdan-Ionut Cirstea
Bridging Predictive Minds: LLMs As Atypical Active Inference Agents (Poster) | Jan Kulveit
Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation (Poster) | Xiangjue Dong · Yibo Wang · Philip S Yu · James Caverlee
A Simple Test of Expected Utility Theory with GPT (Spotlight) | Mengxin Wang
Towards Auditing Large Language Models: Improving Text-based Stereotype Detection (Poster) | Zekun Wu · Sahan Bulathwela · Adriano Koshiyama
Welfare Diplomacy: Benchmarking Language Model Cooperation (Poster) | Gabe Mukobi · Hannah Erlebach · Niklas Lauffer · Lewis Hammond · Alan Chan · Jesse Clifton
A Divide-Conquer-Reasoning Approach to Consistency Evaluation and Improvement in Blackbox Large Language Models (Poster) | Wendi Cui · Jiaxin Zhang · Zhuohang Li · Damien Lopez · Kamalika Das · Bradley Malin · Sricharan Kumar
Compositional preference models for alignment with scalable oversight (Spotlight) | Dongyoung Go · Tomasz Korbak · Germán Kruszewski · Jos Rozen · Marc Dymetman
Investigating the Fairness of Large Language Models for Predictions on Tabular Data (Poster) | Yanchen Liu · Srishti Gautam · Jiaqi Ma · Himabindu Lakkaraju
Localizing Lying in Llama: Experiments in Prompting, Probing, and Patching (Poster) | James Campbell · Phillip Guo · Richard Ren
User Inference Attacks on LLMs (Poster) | Nikhil Kandpal · Krishna Pillutla · Alina Oprea · Peter Kairouz · Christopher A. Choquette-Choo · Zheng Xu
Interpretable Stereotype Identification through Reasoning (Poster) | Jacob-Junqi Tian · Omkar Dige · David B. Emerson · Faiza Khattak
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models (Spotlight) | Alan Chan · Benjamin Bucknall · Herbie Bradley · David Krueger
Developing A Conceptual Framework for Analyzing People in Unstructured Data (Poster) | Mark Díaz · Sunipa Dev · Emily Reif · Remi Denton · Vinodkumar Prabhakaran
Breaking Physical and Linguistic Borders: Privacy-Preserving Multilingual Prompt Tuning for Low-Resource Languages (Spotlight) | Wanru Zhao · Yihong Chen
Measuring Feature Sparsity in Language Models (Spotlight) | Mingyang Deng · Lucas Tao · Joe Benton
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints (Poster) | Chaoqi Wang · Yibo Jiang · Chenghao Yang · Han Liu · Yuxin Chen
Social Contract AI: Aligning AI Assistants with Implicit Group Norms (Spotlight) | Jan-Philipp Fraenken · Samuel Kwok · Peixuan Ye · Kanishk Gandhi · Dilip Arumugam · Jared Moore · Alex Tamkin · Tobias Gerstenberg · Noah Goodman
Evaluating Superhuman Models with Consistency Checks (Spotlight) | Lukas Fluri · Daniel Paleka · Florian Tramer
Testing Language Model Agents Safely in the Wild (Poster) | Silen Naihin · David Atkinson · Marc Green · Merwane Hamadi · Craig Swift · Douglas Schonholtz · Adam Tauman Kalai · David Bau
KoMultiText: Large-Scale Korean Text Dataset for Classifying Biased Speech in Real-World Online Services (Poster) | Dasol Choi · Jooyoung Song · Eunsun Lee · Seo Jin woo · HeeJune Park · Dongbin Na
An International Consortium for AI Risk Evaluations (Poster) | Ross Gruetzemacher · Alan Chan · Štěpán Los · Kevin Frazier · Simeon Campos · Matija Franklin · José Hernández-Orallo · James Fox · Christin Manning · Philip M Tomei · Kyle Kilian
Citation: A Key to Building Responsible and Accountable Large Language Models (Poster) | Jie Huang · Kevin Chang
Towards Optimal Statistical Watermarking (Spotlight) | Baihe Huang · Banghua Zhu · Hanlin Zhu · Jason Lee · Jiantao Jiao · Michael Jordan
SuperHF: Supervised Iterative Learning from Human Feedback (Poster) | Gabe Mukobi · Peter Chatain · Su Fong · Robert Windesheim · Gitta Kutyniok · Kush Bhatia · Silas Alberti
Training Private and Efficient Language Models with Synthetic Data from LLMs (Poster) | Da Yu · Arturs Backurs · Sivakanth Gopi · Huseyin A. Inan · Janardhan Kulkarni · Zinan Lin · Chulin Xie · Huishuai Zhang · Wanrong Zhang
Towards a Situational Awareness Benchmark for LLMs (Spotlight) | Rudolf Laine · Alexander Meinke · Owain Evans
Risk Assessment and Statistical Significance in the Age of Foundation Models (Poster) | Apoorva Nitsure · Youssef Mroueh · Mattia Rigotti · Kristjan Greenewald · Brian Belgodere · Mikhail Yurochkin · Jiri Navratil · Igor Melnyk · Jarret Ross
An Archival Perspective on Pretraining Data (Spotlight) | Meera Desai · Abigail Jacobs · Dallas Card
Bayesian low-rank adaptation for large language models (Spotlight) | Adam Yang · Maxime Robeyns · Xi Wang · Laurence Aitchison
A collection of principles for guiding and evaluating large language models (Poster) | Konstantin Hebenstreit · Robert Praas · Matthias Samwald
Are Models Biased on Text without Gender-related Language? (Poster) | Catarina Belém · Preethi Seshadri · Yasaman Razeghi · Sameer Singh
Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT (Poster) | Zechen Zhang · Dean Hazineh · Jeffrey Chiu
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models (Poster) | Hannah Rose Kirk · Bertie Vidgen · Paul Rottger · Scott Hale
Understanding Hidden Context in Preference Learning: Consequences for RLHF (Poster) | Anand Siththaranjan · Cassidy Laidlaw · Dylan Hadfield-Menell
Subtle Misogyny Detection and Mitigation: An Expert-Annotated Dataset (Spotlight) | Anna Richter · Brooklyn Sheppard · Allison Cohen · Elizabeth Smith · Tamara Kneese · Carolyne Pelletier · Ioana Baldini · Yue Dong
Towards Publicly Accountable Frontier LLMs (Poster) | Markus Anderljung · Everett Smith · Joe O'Brien · Lisa Soder · Benjamin Bucknall · Emma Bluemke · Jonas Schuett · Robert Trager · Lacey Strahm · Rumman Chowdhury
Successor Heads: Recurring, Interpretable Attention Heads In The Wild (Poster) | Rhys Gould · Euan Ong · George Ogden · Arthur Conmy
Forbidden Facts: An Investigation of Competing Objectives in Llama 2 (Poster) | Tony Wang · Miles Wang · Kaivalya Hariharan · Nir Shavit