Workshop
I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models
Estefany Kelly Buchanan · Fan Feng · Andreas Kriegler · Ian Mason · Tobias Uelwer · Yubin Xie · Rui Yang
Room R02-R05 (level 2)
In the past year, tools such as ChatGPT, Stable Diffusion and SegmentAnything have had an immediate impact on our everyday lives. Many of these tools have been built using foundation models, that is, very large models (having billions or trillions of parameters) trained on vast amounts of data (Bommasani et al., 2021). The excitement around these foundation models and their capabilities might suggest that all the interesting problems have been solved and artificial general intelligence is just around the corner (Wei et al., 2022; Bubeck et al., 2023).
At this year’s I Can’t Believe It’s Not Better workshop we invite papers that coolly reflect on this optimism and demonstrate that there are in fact many difficult and interesting open questions. The workshop will specifically focus on failure modes of foundation models, especially unexpected negative results. In addition, we invite contributions that will help us understand current and future disruptions of machine learning subfields, as well as instances where these powerful methods merely remain complementary to another subfield of machine learning.
Contributions on the failure modes of foundation models might consider:
- Domain-specific areas where the application of foundation models did not work as expected.
- Failures in the safety and explainability of foundation models.
- The limits of current foundation model methodologies.
Besides failure modes of foundation models, this workshop also considers their impact on the ML ecosystem and potential problems that remain to be solved by these new systems. In this context, relevant questions include:
- Where do foundation models leave researchers in other areas (e.g., AI for science, recommender systems, Bayesian methods, bioinformatics)?
- Which important problems are not solved by training large models with large amounts of data?
- What unexpected negative results were encountered when applying foundation models to a specific domain?
Schedule
Sat 6:45 a.m. - 7:00 a.m.
Opening Remarks (Introduction)
Welcome and introduction to ICBINB
Ian Mason
Sat 7:00 a.m. - 7:30 a.m.
Machine Learning and Morphology: Opportunities and Challenges (Invited Talk)
Morphology in evolutionary biology is used to quantify visible characteristics of specimens, a crucial aspect in addressing the biodiversity crisis. To investigate anthropogenic impacts, researchers have constructed extensive image databases, which make the field a natural fit for the integration of machine learning. However, traditional methods used in morphometrics are grounded in diagnostic structures proposed by biologists. In contrast, machine learning approaches autonomously extract features without explicit biological motivation. This talk focuses on the potential misunderstandings that can arise when applying machine learning in morphometrics. Specifically, the focus is on the biological interpretation of machine learning models, exploring instances where models demonstrate high accuracy yet struggle with coherent biological interpretation. The presentation showcases experiments that highlight the tension between excellent quantitative results and a frequent lack of biological interpretation.
Wilfried Wöber
Sat 7:30 a.m. - 8:00 a.m.
Dissociating Language and Thought in Large Language Models (Invited Talk)
Today’s large language models (LLMs) routinely generate coherent, grammatical and seemingly meaningful paragraphs of text. This achievement has led to speculation that LLMs have become “thinking machines”, capable of performing tasks that require reasoning and/or world knowledge. In this talk, I will introduce a distinction between formal competence—knowledge of linguistic rules and patterns—and functional competence—understanding and using language in the world. This distinction is grounded in human neuroscience, which shows that formal and functional competence recruit different cognitive mechanisms. I will show that the word-in-context prediction objective has allowed LLMs to essentially master formal linguistic competence; however, pretrained LLMs still lag behind on many aspects of functional linguistic competence, prompting engineers to adopt specialized fine-tuning techniques and/or couple an LLM with external modules. I will illustrate the formal-functional distinction using the domains of English grammar and arithmetic, respectively. I will then turn to generalized world knowledge, a domain where this distinction is much less clear-cut, and discuss our efforts to leverage both cognitive science and NLP to develop systematic ways to probe generalized world knowledge in text-based LLMs. Overall, the formal/functional competence framework clarifies the discourse around LLMs, helps develop targeted evaluations of their capabilities, and suggests ways for developing better models of real-life language use.
Anna Ivanova
Sat 8:00 a.m. - 8:30 a.m.
Coffee Break
Sat 8:30 a.m. - 8:35 a.m.
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats (Spotlight)
Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains largely unsolved. One major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs) such as ChatGPT, Google Bard, or Anthropic’s Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the number of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for generating malicious content in open-source models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach. Code is available at https://anonymous.4open.science/r/LLMEmbeddingAttack-6C3C
Leo Schwinn · David Dobre · Stephan Günnemann · Gauthier Gidel
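As a concrete illustration of the embedding space threat model mentioned above, here is a minimal sketch (our illustration under stated assumptions, not the authors' released code) of optimizing continuous "soft tokens" so that an open-weights causal LM assigns high likelihood to a chosen target continuation. The model name, prompt, target, and hyperparameters are placeholders.

```python
# Hypothetical sketch of an embedding-space attack on an open-weights LM.
# Unlike discrete token attacks, we optimize continuous embeddings directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad_(False)  # only the injected embeddings are trainable

prompt_ids = tok("Placeholder user prompt", return_tensors="pt").input_ids
target_ids = tok(" placeholder target continuation", return_tensors="pt").input_ids

emb = model.get_input_embeddings()
prompt_emb, target_emb = emb(prompt_ids), emb(target_ids)

adv = torch.randn(1, 20, prompt_emb.shape[-1], requires_grad=True)  # soft tokens
opt = torch.optim.Adam([adv], lr=1e-3)

for step in range(200):
    inputs = torch.cat([prompt_emb, adv, target_emb], dim=1)
    logits = model(inputs_embeds=inputs).logits
    n = target_ids.shape[1]
    # Positions -n-1 .. -2 predict the n target tokens.
    pred = logits[:, -n - 1:-1, :]
    loss = torch.nn.functional.cross_entropy(
        pred.reshape(-1, pred.shape[-1]), target_ids.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```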
Sat 8:35 a.m. - 8:40 a.m.
Compositional Generalization in Vision-Language Models uses the Language Modality only (Spotlight)
Compositionality is a common property of many modalities, including text and images, but the compositional generalization of multi-modal models is not well understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on linguistic priors rather than on information in the image, as the strength of the language model in detecting sentences that are syntactically and semantically likely overwhelms the vision part of the model. In particular, we find that a benchmark for compositionality mostly favors pure language models. Finally, we propose a new benchmark for compositionality without such linguistic priors.
Sat 8:40 a.m. - 8:45 a.m.
A Study on the Calibration of In-context Learning (Spotlight)
Modern auto-regressive models are trained to minimize log loss by predicting the next word, and are therefore expected to give calibrated answers when problems are framed as next-token prediction tasks. We study this formulation for in-context learning, a widely used way to adapt frozen large language models (LLMs), and find trade-offs between performance and calibration on a wide range of natural language understanding and reasoning tasks. Human evaluation shows that hallucination rates align well with the miscalibrated results. Furthermore, we find that selecting in-context examples from test datasets and common recalibration techniques that are widely effective elsewhere, such as temperature scaling, may provide limited gains in calibration error, suggesting that new methods may be required for settings where models are expected to be reliable.
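Since temperature scaling is the recalibration baseline named above, a hedged sketch of how calibration error is typically measured and a temperature fitted may help; the logits and labels are assumed to come from scoring an LLM over a fixed set of answer choices.

```python
# Sketch (not the authors' code): expected calibration error (ECE) and
# temperature scaling over per-example answer-choice logits.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import softmax

def ece(probs, labels, n_bins=10):
    """probs: [N, C] predicted probabilities; labels: [N] correct indices."""
    conf, pred = probs.max(axis=1), probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == labels[mask]).mean()
            total += mask.mean() * abs(acc - conf[mask].mean())
    return total

def fit_temperature(logits, labels):
    """Choose T minimizing negative log-likelihood of softmax(logits / T)."""
    def nll(t):
        p = softmax(logits / t, axis=1)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```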
Sat 8:45 a.m. - 8:50 a.m.
Can LLM-Generated Misinformation Be Detected? (Spotlight)
The advent of Large Language Models (LLMs) has made a transformative impact. However, the potential that LLMs such as ChatGPT can be exploited to generate misinformation has posed a serious concern to online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation, and then categorize and validate the potential real-world methods for generating misinformation with LLMs. Through extensive empirical investigation, we discover that LLM-generated misinformation can be harder for humans and detectors to detect than human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm. We also discuss the implications of our discovery for combating misinformation in the age of LLMs, along with countermeasures.
Canyu Chen · Kai Shu
Sat 8:50 a.m. - 8:55 a.m.
Self-Evaluation Improves Selective Generation in Large Language Models (Spotlight)
Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely employed, recent research has demonstrated the limitations of using sequence-level probability estimates given by LLMs as reliable indicators of generation quality. Conversely, LLMs have demonstrated strong calibration at the token level, particularly when choosing correct answers in multiple-choice questions or evaluating true/false statements. In this work, we reformulate open-ended generation tasks into token-level prediction tasks and leverage LLMs' superior calibration at the token level. We instruct an LLM to self-evaluate its answers, employing either a multi-way comparison or a point-wise evaluation approach, with the option to include a ``None of the above'' option to express the model's uncertainty explicitly. We benchmark a range of scoring methods based on self-evaluation and evaluate their performance in selective generation using TruthfulQA and TL;DR. Through extensive experiments with PaLM-2 and GPT-3, we demonstrate that self-evaluation based scores not only improve accuracy, but also correlate better with the overall quality of generated content.
Jie Ren · Yao Zhao · Tu Vu · Peter Liu · Balaji Lakshminarayanan
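To make the point-wise variant concrete, here is a small sketch of the self-evaluation scoring loop; the `generate` and `next_token_probs` callables are hypothetical stand-ins for an LLM API, and the abstention threshold is illustrative.

```python
# Sketch of point-wise self-evaluation for selective generation.
EVAL_TEMPLATE = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Is the proposed answer correct? Answer Yes or No:"
)

def self_eval_score(next_token_probs, question, answer):
    """next_token_probs(prompt) -> dict mapping next tokens to probabilities."""
    probs = next_token_probs(EVAL_TEMPLATE.format(question=question, answer=answer))
    return probs.get(" Yes", 0.0)  # token-level confidence in the answer

def generate_or_abstain(generate, next_token_probs, question, tau=0.7):
    answer = generate(question)
    score = self_eval_score(next_token_probs, question, answer)
    return answer if score >= tau else "[abstain]"  # selective generation
```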
Sat 8:55 a.m. - 9:00 a.m.
Filter bubbles and affective polarization in user-personalized large language model outputs (Spotlight)
Echoing the history of search engines and social media content rankings, the advent of large language models (LLMs) has led to a push for increased personalization of model outputs to individual users. In the past, personalized recommendations and ranking systems have been linked to the development of filter bubbles (serving content that may confirm a user's existing biases) and affective polarization (strong negative sentiment towards those with differing views). In this work, we explore how prompting a leading large language model, ChatGPT-3.5, with a user's political affiliation prior to asking factual questions about public figures and organizations leads to differing results. We observe that left-leaning users tend to receive more positive statements about left-leaning political figures and media outlets, while right-leaning users see more positive statements about right-leaning entities. This pattern holds across presidential candidates, members of the U.S. Senate, and media organizations with ratings from AllSides. When qualitatively evaluating some of these outputs, there is evidence that particular facts are included or excluded based on the user's political affiliation. These results illustrate that personalizing LLMs based on user demographics carries the same risks of affective polarization and filter bubbles that have been seen in other personalized internet technologies. This ``failure mode'' should be monitored closely as there are more attempts to monetize and personalize these models.
Tomo Lazovich
Sat 9:00 a.m. - 10:30 a.m.
Poster Session
Sat 10:30 a.m. - 12:00 p.m.
Lunch
Sat 12:00 p.m. - 12:30 p.m.
Active and Online Learning with Large (and Combinatorial) Models (Invited Talk)
Active learning consists of sequentially and adaptively constructing a dataset in the hope of improving the learning speed, by avoiding useless data points where the current model is already correct with large probability and by focusing on regions of uncertainty. During this talk, I will give a short reminder of the potential benefits and pitfalls of active learning, especially in large and combinatorial models.
Sat 12:30 p.m. - 12:40 p.m.
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations (Contributed talk)
Context-based fine-tuning methods like prompting, in-context learning, soft prompting (prompt tuning) and prefix-tuning have gained popularity as they often match the performance of full fine-tuning with a fraction of the parameters. Still, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are strictly less expressive than full fine-tuning. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. While this means that context-based fine-tuning techniques can successfully elicit or combine skills already present in the pretrained model, they cannot learn tasks requiring new attention patterns.
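The central claim, that a prefix cannot change the relative attention pattern over content tokens but only mixes in a prefix-determined vector, can be checked numerically in a few lines. A minimal single-query sketch (our illustration, not the authors' code):

```python
# Attention over [prefix; content] decomposes as a convex combination of
# content-only attention and a fixed prefix contribution.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=d)                                       # one query
K_c, V_c = rng.normal(size=(5, d)), rng.normal(size=(5, d))  # content
K_p, V_p = rng.normal(size=(2, d)), rng.normal(size=(2, d))  # learned prefix

attn = softmax(np.concatenate([K_p @ q, K_c @ q]))
out = attn @ np.concatenate([V_p, V_c])

alpha = attn[:2].sum()                        # total mass on the prefix
content_out = softmax(K_c @ q) @ V_c          # unchanged relative pattern
prefix_out = softmax(K_p @ q) @ V_p           # fixed direction from prefix
assert np.allclose(out, alpha * prefix_out + (1 - alpha) * content_out)
```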
Sat 12:40 p.m. - 12:50 p.m.
A Natural Experiment on LLM Data Contamination in Code Generation (Contributed talk)
Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks. Since LLMs train on wide swaths of the internet, this practice raises concerns of data contamination, i.e., evaluating on examples that are explicitly or implicitly included in the training data. Data contamination remains notoriously challenging to measure and mitigate, even with partial attempts like controlled experimentation of training data, canary strings, or embedding similarities. In this work, we conduct the first thorough longitudinal analysis of data contamination in LLMs by using the natural experiment of training cutoffs in GPT models to look at benchmarks released over time. Specifically, we consider two code/mathematical problem-solving datasets, Codeforces and Project Euler, and find statistically significant trends among LLM pass rate vs. GitHub popularity and release date that provide strong evidence of contamination. By open-sourcing our dataset, raw results, and evaluation framework, our work paves the way for rigorous analyses of data contamination in modern models. We conclude with a discussion of best practices and future steps for publicly releasing benchmarks in the age of LLMs that train on webscale data.
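The natural-experiment logic reduces to splitting problems by release date relative to the model's training cutoff and comparing pass rates; a schematic sketch follows (field names and the cutoff date are illustrative, not the authors' framework).

```python
# Sketch of the pre- vs post-cutoff comparison behind the analysis.
from datetime import date

TRAINING_CUTOFF = date(2021, 9, 1)  # illustrative GPT-style training cutoff

def contamination_gap(problems):
    """problems: iterable of dicts with 'release_date' (date) and 'passed' (bool)."""
    pre = [p["passed"] for p in problems if p["release_date"] < TRAINING_CUTOFF]
    post = [p["passed"] for p in problems if p["release_date"] >= TRAINING_CUTOFF]
    rate = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    # A much higher pass rate on pre-cutoff problems is evidence of contamination.
    return rate(pre), rate(post)
```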
Sat 12:50 p.m. - 1:00 p.m.
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" (Contributed talk)
We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if "A is B" occurs, "B is A" is more likely to occur). We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.
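The evaluation protocol is easy to state in code: finetune on "A is B" facts about fictitious people, then query both directions. A schematic sketch (the `answer_fn` interface is hypothetical; the example fact is taken from the abstract):

```python
# Sketch of the Reversal Curse evaluation.
train_fact = "Uriah Hawthorne is the composer of Abyssal Melodies."  # finetuning data

queries = {
    # Same direction as training (A -> B): finetuned models answer this.
    "forward": ("Who is Uriah Hawthorne?", "composer of Abyssal Melodies"),
    # Reversed direction (B -> A): models fail, and the correct name is no
    # likelier than a random name.
    "reverse": ("Who composed Abyssal Melodies?", "Uriah Hawthorne"),
}

def reversal_gap(answer_fn):
    """answer_fn: the finetuned model as a question -> answer-string callable."""
    hits = {k: float(ans.lower() in answer_fn(q).lower())
            for k, (q, ans) in queries.items()}
    return hits["forward"] - hits["reverse"]  # a large gap is the Reversal Curse
```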
Sat 1:00 p.m. - 1:30 p.m.
Coffee Break
Sat 1:30 p.m. - 2:00 p.m.
Limitations of Fine-Tuning for Aligning LLMs (Invited Talk)
David Krueger
Sat 2:00 p.m. - 2:30 p.m.
Measurement in the Age of LLMs: An Application to Political Ideology Scaling (Invited Talk)
Much of social science is centered around terms like “ideology” or “power”, which generally elude precise definition and whose contextual meanings are trapped in surrounding language. This talk explores the use of large language models (LLMs) to flexibly navigate the conceptual clutter inherent to social scientific measurement tasks. We rely on LLMs’ remarkable linguistic fluency to elicit ideological scales of both legislators and text, which accord closely with established methods and our own judgment. A key aspect of our approach is that we elicit such scores directly, instructing the LLM to furnish numeric scores itself. This approach is methodologically "dumb" and shouldn't "work" according to classical principles of measurement. We nevertheless find surprisingly compelling results, which we showcase through a variety of case studies.
Aaron Schein
Sat 2:30 p.m. - 3:20 p.m.
Panel: Failure Modes in the Age of Foundation Models (Panel Discussion)
Panelists: David Krueger, Christoph Lampert, Tatiana Likhomanenko, Aaron Schein. Moderator: Naomi Saphra
Sat 3:20 p.m. - 3:30 p.m.
Closing Remarks (Awards and outlook)
Thanks and Awards
Yubin Xie
Do Language Models Know When They're Hallucinating References? (Poster)
State-of-the-art language models (LMs) are famous for ``hallucinating'' references. These fabricated article and book titles lead to harms, obstacles to their use, and public backlash. While other types of LM hallucinations are also important, we propose hallucinated references as the ``drosophila'' of research on hallucination in large language models (LLMs), as they are particularly easy to study. We show that simple search engine queries reliably identify such hallucinations, which facilitates evaluation. To begin to dissect the nature of hallucinated LM references, we attempt to classify them using black-box queries to the same LM, without consulting any external resources. Consistency checks done with direct queries about whether the generated reference title is real (inspired by Kadavath et al. (2022), Lin et al. (2022) and Manakul (2023)) are compared to consistency checks with indirect queries which ask for ancillary details such as the authors of the work. These consistency checks are found to be partially reliable indicators of whether or not a reference is a hallucination. In particular, we find that LMs often hallucinate differing authors of hallucinated references when queried in independent sessions, while consistently identifying the authors of real references. This suggests that hallucination may be more a generation issue than inherent to current training techniques or representations.
Ayush Agrawal · Mirac Suzgun · Lester Mackey · Adam Tauman Kalai
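The indirect consistency check lends itself to a compact sketch: ask the same LM for ancillary details of a generated title across independent sessions and measure agreement. The `ask` callable below is a hypothetical stand-in for a fresh-session LM query.

```python
# Sketch of the indirect (author-based) consistency check.
def author_consistency(ask, title, n_sessions=4):
    """ask(prompt) -> answer string, sampled in a fresh session each call."""
    prompt = f'Who are the authors of "{title}"?'
    answers = {ask(prompt).strip().lower() for _ in range(n_sessions)}
    # Real references tend to yield one consistent author list; hallucinated
    # references often yield a different list in each session.
    return 1.0 / len(answers)  # 1.0 = perfectly consistent
```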
From Failures to Factuality: A Study on ChatGPT in Open-Domain QA (Poster)
Recent advancements in Large Language Models, such as ChatGPT, have demonstrated significant potential to impact various aspects of human life. However, ChatGPT still faces challenges in providing reliable and accurate answers to user questions. To better understand the model's particular weaknesses in this context, we embark on an in-depth exploration of open-domain question answering. Specifically, we undertake a detailed examination of ChatGPT's failures, categorized into: comprehension, factuality, specificity, and inference. We further pinpoint factuality as the most significant failure category and identify two critical abilities associated with factuality: knowledge memorization and knowledge recall. Through experiments focusing on factuality, we propose several potential enhancement strategies. Our findings suggest that augmenting the model with granular external knowledge and cues for knowledge recall can enhance the model's factuality in answering questions.
Shen Zheng · Jie Huang · Kevin Chang
On the performance of Multimodal Language Models (Poster)
Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently pretrained vision encoders through model grafting. These multimodal variants undergo instruction tuning, similar to LLMs, enabling effective zero-shot generalization for multimodal tasks. This study conducts a comparative analysis of different multimodal instruction tuning approaches and evaluates their performance across a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification. Through rigorous benchmarking and ablation experiments, we reveal key insights for guiding architectural choices when incorporating multimodal capabilities into LLMs. However, current approaches have limitations; they do not sufficiently address the need for a diverse multimodal instruction dataset, which is crucial for enhancing task generalization. Additionally, they overlook issues related to truthfulness and factuality when generating responses. These findings illuminate current methodological constraints in adapting language models for image comprehension and provide valuable guidance for researchers and practitioners seeking to harness multimodal versions of LLMs.
Utsav Garg · Erhan Bas
Transformer-Based Large Language Models Are Not General Learners: A Universal Circuit Perspective (Poster)
Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, evoking perceptions of ``sparks of Artificial General Intelligence (AGI)''. A key question naturally arises: *Can foundation models lead to AGI?* In this work, we try to answer this question partially by formally considering the capabilities of Transformer-based LLMs (T-LLMs) from the perspective of universal circuits. By investigating the expressive power of realistic T-LLMs as universal circuits, we show that a T-LLM of size $\operatorname{poly}(n)$ cannot perform all the basic operators of input length $O\left(\operatorname{poly}(\log n)\right)$. We also demonstrate that a constant-depth-$\operatorname{poly}(n)$-size log-precision T-LLM cannot faithfully execute prompts of complexity $n$. Our analysis provides a concrete theoretical foundation that T-LLMs can only be universal circuits for limited function classes. In other words, T-LLMs are not general learners. Furthermore, we exhibit that a constant-depth-$\operatorname{poly}(n)$-size log-precision T-LLM can memorize $O\left(\operatorname{poly}(n)\right)$ instances, which could partially explain the seeming inconsistency between LLMs' empirical successes and our negative results. To the best of our knowledge, our work takes the first step towards analyzing the limitations of T-LLMs as general learners within a rigorous theoretical framework. Our results promote the understanding of LLMs' capabilities and highlight the need for innovative architecture designs beyond Transformers to break current limitations.
Yang Chen · Yitao Liang · Zhouchen Lin
A Study on Improving Reasoning in Language Models (Poster)
Accurately carrying out complex reasoning is a crucial component of deployable and reliable language models. While current language models can exhibit this capability with few-shot guidance, accurate reasoning is primarily restricted to larger model sizes. In this work, we explore methods for improving the reasoning capabilities of smaller language models which are more deployable than their larger counterparts. Specifically, we look at variations of supervised learning, online reinforcement learning with PPO, and distillation from larger models. Surprisingly, for reasoning tasks such as CommonsenseQA and GSM8K, we find that simple filtered supervised learning often outperforms reward-conditioned supervised learning, and that simpler iterative supervised learning performs on par with online reinforcement learning.
Yuqing Du · Alexander Havrilla · Sainbayar Sukhbaatar · Pieter Abbeel · Roberta Raileanu
Interactive Model Correction with Natural Language (Poster)
In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on spurious correlations that fail to generalize to new data distributions, such as a bird classifier that relies on the background of an image. Preventing models from latching on to spurious correlations necessarily requires additional information beyond labeled data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that far less supervision suffices if we provide targeted feedback about the misconceptions of models trained on a given dataset. We introduce Clarify, a novel natural language interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns, such as ``water background'' for a bird classifier. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our empirical results show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 7.3% in two datasets with spurious correlations. Finally, we use Clarify to find and rectify 31 novel spurious correlations in ImageNet, improving minority-split accuracy from 21.1% to 28.7%.
Yoonho Lee · Michelle Lam · Helena Vasconcelos · Michael Bernstein · Chelsea Finn
Structure-Aware Path Inference for Neural Finite State Transducers (Poster)
Finite-state transducers (FSTs) are a traditional approach to string-to-string mapping. Each FST path specifies a possible alignment of input and output strings. Compared to an unstructured seq2seq model, the FST includes an explicit latent alignment variable and equips it with domain-specific hard constraints and featurization, which can improve generalization from small training sets. Previous work has shown how to score the FST paths with a trainable neural architecture; this improves the model's expressive power by dropping the usual Markov assumption but makes inference more difficult for the same reason. In this paper, we focus on the resulting challenge of imputing the latent alignment path that explains a given pair of input and output strings (e.g. during training). We train three autoregressive approximate models for amortized inference of the path, which can then be used as proposal distributions for importance sampling. All three models perform lookahead. Our most sophisticated (and novel) model leverages the FST structure to consider the graph of future paths; unfortunately, we find that it loses out to the simpler approaches, except on an artificial task that we concocted to confuse the simpler approaches.
Weiting Tan · Chu-Cheng Lin · Jason Eisner
Analyzing the factual knowledge of parameter efficient instruction tuned mid-size Large Language Models (Poster)
Large Language Models (LLM) have significantly improved Natural Language Processing (NLP) by enhancing the accuracy, efficiency, and versatility of various NLP applications, from text generation to language translation, due to their ability to capture and leverage vast amounts of linguistic and factual knowledge. While LLM have pushed the boundaries, they typically need to be further instruction tuned to get improved performance on niche applications. In this paper, we focus on analyzing the factual knowledge of LLM keeping in mind the practical aspects of using LLM by: 1) training only a small injection model (having ≈ 0.05% of the parameters of the base LLM) using the Low Rank Adaptation (LoRA) parameter efficient technique, and 2) restricting our study to Llama-2-13b-chat and StableBeluga-13B, two mid-size LLM having 13 billion parameters and based on the Llama 2 architecture. The injection model is instruction tuned for Knowledge Base (KB) construction on the LM-KBC 2023 challenge dataset, which contains subject-relation-object triplets of Wikipedia entities across 21 different factual relations. Our empirical analysis shows that even after instruction tuning, the LLM are: 1) deficient in foundational knowledge of many must-know areas like Geography, 2) unable to effectively use the context supplied in the prompt, and 3) fragile to subtle changes in prompt at inference. The source code for our experiments can be found at: https://github.com/Ffc1234/NIPSICBINBsubmission
Anmol Nayak · Hari prasad Timmapathini
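For readers unfamiliar with the setup, the injection model described above corresponds to a standard LoRA configuration. A minimal sketch using the PEFT library follows; the rank and target modules shown are illustrative, not the authors' exact choices.

```python
# Sketch: attach a small trainable LoRA "injection model" to a frozen base LLM.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # trainable params are a tiny fraction of 13B
```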
Beyond Erdos-Renyi: Generalization in Algorithmic Reasoning on Graphs (Poster)
Neural algorithmic reasoning excels in many graph algorithms, but assessment mainly focuses on the Erdős-Rényi (ER) graph family. This study delves into graph algorithmic models' generalization across diverse distributions. Testing a leading model exposes an overreliance on ER graphs for generalization assessment. We further investigate two scenarios: generalization to every target distribution and to single target distributions. Our results suggest that achieving the former is not trivial, while achieving the latter can be aided by selecting the source distribution via a novel Tree Mover's Distance interpretation.
Dobrik Georgiev · Pietro Lió · Jakub Bachurski · Junhua Chen · Tunan Shi
Exploring and Improving the Spatial Reasoning Abilities of Large Language Models (Poster)
Large Language Models (LLMs) represent formidable tools for sequence modeling, boasting an innate capacity for general pattern recognition. Nevertheless, their broader spatial reasoning capabilities remain insufficiently explored. In this paper, we investigate the zero-shot performance of LLMs when confronted with a limited dataset comprising 3D robotic trajectory data and associated tasks, such as directional and motion labeling. Additionally, we introduce a novel prefix-based prompting mechanism, which yields a 30% improvement on the 3D trajectory data and an increase of up to 16% on SpartQA tasks when contrasted with the conventional vanilla prompt baseline (with gains over Chain-of-Thought prompting as well). The experimentation with 3D trajectory data offers an intriguing glimpse into the manner in which LLMs engage with numerical and spatial information, thus laying a solid foundation for the identification of target areas for future enhancements.
Manasi Sharma
Towards Better Understanding of Domain Shift on Linear-Probed Visual Foundation Models (Poster)
Visual foundation models have emerged in recent years to offer similar promise as their language counterparts: the ability to produce representations of visual data that can be successfully used in a variety of tasks and contexts. One common way this is shown in published literature is through ``domain generalization'' experiments on linear models trained from representations produced by foundation models (i.e. linear probes). These experiments largely limit themselves to a small number of benchmark data sets and report accuracy as the single figure of merit, but give little insight beyond these numbers as to how different foundation models represent shifts. In this work we perform an empirical evaluation that expands the scope of previously reported results in order to give a better understanding of how domain shifts are modeled. Namely, we investigate not just how models generalize across domains, but how models produce features that may enable domain transfer. Our evaluation spans a number of recent visual foundation models and benchmarks, and we provide discussion that emphasizes the need for further investigation.
Eric Heim
How Many Raters Do You Need? Power Analysis for Foundation Models (Poster)
Due to their highly stochastic nature, as well as the complexity of the tasks they can perform, foundation models (large machine learning models) are poorly suited for conventional machine learning evaluation methods. This is because machine learning evaluation methods typically assume behavior to be deterministic and simple enough to be measured against gold standard data with unitary, authoritative, "correct" answers using straightforward metrics such as accuracy, precision, and recall. In this work, we propose an evaluation framework suitable for foundation models, which takes into account variance in the responses of both the machine model and the human rater. Utilizing recent advances in p-value estimation, we investigate the trade-offs between the number of items in a test set, the number of responses per item, the sampling method, and the metric, when measuring the comparative differences between two hypothetical foundation models at various degrees of similarity. When two models are very far apart in their predictive performance, fewer raters are needed to confidently compare them, as expected. However, as the models draw closer, we find that a larger number of annotators than is currently typical in annotation collection is needed to ensure the power analysis correctly reflects the difference in performance.
Christopher Homan · Shira Wein · Chris Welty · Lora Aroyo
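One way to see the headline result is a small simulation: as the gap between two models shrinks, the number of raters and items needed to detect it grows quickly. A hedged sketch (not the authors' framework; the rating model is deliberately simplistic):

```python
# Sketch: estimate statistical power for comparing two models with noisy raters.
import numpy as np
from scipy import stats

def power(n_items=100, n_raters=5, delta=0.05, sims=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        pa = rng.uniform(0.4, 0.6, size=n_items)   # model A per-item approval rate
        pb = np.clip(pa + delta, 0.0, 1.0)         # model B is `delta` better
        a = rng.binomial(n_raters, pa) / n_raters  # mean rating per item, model A
        b = rng.binomial(n_raters, pb) / n_raters
        if stats.ttest_rel(a, b).pvalue < alpha:
            hits += 1
    return hits / sims  # fraction of simulations where the gap is detected
```

With a small `delta`, raising `n_raters` beyond typical annotation budgets noticeably increases the returned power, which mirrors the paper's point.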
Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning? (Poster)
When humans reason about complex text-based questions, we leverage diagrammatic abstractions drawn on a visual scratchpad. In this paper, we introduce and explore the capabilities of Visual-Scratchpad, a method that augments a large language foundation model (LLM) with diagrammatic execution and readout. We enable the LLM to generate drawing commands and to readout abstractions from the resulting picture. The visual readout operation uses a visual foundation model, optionally finetuned with expert iteration. Here, we show that although Visual-Scratchpad outperforms an inference-only LLM, it surprisingly yields worse performance compared to a single finetuned LLM. Through experiments, we propose that this gap is due to the failure mode of vision foundation models in understanding abstractions in diagrams.
Joy Hsu · Gabriel Poesia · Jiajun Wu · Noah Goodman
Exploring DINO: Emergent Properties and Limitations for Synthetic Aperture Radar Imagery (Poster)
Self-supervised learning (SSL) models have recently demonstrated remarkable performance across various tasks, including image segmentation. This study delves into the emergent characteristics of the Self-Distillation with No Labels (DINO) algorithm and its application to Synthetic Aperture Radar (SAR) imagery. We pre-train a vision transformer (ViT)-based DINO model using unlabeled SAR data, and later fine-tune the model to predict high-resolution land cover maps. We rigorously evaluate the utility of attention maps generated by the ViT backbone and compare them with the model's token embedding space. We observe a small improvement in model performance with pre-training compared to training from scratch, and discuss the limitations and opportunities of SSL for remote sensing and land cover segmentation. Beyond small performance increases, we show that ViT attention maps hold great intrinsic value for remote sensing and could provide useful inputs to other algorithms. With this, our work lays the groundwork for bigger and better SSL models for Earth Observation.
Joseph Alejandro Gallego Mejia · Anna Jungbluth · Laura Martínez-Ferrer · Francisco Dorr · Matthew Allen · Freddie Kalaitzis · Raul Ramos-Pollán
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" (Poster)
We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if "A is B" occurs, "B is A" is more likely to occur). We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse.
Lukas Berglund · Meg Tong · Maximilian Kaufmann · Mikita Balesni · Asa Cooper Stickland · Tomasz Korbak · Owain Evans
Hallucination of Large Language Models in Finance: An Empirical Examination (Poster)
The hallucination issue is recognized as a fundamental deficiency of large language models (LLMs), especially when applied to domains such as finance, education, and law. Despite the growing concerns, there has been a lack of empirical study. In this paper, we provide an empirical examination of LLMs' hallucination behaviors in financial tasks. Firstly, we empirically investigate the models' ability to explain financial concepts and terminologies. Secondly, we assess the models' capacity to query historical stock prices. Thirdly, to alleviate hallucination, we evaluate two practical methods: the Retrieval-Augmented Generation (RAG) method and a zero-shot tool-learning method in which a function generates a query command. Finally, we find that off-the-shelf LLMs experience serious hallucination behaviors in financial tasks. Therefore, there is an urgent need to call for research efforts in mitigating LLMs' hallucination.
Haoqiang Kang · Xiao-Yang Liu
Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO's 4000 TPU Months (Poster)
We analyze VeLO (versatile learned optimizer), the largest-scale attempt to train a general-purpose ``foundational'' optimizer to date. VeLO was trained on thousands of machine learning tasks over 4000 TPU months with the goal of producing an optimizer capable of generalizing to new problems while being hyperparameter free, and outperforming industry standards such as Adam. We independently evaluate VeLO on the MLCommons optimizer benchmark suite. We find that, contrary to initial claims: (1) VeLO has a critical hyperparameter that needs problem-specific tuning, (2) VeLO does not necessarily outperform competitors in quality of solution found, and (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.
Fady Rezk · Antreas Antoniou · Henry Gouk · Timothy Hospedales
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation (Poster)
Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling. However, these methods have yet to leverage pre-trained language models, despite their adaptability to various downstream tasks. In this work, we explore this gap by adapting a pre-trained language model for auto-regressive text-to-image generation, and find that pre-trained language models offer limited help. We provide a two-fold explanation by analyzing tokens from each modality. First, we demonstrate that image tokens possess significantly different semantics compared to text tokens, rendering pre-trained language models no more effective in modeling them than randomly initialized ones. Second, the text tokens in the image-text datasets are too simple compared to normal language model pre-training data, which causes the catastrophic degradation of language models' capability.
Yuhui Zhang · Brandon McKinzie · Zhe Gan · Vaishaal Shankar · Alexander Toshev
SentimentPulse: Temporal-Aware Custom Language Models vs. GPT-3.5 for Consumer Sentiment (Poster)
Large Language Models are trained on an extremely large corpus of text data to allow better generalization, but this blessing can also become a curse and significantly limit their performance on a subset of tasks. In this work, we argue that LLMs are notably behind well-tailored and specifically designed models where the temporal aspect is important in making decisions and the answer depends on the timespan of available training data. We prove our point by comparing two major architectures: first, SentimentPulse, our proposed real-time consumer sentiment analysis approach that leverages custom language models and continual learning techniques, and second, GPT-3, which is tested on the same data. Unlike foundation models, which lack temporal context, our custom language model is pre-trained on time-stamped data, making it uniquely suited for real-time application. Additionally, we employ continual learning techniques to pre-train the model, and then classification and contextual multi-armed bandits to fine-tune it, enhancing its adaptability and performance over time. We present a comparative analysis of the prediction accuracy of both architectures. To the best of our knowledge, this is the first application of custom language models for real-time consumer sentiment analysis beyond the scope of conventional surveys.
Lixiang Li · Nagender Aneja · Alina Nesen · Bharat Bhargava
Compositional Generalization in Vision-Language Models uses the Language Modality only (Poster)
Compositionality is a common property of many modalities, including text and images, but the compositional generalization of multi-modal models is not well understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on linguistic priors rather than on information in the image, as the strength of the language model in detecting sentences that are syntactically and semantically likely overwhelms the vision part of the model. In particular, we find that a benchmark for compositionality mostly favors pure language models. Finally, we propose a new benchmark for compositionality without such linguistic priors.
Chenwei Wu · Patrick Haffner · Erran Li Li · Stefano Ermon · Rong Ge
A Negative Result on Gradient Matching for Selective Backprop (Poster)
With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. In this approach, we perform a forward pass to obtain a loss value for each data point in a minibatch. The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples. We build on this approach, but seek to improve the subset selection mechanism by choosing the (weighted) subset which best matches the mean gradient over the entire minibatch. We use the gradients w.r.t. the model's last layer as a cheap proxy, resulting in virtually no overhead in addition to the forward pass. At the same time, for our experiments we add a simple random selection baseline which has been absent from prior work. Surprisingly, we find that both the loss-based as well as the gradient-matching strategy fail to consistently outperform the random baseline.
Lukas Balles · Cedric Archambeau · Giovanni Zappella
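For orientation, the basic Selective Backprop step with a cheap last-layer proxy looks roughly as follows; this sketches only the loss/gradient-magnitude prioritization baseline, not the weighted gradient-matching selection studied in the paper.

```python
# Sketch of a Selective Backprop step with a last-layer gradient-norm proxy.
import torch
import torch.nn.functional as F

def selective_backprop_step(model, opt, x, y, keep_frac=0.25):
    with torch.no_grad():
        logits = model(x)                     # full-batch forward pass only
        p = logits.softmax(dim=1)
        p[torch.arange(len(y)), y] -= 1.0     # d(loss)/d(logits) = softmax - onehot
        scores = p.norm(dim=1)                # cheap last-layer gradient proxy
    k = max(1, int(keep_frac * len(y)))
    idx = scores.topk(k).indices              # keep the high-impact examples
    opt.zero_grad()
    F.cross_entropy(model(x[idx]), y[idx]).backward()  # backward on subset only
    opt.step()
```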
Can Segment Anything Model Improve Semantic Segmentation? (Poster)
Recently, the Segment Anything Model (SAM) has gained considerable attention in the field of computer vision, establishing itself as a pioneering foundation model for segmentation. Notably, SAM excels at generating high-quality segmentation masks, yet it does not provide semantic labels. In contrast, conventional semantic segmentation models generate rather accurate semantic labels but often produce suboptimal segmentation masks. The notion of leveraging SAM's superior mask quality to enhance the performance of conventional semantic segmentation models appears intuitive. However, our preliminary experiments reveal that the integration of SAM with these models does not result in any discernible improvement. Specifically, when assessing the performance of SAM's integration into two baseline semantic segmentation models, DeepLab and OneFormer, we find no significant enhancements in the mean Intersection over Union (mIoU) on the Pascal VOC and ADE20K datasets. Consequently, we conclude that, as it stands, the highly acclaimed foundational model is not the preferred solution for the semantic segmentation task. Instead, a more cautious and thoughtful approach is imperative to unlock any potential benefits in this context.
Maryam Qamar · Chaoning Zhang · Donghoon Kim · Muhammad Salman Ali · Sung-Ho Bae
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations (Poster)
Context-based fine-tuning methods like prompting, in-context learning, soft prompting (prompt tuning) and prefix-tuning have gained popularity as they often match the performance of full fine-tuning with a fraction of the parameters. Still, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are strictly less expressive than full fine-tuning. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. While this means that context-based fine-tuning techniques can successfully elicit or combine skills already present in the pretrained model, they cannot learn tasks requiring new attention patterns.
Aleksandar Petrov · Philip Torr · Adel Bibi
A Study on the Calibration of In-context Learning (Spotlight)
Hanlin Zhang · yifan zhang · Yaodong Yu · Eric Xing · Himabindu Lakkaraju · Sham Kakade
Segment Anything Model (SAM) Enhances Pseudo-Labels for Weakly Supervised Semantic Segmentation (Poster)
Weakly supervised semantic segmentation (WSSS) aims to bypass the need for laborious pixel-level annotation by using only image-level annotation. Most existing methods rely on Class Activation Maps (CAM) to derive pixel-level pseudo-labels and use them to train a fully supervised semantic segmentation model. Although these pseudo-labels are class-aware, indicating the coarse regions for particular classes, they are not object-aware and fail to delineate accurate object boundaries. To address this, we introduce a simple yet effective method harnessing the Segment Anything Model (SAM), a class-agnostic foundation model capable of producing fine-grained instance masks of objects, parts, and subparts. We use CAM pseudo-labels as cues to select and combine SAM masks, resulting in high-quality pseudo-labels that are both class-aware and object-aware. Our approach is highly versatile and can be easily integrated into existing WSSS methods without any modification. Despite its simplicity, our approach shows consistent gain over the state-of-the-art WSSS methods on both PASCAL VOC and MS-COCO datasets.
Tianle Chen · Zheda Mai · Ruiwen Li · Wei-Lun (Harry) Chao
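The selection rule at the heart of the method can be sketched in a few lines: keep each class-agnostic SAM mask that sufficiently overlaps the class activation map, and union the survivors into an object-aware pseudo-label. The thresholds below are illustrative, not the authors' values.

```python
# Sketch: combine CAM cues with SAM masks into refined pseudo-labels.
import numpy as np

def refine_pseudo_label(cam, sam_masks, cam_thresh=0.3, overlap_thresh=0.5):
    """cam: HxW array in [0, 1]; sam_masks: list of HxW boolean arrays."""
    fg = cam > cam_thresh                   # coarse class-aware region from CAM
    refined = np.zeros_like(fg)
    for mask in sam_masks:
        overlap = (mask & fg).sum() / max(mask.sum(), 1)
        if overlap >= overlap_thresh:       # mask lies mostly inside the CAM region
            refined |= mask
    return refined                          # class-aware and object-aware
```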
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics (Poster)
Recently, reference-free metrics such as CLIPScore (Hessel et al., 2021) and UMIC (Lee et al., 2021) have been proposed for automatic evaluation of image captions. Our focus lies in evaluating the robustness of these metrics in scenarios that require distinguishing between two captions with high lexical overlap but very different meanings. Our findings reveal that despite their high correlation with human judgments, both CLIPScore and UMIC struggle to identify fine-grained errors. While both metrics exhibit strong sensitivity to visual grounding errors, their sensitivity to caption implausibility errors is limited. Furthermore, we found that both metrics are sensitive to variations in the size of image-relevant objects mentioned in the caption, while CLIPScore is also quite sensitive to the number of mentions of image-relevant objects in the caption. Regarding linguistic aspects of a caption, both metrics show weak comprehension of negation, UMIC is strongly affected by caption length, and CLIPScore is largely insensitive to the structure of the caption. We hope our findings will guide further improvements in the reference-free evaluation of image captioning.
Saba Ahmadi · Aishwarya Agrawal
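For reference, CLIPScore is the rescaled, clipped cosine similarity between CLIP's image and caption embeddings: CLIPScore(v, c) = 2.5 · max(cos(E_v, E_c), 0). A sketch using the open-source CLIP package:

```python
# Sketch: reference-free CLIPScore (Hessel et al., 2021).
import torch
import clip
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clipscore(image_path, caption):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([caption]).to(device)
    with torch.no_grad():
        ei = model.encode_image(image)
        ec = model.encode_text(text)
    cos = torch.nn.functional.cosine_similarity(ei.float(), ec.float()).item()
    return 2.5 * max(cos, 0.0)
```

Because the score depends only on embedding similarity, two captions with high lexical overlap but different meanings can receive nearly identical scores, which is exactly the robustness gap probed above.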
Zero-shot capabilities of visual language models with prompt engineering for images of animals (Poster)
Visual Language Models have exhibited impressive performance on new tasks in a zero-shot setting. Language queries enable these large models to classify or detect objects even when presented with a novel concept in a shifted domain. We explore the limits of this capability by presenting Grounding DINO with images and concepts from field images of marine and terrestrial animals. By manipulating the language prompts, we found that the embedding space does not necessarily encode Latinate scientific names, but still yields potentially useful localizations due to a strong sense of general objectness. Grounding DINO struggled with objects in a challenging underwater setting, but improved when fed expressive prompts that explicitly described morphology. These experiments suggest that large models still have room to grow in domain use-cases and illuminate avenues for strengthening their understanding of shape to further improve zero-shot performance.
Andrea Tejeda Ocampo · Eric C. Orenstein · Kakani Katija
Surprising Deviations from Bayesian View in In-Context Learning (Poster)
In-context learning (ICL) is one of the surprising and useful features of large language models and a subject of intense research. Recently, stylized meta-learning-like ICL setups have been devised that train transformers on sequences of input-output pairs $(x, f(x))$ using the language modeling loss. The function $f$ comes from a function class and generalization is checked by evaluation on sequences for unseen functions from the same class. One of the main discoveries in this line of research has been that for several function classes, such as linear regression, transformers successfully generalize to new functions in the class. However, the inductive biases of these models resulting in this behavior are not clearly understood. A model with unlimited training data and compute is a Bayesian predictor: it learns the pretraining distribution. In this paper we empirically examine how far this Bayesian perspective can help us understand ICL. To this end, we generalize the previous meta-ICL setup to a hierarchical meta-ICL setup involving unions of multiple task families. We instantiate this setup on multiple function families and find that transformers can do ICL in this setting as well. We make some surprising observations: transformers can learn to generalize to new function classes that were not seen during pretraining. This requires pretraining on a very small number of function classes and involves deviating from the Bayesian predictor on the pretraining distribution. Further, we discover the phenomenon of 'forgetting', where over the course of pretraining under the hierarchical meta-ICL setup, the transformer first generalizes to the full distribution of tasks and later forgets it while fitting the pretraining distribution.
Madhur Panwar · Kabir Ahuja · Navin Goyal
Exploring Social Bias in Downstream Applications of Text-to-Image Foundation Models (Poster)
Text-to-image diffusion models have been adopted into key commercial workflows, such as art generation and image editing. Characterizing the implicit social biases they exhibit, such as gender and racial stereotypes, is a necessary first step in avoiding discriminatory outcomes. While existing studies on social bias focus on image generation, the biases exhibited in alternate applications of diffusion-based foundation models remain under-explored. We propose a framework that uses synthetic images to probe two applications of diffusion models, image editing and classification, for social bias. Using our framework, we uncover meaningful and significant intersectional social biases in Stable Diffusion, a state-of-the-art open-source text-to-image model. Our findings caution against the uninformed adoption of text-to-image foundation models for downstream tasks and services.
Adhithya Prakash Saravanan · Rafal Kocielnik · Roy Jiang · Pengrui Han · Animashree Anandkumar
How (not) to ensemble LVLMs for VQA (Poster)
This paper studies ensembling in the era of Large Vision-Language Models (LVLMs). Ensembling is a classical method to combine different models to get increased performance. In the recent work on Encyclopedic-VQA the authors examine a wide variety of models to solve their task: from vanilla LVLMs, to models including the caption as extra context, to models augmented with Lens-based retrieval of Wikipedia pages. Intuitively these models are highly complementary, which should make them ideal for ensembling. Indeed, an oracle experiment shows potential gains from 48.8% accuracy (the best single model) all the way up to 67% (best possible ensemble). So it is a trivial exercise to create an ensemble with substantial real gains. Or is it?
Lisa Alazraki · Lluis Castrejon · Mostafa Dehghani · Fantine Huot · Jasper Uijlings · Thomas Mensink
A Natural Experiment on LLM Data Contamination in Code Generation (Poster)
Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available benchmarks. Since LLMs train on wide swaths of the internet, this practice raises concerns of data contamination, i.e., evaluating on examples that are explicitly or implicitly included in the training data. Data contamination remains notoriously challenging to measure and mitigate, even with partial attempts like controlled experimentation of training data, canary strings, or embedding similarities. In this work, we conduct the first thorough longitudinal analysis of data contamination in LLMs by using the natural experiment of training cutoffs in GPT models to look at benchmarks released over time. Specifically, we consider two code/mathematical problem-solving datasets, Codeforces and Project Euler, and find statistically significant trends among LLM pass rate vs. GitHub popularity and release date that provide strong evidence of contamination. By open-sourcing our dataset, raw results, and evaluation framework, our work paves the way for rigorous analyses of data contamination in modern models. We conclude with a discussion of best practices and future steps for publicly releasing benchmarks in the age of LLMs that train on webscale data.
Manley Roberts · Himanshu Thakur · Christine Herlihy · Colin White · Samuel Dooley
Are large language models good annotators? (Poster)
Numerous Natural Language Processing (NLP) tasks require precisely labeled data to ensure effective model training and achieve optimal performance. However, data annotation is marked by substantial costs and time requirements, especially when requiring specialized domain expertise or annotating a large number of samples. In this study, we investigate the feasibility of employing large language models (LLMs) as replacements for human annotators. We assess the zero-shot performance of various LLMs of different sizes to determine their viability as substitutes. Furthermore, recognizing that human annotators have access to diverse modalities, we introduce an image-based modality using the BLIP-2 architecture to evaluate LLM annotation performance. Among the tested LLMs, Vicuna-13b demonstrates competitive performance across diverse tasks. To assess the potential for LLMs to replace human annotators, we train a supervised model using labels generated by LLMs and compare its performance with models trained using human-generated labels. However, our findings reveal that models trained with human labels consistently outperform those trained with LLM-generated labels. We also highlight the challenges faced by LLMs in multilingual settings, where their performance significantly diminishes for tasks in languages other than English.
Jay Mohta · Kenan Ak · Yan Xu · Mingwei Shen