Workshop
|
|
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu · Tianyu Pang · Chao Du · Qian Liu · Fengzhuo Zhang · Cunxiao Du · Ye Wang · Min Lin
|
|
Workshop
|
Sat 13:00
|
Panel: On Linear Representations and Pretraining Data Frequency in Language Models When Attention Sink Emerges in Language Models: An Empirical View Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations U-shape
|
|
Workshop
|
|
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback
Marcus Williams · Micah Carroll · Constantin Weisser · Adhyyan Narang · Brendan Murphy · Anca Dragan
|
|
Workshop
|
|
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Rui Ye · Jingyi Chai · Xiangrui Liu · Yaodong Yang · Yanfeng Wang · Siheng Chen
|
|
Workshop
|
|
Emergence of Implicit World Models from Mortal Agents
Kazuya Horibe · Naoto Yoshida
|
|
Workshop
|
|
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback
Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan
|
|