firstbacksecondback
4 Results
Workshop
|
Sat 13:00 |
Panel: On Linear Representations and Pretraining Data Frequency in Language Models When Attention Sink Emerges in Language Models: An Empirical View Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations U-shape |
|
Workshop
|
When Attention Sink Emerges in Language Models: An Empirical View Xiangming Gu · Tianyu Pang · Chao Du · Qian Liu · Fengzhuo Zhang · Cunxiao Du · Ye Wang · Min Lin |
||
Workshop
|
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei |
||
Workshop
|
Sat 15:30 |
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs Tianyu Guo · Druv Pai · Yu Bai · Jiantao Jiao · Michael Jordan · Song Mei |