Mexico City Oral
Wed Dec 03 03:30 PM -- 03:50 PM (PST) @ Don Alberto 2
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno · Federico Pasqualotto · Andrea Agazzi
Mexico City Oral
Wed Dec 03 03:50 PM -- 04:10 PM (PST) @ Don Alberto 2
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet · Francesco D'Angelo · Andrew Lampinen · Stephanie Chan
Mexico City Oral
Wed Dec 03 04:10 PM -- 04:30 PM (PST) @ Don Alberto 2
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
Zheng-An Chen · Tao Luo