Workshop
Mathematics of Modern Machine Learning (M3L)
Zhiyuan Li · Tengyu Ma · Surbhi Goel · Kaifeng Lyu · Christina Baek · Bingbin Liu · Alex Damian · Aditi Raghunathan
Room 242
This workshop explores theory for understanding and advancing modern ML practices: optimization, generalization, and foundation models.
Schedule
Sat 6:50 a.m. - 7:00 a.m. | Opening Remarks
Sat 7:00 a.m. - 7:45 a.m. | From algorithms to neural networks and back (Invited Talk)
An increasingly common design and analysis paradigm for neural networks is to think of them as parametrizing (implicitly or explicitly) some algorithm. In images, score-based generative models can be thought of as parametrizing a learned sampler (a stochastic differential equation or a Markov chain). In scientific applications, PDE solvers are trained as neural analogues of numerical solvers. In language, we probe to understand whether transformers can solve simple algorithmic tasks like parsing. In this talk, I'll share several vignettes illustrating the value of an algorithmic lens in these settings: namely, understanding the performance of "natural" algorithms allows us to understand the performance of neural methods, as well as to explore and elucidate the architectural design space.
Andrej Risteski
Sat 7:45 a.m. - 8:30 a.m. | How do two-layer neural networks learn complex functions from data over time? (Invited Talk)
How do two-layer neural networks learn complex functions from data over time? In this talk, we shall delve into the interaction between batch size, number of iterations, and task complexity, shedding light on neural network adaptation to data features. I will particularly highlight three key findings: i) The significant impact of a single gradient step on feature learning, emphasizing the relationship between batch size and the target's information exponent (or complexity). ii) The enhancement of the network's approximation ability over multiple gradient steps, enabling the learning of more intricate functions over time. iii) The improvement in generalization compared to the basic random feature/kernel regime. Our theoretical approach combines techniques from statistical physics, concentration of measure, projection-based conditioning, and Gaussian equivalence, which we believe holds standalone significance. Based on joint work with Yatin Dandi, Bruno Loureiro, Luca Pesce, and Ludovic Stephan (https://arxiv.org/pdf/2305.18270.pdf)
Florent Krzakala
Sat 8:30 a.m. - 8:40 a.m. | Feature Learning in Infinite-Depth Neural Networks (Oral)
By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for *depthwise parametrizations* of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-$\mu$P, which extends $\mu$P, and show empirically that it admits depthwise hyperparameter transfer. We identify *feature diversity* as a crucial factor in deep networks, and Depth-$\mu$P can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (as in modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as a Megatron transformer trained on Common Crawl.
Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou
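The depthwise scaling at the heart of Depth-$\mu$P can be made concrete with a toy numpy sketch (an illustration of the general idea, not the authors' exact parametrization): with a $1/\sqrt{L}$ block multiplier on each residual branch, forward-pass activation norms remain of the same order as depth $L$ grows.

```python
import numpy as np

def resnet_forward(x, L, block_multiplier, rng):
    """Toy residual network: h <- h + c * W @ tanh(h), fresh W per block."""
    d = x.shape[0]
    h = x.copy()
    for _ in range(L):
        W = rng.standard_normal((d, d)) / np.sqrt(d)  # O(1)-scale random init
        h = h + block_multiplier * W @ np.tanh(h)
    return h

d = 64
x = np.random.default_rng(0).standard_normal(d)

# With the 1/sqrt(L) block multiplier, activation norms stay stable in depth:
# each block contributes O(1/sqrt(L)), and the L contributions sum to O(1).
norms = {L: np.linalg.norm(resnet_forward(x, L, 1.0 / np.sqrt(L),
                                          np.random.default_rng(1)))
         for L in (8, 64, 512)}
print(norms)
```

A multiplier of 1 instead would let the norm drift with depth; the $1/\sqrt{L}$ choice is what makes a well-defined infinite-depth limit plausible.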
Sat 8:40 a.m. - 8:50 a.m. | Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions (Oral)
Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution (i.e. $\nabla_x \log p(x)$), rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there's a clear algorithmic benefit, the statistical cost can be steep: recent work by (Koehler et al '22) showed that for distributions that have poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural realistic distributions---e.g., multimodal distributions as simple as a mixture of two Gaussians in one dimension---have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of a broad class of Markov processes with generator $\mathcal{L}$ and stationary distribution $p$, and an appropriately chosen generalized score matching loss that tries to fit $\frac{\mathcal{O} p}{p}$. In the special case of $\mathcal{O} = \nabla_x$, and $\mathcal{L}$ being the generator of Langevin diffusion, this generalizes and recovers the results from (Koehler et al '22). This allows us to adapt techniques to speed up Markov chains to construct better score-matching losses. In particular, "preconditioning" the diffusion can be translated to an appropriate "preconditioning" of the score loss. Lifting the chain by adding a temperature, as in simulated tempering, can be shown to result in a Gaussian-convolution annealed score matching loss, similar to (Song-Ermon '19).
Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance---obviating the Poincaré constant-based lower bounds of the basic score matching loss shown in (Koehler et al '22).
Yilong Qin · Andrej Risteski
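The score matching objective that the paper generalizes can be illustrated in one dimension. The sketch below (an illustrative toy, not the paper's generalized loss) fits a linear score model to Gaussian data by minimizing the empirical Hyvärinen objective $\mathbb{E}[\tfrac{1}{2}s(x)^2 + s'(x)]$ in closed form, recovering the true score $(\mu - x)/\sigma^2$ without ever touching a normalizing constant.

```python
import numpy as np

# Hyvarinen score matching: J(s) = E[ 0.5 * s(x)^2 + s'(x) ] is minimized
# (over all functions) by the true score d/dx log p(x). For p = N(mu, sigma^2)
# the true score is (mu - x) / sigma^2, which is linear, so a linear model
# s(x) = a*x + b is well-specified.

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5
x = rng.normal(mu, sigma, size=200_000)

# For s(x) = a*x + b we have s'(x) = a, so the empirical objective is
#   J(a, b) = 0.5 * mean((a*x + b)^2) + a.
# Setting its gradients to zero gives a 2x2 linear system:
#   a*mean(x^2) + b*mean(x) + 1 = 0
#   a*mean(x)   + b           = 0
m1, m2 = x.mean(), (x ** 2).mean()
A = np.array([[m2, m1], [m1, 1.0]])
a, b = np.linalg.solve(A, np.array([-1.0, 0.0]))

print(a, b)  # close to -1/sigma^2 and mu/sigma^2
```

The statistical story in the abstract is about how fast such estimates converge when $p$ is multimodal, which is where the annealed and preconditioned variants come in.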
Sat 8:50 a.m. - 9:00 a.m. | Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models (Oral)
We investigate the efficiency of deep neural networks for approximating score functions in diffusion-based generative modeling. While existing approximation theories leverage the smoothness of score functions, they suffer from the curse of dimensionality for intrinsically high-dimensional data. This limitation is pronounced in graphical models such as Markov random fields, where the approximation efficiency of score functions remains unestablished. To address this, we observe that, in graphical models, score functions can often be well approximated by variational-inference denoising algorithms. Furthermore, these algorithms can be efficiently represented by neural networks. We demonstrate this through examples, including Ising models, conditional Ising models, restricted Boltzmann machines, and sparse encoding models. Combined with off-the-shelf discretization error bounds for diffusion-based sampling, we provide an efficient sample complexity bound for diffusion-based generative modeling when the score function is learned by deep neural networks.
Song Mei · Yuchen Wu
Sat 9:00 a.m. - 9:10 a.m. | Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data (Oral)
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting---achieving a perfect fit to training data with near-random performance on test data---before transitioning ("grokking") to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.
Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu
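The XOR cluster distribution in this abstract is easy to simulate. The sketch below (a hypothetical instantiation with made-up cluster parameters, not the paper's exact setting) generates the data with flipped labels and checks that a linear fit is at chance level, which is why the result requires genuine feature learning.

```python
import numpy as np

# XOR cluster data: Gaussian clusters at +-mu*e1 carry label +1, clusters at
# +-mu*e2 carry label -1 (embedded in d dimensions); a constant fraction of
# training labels is flipped.
def xor_cluster_data(n, d=20, mu=5.0, flip=0.15, rng=None):
    rng = rng or np.random.default_rng(0)
    centers = np.zeros((4, d))
    centers[0, 0], centers[1, 0] = mu, -mu   # label +1 clusters
    centers[2, 1], centers[3, 1] = mu, -mu   # label -1 clusters
    idx = rng.integers(0, 4, size=n)
    X = centers[idx] + rng.standard_normal((n, d))
    y = np.where(idx < 2, 1.0, -1.0)         # clean labels
    noisy = rng.random(n) < flip
    y_noisy = np.where(noisy, -y, y)         # flipped training labels
    return X, y, y_noisy

X, y, y_noisy = xor_cluster_data(2000)

# No linear classifier separates XOR clusters: a least-squares fit on the
# clean labels still predicts at roughly chance accuracy.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = np.mean(np.sign(X @ w) == y)
print(acc)
```

This chance-level linear baseline is exactly the "non-generalizable linear classifier" regime the network passes through after one GD step.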
Sat 9:10 a.m. - 10:10 a.m. | Poster Session
Posters in this session:
A PAC-Bayesian Perspective on the Interpolating Information Criterion
Graph Neural Networks Benefit from Structural Information Provably: A Feature Learning Perspective
Linear attention is (maybe) all you need (to understand transformer optimization)
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study
Feature Learning in Infinite-Depth Neural Networks
Variational Classification
Implicit biases in multitask and continual learning from a backward error analysis perspective
Spectrum Extraction and Clipping for Implicitly Linear Layers
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
Curvature-Dimension Tradeoff for Generalization in Hyperbolic Space
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
Unveiling the Hessian's Connection to the Decision Boundary
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization
Generalization Guarantees of Deep ResNets in the Mean-Field Regime
Theoretical Explanation for Generalization from Adversarial Perturbations
In-Context Convergence of Transformers
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Unraveling the Complexities of Simplicity Bias: Mitigating and Amplifying Factors
Transformers as Support Vector Machines
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
A Theoretical Study of Dataset Distillation
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
In-Context Learning on Unstructured Data: Softmax Attention as a Mixture of Experts
Attention-Only Transformers and Implementing MLPs with Attention Heads
Privacy at Interpolation: Precise Analysis for Random and NTK Features
Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
How does Gradient Descent Learn Features --- A Local Analysis for Regularized Two-Layer Neural Networks
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
Provably Efficient CVaR RL in Low-rank MDPs
Analysis of Task Transferability in Large Pre-trained Classifiers
On Scale-Invariant Sharpness Measures
Gibbs-Based Information Criteria and the Over-Parameterized Regime
Grokking modular arithmetic can be explained by margin maximization
Sat 10:10 a.m. - 11:15 a.m. | Lunch Break
Sat 11:15 a.m. - 12:00 p.m. | Benefits of learning with symmetries: eigenvectors, graph representations and sample complexity (Invited Talk)
In many applications, especially in the sciences, data and tasks have known invariances. Encoding such invariances directly into a machine learning model can improve learning outcomes, while it also poses challenges on efficient model design. In the first part of the talk, we will focus on the invariances relevant to eigenvectors and eigenspaces being inputs to a neural network. Such inputs are important, for instance, for graph representation learning or orthogonally equivariant learning. We will discuss targeted architectures that can universally express functions with the relevant invariances or equivariances - sign flips and changes of basis - and their theoretical and empirical benefits. Second, we will take a broader theoretical perspective. Empirically, it is known that encoding invariances into the machine learning model can reduce sample complexity. For the simplified setting of kernel ridge regression or random features, we will discuss new bounds that illustrate two ways in which invariances can reduce sample complexity. Our results hold for learning on manifolds and for invariances to a wide range of group actions. This talk is based on joint work with Joshua Robinson, Derek Lim, Behrooz Tahmasebi, Lingxiao Zhao, Tess Smidt, Suvrit Sra and Haggai Maron.
Stefanie Jegelka
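The sign invariance discussed in the first part of the talk can be enforced by symmetrization, as in SignNet-style architectures. A minimal sketch (with an arbitrary made-up feature map, not the talk's architecture): since an eigenvector $v$ is only defined up to sign, a network of the form $f(v) = \rho(\phi(v) + \phi(-v))$ is invariant to $v \mapsto -v$ for any choice of $\phi$ and $\rho$.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))   # made-up weights for the sketch
W2 = rng.standard_normal((4, 16))

def phi(v):
    return np.tanh(W1 @ v)          # arbitrary nonlinear feature map

def f(v):
    # Summing phi over the sign orbit {v, -v} makes f sign-invariant
    # by construction, regardless of what phi is.
    return W2 @ (phi(v) + phi(-v))

v = rng.standard_normal(8)
print(np.allclose(f(v), f(-v)))  # True
```

Note that `phi` alone is not sign-invariant; the symmetrization is what buys the invariance, and the talk's universality results concern which functions such architectures can express.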
Sat 12:00 p.m. - 12:15 p.m. | Break
Sat 12:15 p.m. - 1:00 p.m. | Adaptivity in Domain Adaptation and Friends (Invited Talk)
Domain adaptation, transfer, multitask, meta, few-shots, or lifelong learning ... these are all important recent directions in ML that all touch at the core of what we might mean by 'AI'. As these directions all concern learning in heterogeneous and ever-changing environments, they all share a central question: what information a 'source' distribution may have about a 'target' distribution, or put differently, which measures of discrepancy between distributions properly model such information. Our understanding of this central question is still rather fledgling, with both positive and negative results. On one hand we show that traditional notions of distance and divergence between distributions (e.g., Wasserstein, TV, KL, Renyi) are in fact too conservative: a source may be 'far' from a target under such traditional notions, yet still admit much useful information about the target distribution. We then turn to the existence of 'adaptive' procedures, i.e., procedures which can optimally leverage such information in the source data without any prior distributional knowledge. Here the picture is quite nuanced: while various existing approaches turn out to be adaptive in usual settings with a single source and hypothesis class, no procedure can guarantee optimal rates adaptively in more general settings, e.g., settings with multiple source datasets (as in multitask learning), or settings with multiple hypothesis classes (as in model selection or hyper-parameter tuning). Such negative results raise new questions, as they suggest that domain adaptation and related problems may benefit from more structure in practice than captured by current formalisms. The talk is based on joint work with collaborators over the last few years, namely, G. Martinet, S. Hanneke, J. Suk, Y. Mahdaviyeh, N. Galbraith.
Samory Kpotufe
Sat 1:00 p.m. - 1:10 p.m. | Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit (Oral)
We study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures, including convolutional ResNets and Vision Transformers, trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature-learning joint infinite-width and infinite-depth limit, and we show convergence of finite-size network dynamics towards this limit.
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan
Sat 1:10 p.m. - 1:20 p.m. | Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP (Oral)
Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources. Recently, CLIP has emerged as an effective approach that employs vision-language contrastive pretraining to learn joint image and text representations and exhibits remarkable performance in zero-shot learning and text-guided natural image generation. Despite the huge practical success of CLIP, its theoretical understanding remains elusive. In this paper, we formally study the transferable representation learning underlying CLIP and demonstrate how features from different modalities get aligned. We also analyze its zero-shot transfer performance on downstream tasks. Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.
Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu
Sat 1:20 p.m. - 1:30 p.m. | In-Context Convergence of Transformers (Oral)
Transformers have recently revolutionized many domains in modern machine learning, and one salient discovery is their remarkable in-context learning capability, where models can solve an unseen task by utilizing task-specific prompts without further parameter fine-tuning. This has inspired recent theoretical studies aiming to understand the in-context learning mechanism of transformers, which, however, focused only on *linear* transformers. In this work, we take the first step toward studying the learning dynamics of a one-layer transformer with *softmax* attention trained via gradient descent to in-context learn linear function classes. We consider a structured data model, where each token is randomly sampled from a set of feature vectors in either a balanced or an imbalanced fashion. For data with balanced features, we establish a finite-time convergence guarantee with near-zero prediction error by navigating our analysis over two phases of the training dynamics of the attention map. More notably, for data with imbalanced features, we show that the learning dynamics follow a stage-wise convergence process: the transformer first converges to near-zero prediction error for the query tokens of dominant features, and later converges to near-zero prediction error for the query tokens of under-represented features, via one and four training phases respectively. Our proof features new techniques for analyzing the competing strengths of two types of attention weights, the change of which determines the different phases.
Yu Huang · Yuan Cheng · Yingbin Liang
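The in-context linear regression task studied here has a simple data model. The sketch below (illustrative, with made-up dimensions) builds one prompt and checks the target behaviour the trained transformer should converge to: near-zero prediction error on the query, computed here via least squares on the context rather than by a transformer.

```python
import numpy as np

# In-context linear regression: each prompt holds n demonstration pairs
# (x_i, y_i) with y_i = <w, x_i> for a task vector w drawn fresh per prompt,
# plus a query x_q; the model must output <w, x_q> with no parameter update.

rng = np.random.default_rng(0)
d, n = 4, 32
w = rng.standard_normal(d)             # task vector, fresh per prompt
X = rng.standard_normal((n, d))        # context inputs
y = X @ w                              # context labels (noiseless)
x_q = rng.standard_normal(d)           # query token

# The in-context target behaviour is ordinary least squares on the context
# (exact here, since the labels are noiseless and n > d):
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = w_hat @ x_q
print(np.allclose(pred, w @ x_q))  # True: near-zero prediction error
```

The paper's contribution is showing that gradient descent on a one-layer softmax-attention transformer actually reaches this behaviour, and in what order across feature types.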
Sat 1:30 p.m. - 1:40 p.m. | Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study (Oral)
Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive. In this work, we empirically show that momentum gradient descent with a large learning rate and learning rate warmup displays large catapults, driving the iterates towards flatter minima than those found by gradient descent. We then provide empirical evidence and theoretical intuition that the large catapult is caused by momentum "amplifying" the self-stabilization effect (Damian et al., 2023).
Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
Sat 1:40 p.m. - 1:50 p.m. | Linear attention is (maybe) all you need (to understand transformer optimization) (Oral)
Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized transformer model could actually be a valuable, realistic abstraction for understanding transformer optimization.
Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra
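One reason linear attention is a natural abstraction for this line of work: with hand-set weights, a single linear self-attention layer reproduces one step of gradient descent on the in-context regression loss. The sketch below illustrates the von Oswald et al. construction with weights chosen by hand rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 16, 0.1
w = rng.standard_normal(d)
X = rng.standard_normal((n, d))   # in-context inputs
y = X @ w                          # in-context labels
x_q = rng.standard_normal(d)       # query

# One step of gradient descent from w0 = 0 on
#   L(w) = 0.5 * sum_i (<w, x_i> - y_i)^2
# gives w1 = eta * X^T y, hence the prediction:
w1 = eta * X.T @ y
gd_pred = w1 @ x_q

# Linear attention (no softmax) with values y_i, keys x_i, query x_q
# computes the same quantity: eta * sum_i y_i <x_i, x_q>.
attn_pred = eta * sum(y[i] * (X[i] @ x_q) for i in range(n))

print(np.allclose(gd_pred, attn_pred))  # True
```

This exact correspondence is what makes the linearized model a plausible proxy for studying the optimization dynamics of full transformers.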
Sat 1:50 p.m. - 2:00 p.m. | Closing Remarks
Sat 2:00 p.m. - 3:00 p.m. | Poster Session
Posters in this session:
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
On the Computational Complexity of Inverting Generative Models
Flow-Based High-Dimensionally Distributional Robust Optimization
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
A Theoretical Explanation of Deep RL Performance in Stochastic Environments
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
Under-Parameterized Double Descent for Ridge Regularized Least Squares Denoising of Data on a Line
Continual Learning for Long-Tailed Recognition: Bridging the Gap in Theory and Practice
SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
On Compositionality and Emergence in Physical Systems Generative Modeling
Escaping Random Teacher Initialization Enhances Signal Propagation and Representations
The Expressive Power of Transformers with Chain of Thought
Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
MoXCo: How I learned to stop exploring and love my local minima?
First-order ANIL provably learns representations despite overparametrisation
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
Non-Vacuous Generalization Bounds for Large Language Models
Learning from setbacks: the impact of adversarial initialization on generalization performance
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
Divergence at the Interpolation Threshold: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Toward Student-oriented Teacher Network Training for Knowledge Distillation
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations
How Structured Data Guides Feature Learning: A Case Study of the Parity Problem
The Next Symbol Prediction Problem: PAC-learning and its relation to Language Models
Why Do We Need Weight Decay for Overparameterized Deep Networks?
The Double-Edged Sword: Perception and Uncertainty in Inverse Problems
ls Near-Interpolators: Fast Norm Growth and Tempered Near-Overfitting
On robust overfitting: adversarial training induced distribution matters
Are Graph Neural Networks Optimal Approximation Algorithms?
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
A PAC-Bayesian Perspective on the Interpolating Information Criterion (Poster)
Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent contradiction with the well-known bias--variance tradeoff. While such phenomena have proven challenging to theoretically study for general models, the recently proposed Interpolating Information Criterion (IIC) provides a valuable theoretical framework to examine performance for overparameterized models. Using the IIC, a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence generalization performance in the interpolating regime. From the provided bound, we quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, optimizer, and parameter-initialization scheme; the spectrum of the empirical neural tangent kernel; curvature of the loss landscape; and noise present in the data.
Liam Hodgkinson · Chris van der Heide · Robert Salomone · Fred Roosta · Michael Mahoney
Graph Neural Networks Benefit from Structural Information Provably: A Feature Learning Perspective (Poster)
Graph neural networks (GNNs) have shown remarkable capabilities in learning from graph-structured data, outperforming traditional multilayer perceptrons (MLPs) in numerous graph applications. Despite these advantages, there has been limited theoretical exploration into why GNNs are so effective, particularly from the perspective of feature learning. This study aims to address this gap by examining the role of graph convolution in feature learning theory under a specific data generative model. We undertake a comparative analysis of the optimization and generalization between two-layer graph convolutional networks (GCNs) and their convolutional neural network (CNN) counterparts. Our findings reveal that graph convolution significantly enhances the regime of low test error over CNNs. This highlights a substantial discrepancy between GNNs and MLPs in terms of generalization capacity, a conclusion further supported by our empirical simulations on both synthetic and real-world datasets.
Wei Huang · Yuan Cao · Haonan Wang · Xin Cao · Taiji Suzuki
Linear attention is (maybe) all you need (to understand transformer optimization) (Poster)
Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized transformer model could actually be a valuable, realistic abstraction for understanding transformer optimization.
Kwangjun Ahn · Xiang Cheng · Minhak Song · Chulhee Yun · Ali Jadbabaie · Suvrit Sra
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study (Poster)
Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive. In this work, we empirically show that momentum gradient descent with a large learning rate and learning rate warmup displays large catapults, driving the iterates towards flatter minima than those found by gradient descent. We then provide empirical evidence and theoretical intuition that the large catapult is caused by momentum "amplifying" the self-stabilization effect (Damian et al., 2023).
Prin Phunyaphibarn · Junghyun Lee · Bohan Wang · Huishuai Zhang · Chulhee Yun
Feature Learning in Infinite-Depth Neural Networks (Poster)
By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for *depthwise parametrizations* of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-$\mu$P, which extends $\mu$P, and show empirically that it admits depthwise hyperparameter transfer. We identify *feature diversity* as a crucial factor in deep networks, and Depth-$\mu$P can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (as in modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as a Megatron transformer trained on Common Crawl.
Greg Yang · Dingli Yu · Chen Zhu · Soufiane Hayou
Variational Classification (Poster)
We present variational classification (VC), a latent variable generalisation of neural network softmax classification under cross-entropy loss. Our approach provides a novel probabilistic interpretation of the highly familiar softmax classification model, which relates to it as a variational autoencoder relates to a deterministic autoencoder. We derive a training objective based on the evidence lower bound (ELBO) that is non-trivial to optimize, and an adversarial approach to maximise it. We reveal an inherent inconsistency within softmax classification that VC addresses, while also allowing flexible choices of distributions in the latent space in place of assumptions implicit in standard softmax classifiers. Empirical evaluation demonstrates that VC maintains accuracy while improving properties such as calibration and adversarial robustness, particularly under distribution shift and in low-data settings. This work brings new theoretical insight to modern machine learning practice.
Shehzaad Dhuliawala · Mrinmaya Sachan · Carl Allen
-
|
Implicit biases in multitask and continual learningfrom a backward error analysis perspective
(
Poster
)
>
link
Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent. In particular, we derive modified losses that are implicitly minimized during training. They have three terms: the original loss, accounting for convergence, an implicit flatness regularization term proportional to the learning rate, and a last term, the conflict term, which can theoretically be detrimental to both convergence and implicit regularization. In multitask, the conflict term is a well-known quantity, measuring the gradient alignment between the tasks, while in continual learning the conflict term is a new quantity in deep learning optimization, although a basic tool in differential geometry: The Lie bracket between the task gradients. |
Benoit Dherin 🔗 |
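As a toy illustration of the multitask conflict term described above, the gradient alignment between two tasks can be computed directly; a minimal sketch with hypothetical quadratic losses (the paper's setting is general networks):

```python
import numpy as np

# Two toy task losses L1(w) = ||A1 w - b1||^2 and L2(w) = ||A2 w - b2||^2
# (hypothetical quadratics chosen only for illustration).
rng = np.random.default_rng(0)
A1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
A2, b2 = rng.normal(size=(5, 3)), rng.normal(size=5)

def grad1(w):  # gradient of L1 at w
    return 2 * A1.T @ (A1 @ w - b1)

def grad2(w):  # gradient of L2 at w
    return 2 * A2.T @ (A2 @ w - b2)

w = rng.normal(size=3)
# Conflict term: inner product of the task gradients. A negative value
# means the tasks pull the shared parameters in opposing directions.
conflict = grad1(w) @ grad2(w)
cosine = conflict / (np.linalg.norm(grad1(w)) * np.linalg.norm(grad2(w)))
print(conflict, cosine)
```

Monitoring this quantity during multitask training is a common diagnostic; the continual-learning analogue (the Lie bracket) additionally involves the Hessians of the two losses.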
-
|
Spectrum Extraction and Clipping for Implicitly Linear Layers
(
Poster
)
>
link
We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators, a rich family of layer types including all standard convolutional and dense layers. We provide the first clipping method that is correct for general convolution layers, and illuminate the representational limitation that caused correctness issues in prior work. By comparing the accuracy and performance of our methods to existing methods across various experiments, we show that they lead to better generalization and adversarial robustness of the models. In addition to these advantages over state-of-the-art methods, we show that they are much faster than the alternatives. |
Ali Ebrahimpour-Boroojeny · Matus Telgarsky · Hari Sundaram 🔗 |
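For the special case of a dense layer, spectrum clipping can be sketched with a plain SVD (a minimal sketch; the paper's contribution is the harder general-convolution case, handled via automatic differentiation):

```python
import numpy as np

def clip_spectrum(W, s_max=1.0):
    """Clip every singular value of a dense layer's weight matrix to s_max.
    Dense layers are the simplest 'implicitly linear' operators; general
    convolutions require the more careful treatment the paper develops."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, s_max)) @ Vt

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
Wc = clip_spectrum(W, s_max=1.0)
# The clipped operator is 1-Lipschitz: its spectral norm is at most 1.
print(np.linalg.norm(Wc, 2))
```

Bounding the spectral norm of each layer bounds the Lipschitz constant of the whole network, which is the mechanism behind the robustness gains the abstract reports.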
-
|
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
(
Poster
)
>
link
Empirical studies have demonstrated that the noise in stochastic gradient descent (SGD) aligns favorably with the local geometry of the loss landscape. However, theoretical and quantitative explanations for this phenomenon remain sparse. In this paper, we offer a comprehensive theoretical investigation into the aforementioned {\em noise geometry} for over-parameterized linear models (OLMs) and two-layer neural networks. We scrutinize both average and directional alignments, paying special attention to how factors like sample size and input data degeneracy affect the alignment strength. As a specific application, we leverage our noise geometry characterizations to study how SGD escapes from sharp minima, revealing that the escape direction has significant components along flat directions. This is in stark contrast to GD, which escapes only along the sharpest directions. To substantiate our theoretical findings, both synthetic and real-world experiments are provided. |
Mingze Wang · Lei Wu 🔗 |
-
|
Curvature-Dimension Tradeoff for Generalization in Hyperbolic Space
(
Poster
)
>
link
The inclusion of task-relevant geometric embeddings in deep learning models is an important emerging direction of research, particularly when using hierarchical data. For instance, negatively curved geometries such as hyperbolic spaces are known to allow low-distortion embedding of tree-like hierarchical structures, which Euclidean spaces do not afford. Learning techniques for hyperbolic spaces, such as Hyperbolic Neural Networks (HNNs), have shown empirical accuracy improvement over classical Deep Neural Networks in tasks involving semantic or multi-scale information, such as recommender systems or molecular generation. However, no research has investigated generalization properties specific to such geometries. In this work, we introduce generalization bounds for learning tasks in hyperbolic spaces, marking the first time such bounds have been proposed. We highlight a previously unnoticed and important difference with Euclidean embedding models, namely, under embeddings into spaces of negative curvature $-\kappa<0$ and dimension $d$, only the product $\sqrt{\kappa}\ d$ influences generalization bounds. Hence, the curvature parameter of the space can be varied at fixed $d$ with the same effect on generalization as when varying $d$.
|
Nicolás Alvarado · Hans Lobel · Mircea Petrache 🔗 |
-
|
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
(
Poster
)
>
link
Existing research often posits spurious features as "easier" to learn than core features in neural network optimization, but the nuanced impact of their relative simplicity remains under-explored. In this paper, we propose a theoretical framework and associated synthetic dataset grounded in boolean function analysis. Our framework allows for fine-grained control on both the relative complexity (compared to core features) and correlation strength (with respect to the label) of spurious features. Experimentally, we observe that the presence of stronger spurious correlations or simpler spurious features leads to a slower rate of learning for the core features in networks when trained with (stochastic) gradient descent. Perhaps surprisingly, we also observe that spurious features are not forgotten even when the network has perfectly learned the core features. We give theoretical justifications for these observations for the special case of learning with parity features on a one-layer hidden network. Our findings justify the success of retraining the last layer for accelerating core feature convergence and identify limitations of debiasing algorithms that exploit early learning of spurious features. We corroborate our findings through experiments on real-world vision datasets, thereby validating the practical relevance of our framework. |
GuanWen Qiu · Da Kuang · Surbhi Goel 🔗 |
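A minimal sketch of how such a synthetic dataset might look, with a parity core feature and a single spurious bit whose correlation strength is the knob `rho` (a hypothetical instantiation of the framework's two control knobs, not the paper's exact construction):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n, d=10, core=3, rho=0.9):
    """Boolean data in {-1, +1}: the label is the parity of the first
    `core` bits (the 'complex' core feature); one appended bit agrees
    with the label with probability rho (the 'simple' spurious feature)."""
    x = rng.choice([-1, 1], size=(n, d))
    y = np.prod(x[:, :core], axis=1)          # parity label in {-1, +1}
    agree = rng.random(n) < rho
    spur = np.where(agree, y, -y)             # spurious bit, correlation rho
    return np.column_stack([x, spur]), y

X, y = make_dataset(1000)
print(X.shape, np.mean(X[:, -1] == y))        # empirical agreement near rho
```

Raising `rho` or lowering `core` makes the spurious shortcut more attractive, which is exactly the regime where the abstract reports slower learning of the core feature.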
-
|
Unveiling the Hessian's Connection to the Decision Boundary
(
Poster
)
>
link
Understanding the properties of well-generalizing minima is at the heart of deep learning research. On the one hand, the generalization of neural networks has been connected to the decision boundary complexity, which is hard to study in the high-dimensional input space. Conversely, the flatness of a minimum has become a controversial proxy for generalization. In this work, we provide the missing link between the two approaches and show that the Hessian top eigenvectors characterize the decision boundary learned by the neural network. Notably, the number of outliers in the Hessian spectrum is proportional to the complexity of the decision boundary. Based on this finding, we provide a new and straightforward approach to studying the complexity of a high-dimensional decision boundary. |
Mahalakshmi Sabanayagam · Freya Behrens · Urte Adomaityte · Anna Dawid 🔗 |
-
|
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
(
Poster
)
>
link
Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models. |
Zixuan Zhang · Kaiqi Zhang · Minshuo Chen · Yuma Takeda · Mengdi Wang · Tuo Zhao · Yu-Xiang Wang 🔗 |
-
|
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
(
Poster
)
>
link
Inspired by recent research that recommends starting neural network training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent fine-tuning or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter and validate our key findings in a more practical setting. |
Ekaterina Lobacheva · Eduard Pokonechny · Maxim Kodryan · Dmitry Vetrov 🔗 |
-
|
Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization
(
Poster
)
>
link
Normalization layers have been shown to benefit the training stability and generalization of deep neural networks in various ways. For Batch Normalization (BN), the noisy statistics have been observed to have a regularization effect that depends on the batch size. Following this observation, Hoffer et al. proposed Ghost Batch Normalization (GBN), where BN is explicitly performed independently on smaller sub-batches, resulting in improved generalization in many settings. In this study, we analyze and isolate the effect of the noisy statistics by comparing BN and GBN, introducing a noise injection method. We then quantitatively assess the effects of the noise, juxtaposing it with other regularizers like dropout and examining its potential role in the generalization disparities between batch normalization and its alternatives, including layer normalization and normalization-free methods. |
Atli Kosson · Dongyang Fan · Martin Jaggi 🔗 |
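The GBN idea the abstract builds on can be sketched in a few lines (a minimal version without the learnable scale and shift parameters of full batch normalization):

```python
import numpy as np

def ghost_batch_norm(x, num_ghost, eps=1e-5):
    """Apply batch normalization independently to `num_ghost` sub-batches
    ('ghost batches') of x with shape (batch, features). The batch size
    must divide evenly into the sub-batches."""
    subs = np.split(x, num_ghost)
    out = [(s - s.mean(0)) / np.sqrt(s.var(0) + eps) for s in subs]
    return np.concatenate(out)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
y = ghost_batch_norm(x, num_ghost=4)   # four sub-batches of size 8
# Each sub-batch is normalized with its own statistics; smaller sub-batches
# give noisier estimates, hence a stronger implicit regularization effect.
print(y.shape)
```

Setting `num_ghost=1` recovers plain BN over the full batch, which makes the comparison the abstract describes easy to set up.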
-
|
Generalization Guarantees of Deep ResNets in the Mean-Field Regime
(
Poster
)
>
link
Despite the widespread empirical success of ResNet, the generalization ability of deep ResNets is rarely explored beyond the lazy-training regime. In this work, we investigate ResNets in the limit of infinitely deep and wide neural networks, where the gradient flow is described by a partial differential equation, i.e., the \emph{mean-field} regime. To derive generalization bounds in this setting, our analysis necessitates a shift from the conventional time-invariant Gram matrix employed in the lazy-training regime to a time-variant, distribution-dependent version tailored to the mean-field regime. To this end, we provide a lower bound on the minimum eigenvalue of the Gram matrix under the mean-field regime. Besides, the tractability of the dynamics of the Kullback-Leibler (KL) divergence is also required under the mean-field regime. We therefore establish the linear convergence of the empirical error and estimate an upper bound on the KL divergence over the parameter distribution. These two results are employed to build uniform convergence for the generalization bound via Rademacher complexity. Our results offer new insights into the generalization ability of deep ResNets beyond the lazy-training regime and contribute to advancing the understanding of the fundamental properties of deep neural networks. |
Yihang Chen · Fanghui Liu · Yiping Lu · Grigorios Chrysos · Volkan Cevher 🔗 |
-
|
Theoretical Explanation for Generalization from Adversarial Perturbations
(
Poster
)
>
link
It is not fully understood why adversarial examples can deceive neural networks and transfer between different networks. To elucidate this, several studies hypothesized that adversarial perturbations contain data features that are imperceptible to humans but still recognizable by neural networks. Empirical evidence has shown that neural networks trained on mislabeled samples with these perturbations can generalize to natural test data. However, a theoretical understanding of this counterintuitive phenomenon is limited. In this study, assuming orthogonal training samples, we first prove that one-hidden-layer neural networks can learn natural data structures from adversarial perturbations. Our results indicate that, under mild conditions, the decision boundary from learning perturbations aligns with that from natural data, except for specific points in the input space. |
Soichiro Kumano · Hiroshi Kera · Toshihiko Yamasaki 🔗 |
-
|
In-Context Convergence of Transformers
(
Poster
)
>
link
Transformers have recently revolutionized many domains in modern machine learning, and one salient discovery is their remarkable in-context learning capability, where models can solve an unseen task by utilizing task-specific prompts without further parameter fine-tuning. This has also inspired recent theoretical studies aiming to understand the in-context learning mechanism of transformers, which, however, focused only on $\textbf{linear}$ transformers. In this work, we take the first step toward studying the learning dynamics of a one-layer transformer with $\textbf{softmax}$ attention trained via gradient descent in order to in-context learn linear function classes. We consider a structured data model, where each token is randomly sampled from a set of feature vectors in either a balanced or imbalanced fashion. For data with balanced features, we establish the finite-time convergence guarantee with near-zero prediction error by navigating our analysis over two phases of the training dynamics of the attention map. More notably, for data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process, where the transformer first converges to a near-zero prediction error for the query tokens of dominant features, and then converges later to a near-zero prediction error for the query tokens of under-represented features, respectively via one and four training phases. Our proof features new techniques for analyzing the competing strengths of two types of attention weights, the change of which determines different phases.
|
Yu Huang · Yuan Cheng · Yingbin Liang 🔗 |
-
|
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
(
Poster
)
>
link
We investigate theoretically how the features of a $2$-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to improvement in the approximation capacity with respect to the initialization. We compare the influence of batch size and that of multiple (but finitely many) steps. For a single gradient step, a batch of size $n =\mathcal{O}(d)$ is both necessary and sufficient to align with the target function, although only a single direction can be learned. In contrast, $n=\mathcal{O}(d^2)$ is essential for neurons to specialize to multiple relevant directions of the target with a single gradient step. Even in this case, we show there might exist ``hard'' directions requiring $n=\mathcal{O}(d^\ell)$ samples to be learned, where $\ell$ is known as the leap index of the target. The picture drastically improves over multiple gradient steps: we show that a batch size of $n =\mathcal{O}(d)$ is indeed enough to learn multiple target directions satisfying a staircase property, where more and more directions can be learned over time. Finally, we discuss how these directions allow us to drastically improve the approximation capacity and generalization error over the initialization, illustrating a separation of scale between the random features/lazy regime and the feature learning regime. Our technical analysis leverages a combination of techniques related to concentration, projection-based conditioning, and Gaussian equivalence which we believe are of independent interest. By pinning down the conditions necessary for specialization and learning, our results highlight the interaction between batch size and number of iterations, and lead to a hierarchical depiction where learning performance exhibits a stairway to accuracy over time and batch size, shedding new light on how neural nets adapt to features of the data.
|
Yatin Dandi · Florent Krzakala · Bruno Loureiro · Luca Pesce · Ludovic Stephan 🔗 |
-
|
Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
(
Poster
)
>
link
Stochastic differential equations (SDEs) have recently been shown to characterize well the dynamics of training machine learning models with SGD. This provides two opportunities for better understanding the generalization behaviour of SGD through its SDE approximation. First, viewing SGD as full-batch gradient descent with Gaussian gradient noise allows us to obtain a trajectory-based generalization bound using the information-theoretic bound. Second, assuming mild conditions, we estimate the steady-state weight distribution of the SDE and use the information-theoretic bound to establish terminal-state-based generalization bounds. |
Ziqiao Wang · Yongyi Mao 🔗 |
-
|
Unraveling the Complexities of Simplicity Bias: Mitigating and Amplifying Factors
(
Poster
)
>
link
The success of neural networks depends on their generalization ability, yet Shah et al. conclude that an inherent bias towards simplistic features, Simplicity Bias, hurts generalization by preferring simple but noisy features to complex yet predictive ones. We aim to understand the scenarios in which simplicity bias occurs more severely and the factors that help mitigate its effects. We show that many traditional remedies, such as increasing the training-set size or the number of informative feature dimensions, are not as effective as balancing the modes of the data distribution, distorting the simplistic features, or even searching for a good initialization. Our empirical results reveal intriguing factors behind simplicity bias, and we call for future investigations toward a more thorough understanding of simplicity bias and its interplay with related fields. |
Xuchen Gong · Tianwen Fu 🔗 |
-
|
Transformers as Support Vector Machines
(
Poster
)
>
link
The transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as $\texttt{softmax}(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs. This formalism allows us to characterize the implicit bias of 1-layer transformers optimized with gradient descent: (1) Optimizing the attention layer, parameterized by $(K,Q)$, with vanishing regularization, converges in direction to an SVM solution minimizing the nuclear norm of the combined parameter $W:=KQ^\top$. Instead, directly parameterizing by $W$ minimizes a Frobenius norm SVM objective. (2) Complementing this, for $W$-parameterization, we prove the local/global directional convergence of gradient descent under suitable geometric conditions, and propose a more general SVM equivalence that predicts the implicit bias of attention with nonlinear heads/MLPs.
|
Davoud Ataee Tarzanagh · Yingcong Li · Christos Thrampoulidis · Samet Oymak 🔗 |
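The two attention parameterizations the abstract contrasts can be checked numerically: the pairwise-similarity map depends on $(K,Q)$ only through the combined parameter $W = KQ^\top$. A minimal sketch (toy sizes, no value projection):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 4, 3                        # T tokens of dimension d (toy sizes)
X = rng.normal(size=(T, d))
K = rng.normal(size=(d, d))
Q = rng.normal(size=(d, d))
W = K @ Q.T                        # combined parameter W = K Q^T

A = softmax(X @ Q @ K.T @ X.T)     # similarities softmax(X Q K^T X^T)
A_w = softmax(X @ W.T @ X.T)       # identical when parameterized by W
print(np.allclose(A, A_w))
```

The paper's point is that, although the two parameterizations define the same function class, gradient descent on $(K,Q)$ versus on $W$ converges to SVM solutions with different norms (nuclear vs Frobenius).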
-
|
Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems
(
Poster
)
>
link
In this paper, we extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates. We propose \emph{mean-field Langevin averaged gradient} (MFL-AG), a single-loop algorithm that implements gradient descent ascent in the distribution spaces with a novel weighted averaging, and establish average-iterate convergence to the mixed Nash equilibrium. We also study both time and particle discretization regimes and prove a new uniform-in-time propagation of chaos result which accounts for the dependency of the particle interactions on all previous distributions. Furthermore, we propose \emph{mean-field Langevin anchored best response} (MFL-ABR), a symmetric double-loop algorithm based on best response dynamics with linear last-iterate convergence. Finally, we study applications to zero-sum Markov games and conduct simulations demonstrating long-term optimality. |
Juno Kim · Kakei Yamamoto · Kazusato Oko · Zhuoran Yang · Taiji Suzuki 🔗 |
-
|
A Theoretical Study of Dataset Distillation
(
Poster
)
>
link
Modern machine learning models are often trained using massive amounts of data. Such large datasets come at a high cost in terms of both storage and computation, especially when the data will need to be used repeatedly (e.g., for neural architecture search or continual learning). Dataset distillation (DD) describes the process of constructing a smaller ``distilled'' dataset (usually consisting of synthetic examples), such that models trained on the distilled dataset will be similar to models trained on the original dataset. In this paper, we study DD from a theoretical perspective. We show that for generalized linear models, it is possible to construct a distilled dataset with only a single point which will exactly recover the model trained on the original dataset, regardless of the original number of points. We provide a specialized distillation for linear regression whose size is independent of the original number of points, and which perfectly reconstructs the model obtained from the original dataset under any data-independent regularizer, or when the original dataset is combined with any additional data. We also provide impossibility results showing that similar constructions are impossible for logistic regression, and that DD cannot be accomplished in general for kernel regression, even if the goal is only to recover a single model. |
Zachary Izzo · James Zou 🔗 |
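For intuition, here is one way a linear-regression distillation of this flavor can look: $d$ synthetic points that match the sufficient statistics $X^\top X$ and $X^\top y$, and therefore reproduce the ridge solution for every regularizer strength (a sketch assuming $X$ has full column rank; not necessarily the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5                         # many original points, few dimensions
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Distill to d synthetic points matching X^T X and X^T y exactly.
evals, V = np.linalg.eigh(X.T @ X)    # positive since X has full column rank
Xd = np.diag(np.sqrt(evals)) @ V.T    # d x d distilled inputs
yd = np.diag(1 / np.sqrt(evals)) @ V.T @ X.T @ y

def ridge(X, y, lam):
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# The distilled set reproduces the original model for ANY ridge penalty.
for lam in [0.0, 0.1, 10.0]:
    print(np.allclose(ridge(X, y, lam), ridge(Xd, yd, lam)))
```

Because the ridge solution depends on the data only through $X^\top X$ and $X^\top y$, matching those two statistics suffices, which is why the distilled size can be independent of $n$.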
-
|
Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models
(
Poster
)
>
link
Transformers are remarkably good at *in-context learning* (ICL)---learning from demonstrations without parameter updates---but how they perform ICL remains a mystery. Recent work suggests that Transformers may learn in-context by internally running Gradient Descent, a first-order optimization method. In this paper, we instead demonstrate that Transformers learn to implement higher-order optimization methods to perform ICL. Focusing on in-context linear regression, we show that Transformers learn to implement an algorithm very similar to *Iterative Newton's Method*, a higher-order optimization method, rather than Gradient Descent. Empirically, we show that predictions from successive Transformer layers closely match different iterations of Newton's Method, with each middle layer roughly computing 3 iterations; Gradient Descent is a much poorer match for the Transformer. We also show that Transformers can learn in-context on ill-conditioned data, a setting where Gradient Descent struggles but Iterative Newton succeeds. Finally, we show theoretical results which support our empirical findings and have a close correspondence with them: we prove that Transformers can implement $k$ iterations of Newton's method with $\mathcal{O}(k)$ layers.
|
Deqing Fu · Tian-qi Chen · Robin Jia · Vatsal Sharan 🔗 |
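The higher-order method in question can be sketched with the classical Newton–Schulz iteration for matrix inversion, applied to noiseless least squares (a toy sketch of the algorithm the paper compares against, not the authors' probing setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                          # noiseless targets for clarity

A = X.T @ X
M = A / np.linalg.norm(A, 2) ** 2       # Newton-Schulz initialization
for _ in range(20):                     # quadratic convergence to A^{-1}
    M = M @ (2 * np.eye(d) - A @ M)

w_newton = M @ X.T @ y                  # approximates the least-squares fit
print(np.allclose(w_newton, w_true, atol=1e-6))
```

Each iteration roughly squares the inversion error, which is why a small number of iterations (and, per the abstract, a small number of transformer layers per iteration) suffices even on moderately ill-conditioned data where gradient descent is slow.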
-
|
Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty
(
Poster
)
>
link
Applying a machine learning model for decision-making in the real world requires distinguishing what the model knows from what it does not. A critical factor in assessing the knowledge of a model is to quantify its predictive uncertainty. Predictive uncertainty is commonly measured by the entropy of the Bayesian model average (BMA) predictive distribution. Yet, the properness of this current measure of predictive uncertainty was recently questioned. We provide new insights regarding those limitations. Our analyses show that the current measure erroneously assumes that the BMA predictive distribution is equivalent to the predictive distribution of the true model that generated the dataset. Consequently, we introduce a theoretically grounded measure to overcome these limitations. We experimentally verify the benefits of our introduced measure of predictive uncertainty. We find that our introduced measure behaves more reasonably in controlled synthetic tasks. Moreover, our evaluations on ImageNet demonstrate that our introduced measure is advantageous in real-world applications utilizing predictive uncertainty. |
Kajetan Schweighofer · Lukas Aichberger · Mykyta Ielanskyi · Sepp Hochreiter 🔗 |
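The standard entropy-of-BMA measure the abstract critiques, and its usual decomposition into aleatoric and epistemic parts, can be sketched as follows (a hypothetical three-member ensemble; the paper's improved measure differs from this baseline):

```python
import numpy as np

def entropy(p, axis=-1):
    return -(p * np.log(p)).sum(axis=axis)

# Predictive distributions of M posterior samples over C classes for one
# input (hypothetical numbers for illustration).
probs = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])

bma = probs.mean(0)                         # Bayesian model average
total = entropy(bma)                        # common measure: entropy of BMA
aleatoric = entropy(probs, axis=-1).mean()  # expected entropy of members
epistemic = total - aleatoric               # mutual information (disagreement)
print(total, aleatoric, epistemic)
```

By concavity of entropy the epistemic part is non-negative; it is exactly this identification of the BMA with the true predictive distribution that the paper argues is flawed.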
-
|
In-Context Learning on Unstructured Data: Softmax Attention as a Mixture of Experts
(
Poster
)
>
link
Transformers have exhibited impressive in-context learning (ICL) capabilities: they can generate predictions for new query inputs based on sequences of inputs and outputs (i.e., prompts) without parameter updates. Despite numerous attempts to theoretically justify why these capabilities emerge, such analyses often assume that input-output pairings in the data are known. This assumption can enable simplified transformers (e.g., ones using a single attention layer without the softmax activation) to achieve notable ICL performance. However, this assumption can often be unrealistic: the unstructured data that transformers are trained on rarely contains such input-output pairings. How does ICL emerge in unstructured data then? To answer this question, we propose a challenging task that does not assume prior knowledge of input and output pairings. Unlike previous studies, this new task elucidates the pivotal role of softmax attention in the robust ICL abilities of transformers, particularly ones with a single attention layer. We argue that the significance of the softmax activation stems from the provable equivalence of softmax-based attention models with mixtures of experts, thereby facilitating the inference of input-output pairings from the data. Additionally, a probing analysis sheds light on when these pairings are learned within the model. While subsequent layers predictably encode more information about these pairings, we find that even the first attention layer contains a significant amount of pairing information. |
Kevin Christian Wibisono · Yixin Wang 🔗 |
-
|
Attention-Only Transformers and Implementing MLPs with Attention Heads
(
Poster
)
>
link
The transformer architecture is widely used in machine learning models and consists of two alternating sublayers: attention heads and MLPs. We prove that an MLP neuron can be implemented by a masked attention head with internal dimension 1 so long as the MLP's activation function comes from a restricted class including SiLU and close approximations of ReLU and GeLU. This allows one to convert an MLP-and-attention transformer into an attention-only transformer at the cost of greatly increasing the number of attention heads. |
Robert Huben · Valerie Morris 🔗 |
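One concrete instance of the MLP-as-attention idea: with SiLU, a two-position "attention head" whose scores and values are $[x, 0]$ reproduces the neuron exactly, since $\mathrm{softmax}([x, 0])$ puts weight $\sigma(x)$ on the first position (an illustrative construction in the spirit of the result, not the paper's exact one):

```python
import numpy as np

def silu(x):
    return x / (1 + np.exp(-x))            # SiLU: x * sigmoid(x)

def tiny_attention(x):
    """Toy attention over two positions with scores [x, 0] and values
    [x, 0]. softmax([x, 0]) = [sigmoid(x), 1 - sigmoid(x)], so the
    output is x * sigmoid(x) = SiLU(x)."""
    scores = np.array([x, 0.0])
    values = np.array([x, 0.0])
    e = np.exp(scores - scores.max())      # stable softmax weights
    weights = e / e.sum()
    return weights @ values

xs = np.linspace(-3, 3, 7)
print(all(np.isclose(tiny_attention(x), silu(x)) for x in xs))
```

The identity $\mathrm{softmax}([s, 0])_1 = \sigma(s)$ is what lets sigmoid-gated activations like SiLU be absorbed into attention weights, at the cost the abstract notes: one head per neuron.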
-
|
Privacy at Interpolation: Precise Analysis for Random and NTK Features
(
Poster
)
>
link
Deep learning models are able to memorize the training set. This makes them vulnerable to recovery attacks, raising privacy concerns for users, and many widespread algorithms such as empirical risk minimization (ERM) do not directly enforce safety guarantees. In this paper, we study the safety of ERM models when the training samples are interpolated (i.e., at interpolation) against a family of powerful black-box information retrieval attacks. Our analysis quantifies this safety via two separate terms: (i) the model stability with respect to individual training samples, and (ii) the feature alignment between attacker query and original data. While the first term is well established in learning theory and is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result characterizes precisely the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. This proves that privacy strengthens with an increase in generalization capability, unveiling the role of the model and of its activation function. Numerical experiments show an agreement with our theory not only for RF/NTK models, but also for deep neural networks trained on standard datasets (MNIST, CIFAR-10). |
Simone Bombari · Marco Mondelli 🔗 |
-
|
Denoising Low-Rank Data Under Distribution Shift: Double Descent and Data Augmentation
(
Poster
)
>
link
Despite the importance of denoising in modern machine learning and ample empirical work on supervised denoising, its theoretical understanding is still relatively scarce. One concern about studying supervised denoising is that one might not always have noiseless training data from the test distribution. It is more reasonable to have access to noiseless training data from a different dataset than the test dataset. Motivated by this, we study supervised denoising and noisy-input regression under distribution shift. We add three considerations to increase the applicability of our theoretical insights to real-life data and modern machine learning. First, while most past theoretical work assumes that the data covariance matrix is full-rank and well-conditioned, empirical studies have shown that real-life data is approximately low-rank. Thus, we assume that our data matrices are low-rank. Second, we drop independence assumptions on our data. Third, the rise in computational power and dimensionality of data have made it important to study non-classical regimes of learning. Thus, we work in the non-classical proportional regime, where data dimension $d$ and number of samples $N$ grow as $d/N = c + o(1)$. For this setting, we derive general test error expressions for both denoising and noisy-input regression, and study when overfitting the noise is benign, tempered or catastrophic. We show that the test error exhibits double descent under general distribution shift, providing insights for data augmentation and the role of noise as an implicit regularizer. We also perform experiments using real-life data, where we match the theoretical predictions with under 1\% MSE error for low-rank data.
|
Chinmaya Kausik · Kashvi Srivastava · Rishi Sonthalia 🔗 |
-
|
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
(
Poster
)
>
link
Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component---spike---in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning. |
Behrad Moniri · Donghwan Lee · Hamed Hassani · Edgar Dobriban 🔗 |
-
|
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
(
Poster
)
>
link
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting---achieving a perfect fit to training data with near-random performance on test data---before transitioning (''grokking'') to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100\% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a ''grokking'' phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps. |
Zhiwei Xu · Yutong Wang · Spencer Frei · Gal Vardi · Wei Hu 🔗 |
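A small sketch of the data model in this abstract, with dimensions, separation, and noise fraction chosen by us: XOR-arranged cluster means with a constant fraction of flipped labels, on which the best linear classifier is near chance (the setting is not linearly separable).

```python
import numpy as np

rng = np.random.default_rng(1)

def xor_clusters(n, d=50, noise_frac=0.15, sep=3.0):
    """XOR cluster data: class +1 around +/- sep*e1, class -1 around
    +/- sep*e2, with a constant fraction of labels flipped.
    (Illustrative parameters, not the paper's.)"""
    mus = sep * np.stack([np.eye(d)[0], -np.eye(d)[0], np.eye(d)[1], -np.eye(d)[1]])
    ys = np.array([1, 1, -1, -1])
    idx = rng.integers(0, 4, n)
    X = mus[idx] + rng.standard_normal((n, d))
    y = ys[idx].astype(float)
    y[rng.random(n) < noise_frac] *= -1
    return X, y

X, y = xor_clusters(4000)
# Least-squares linear classifier: near chance on held-out clean data,
# since the XOR structure is not linearly separable.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
Xt, yt = xor_clusters(4000, noise_frac=0.0)
acc = float(np.mean(np.sign(Xt @ w) == yt))
print(acc)
```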
-
|
How does Gradient Descent Learn Features --- A Local Analysis for Regularized Two-Layer Neural Networks
(
Poster
)
>
link
The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent, through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. Our results demonstrate that feature learning not only happens at the initial gradient steps, but can also occur towards the end of training. |
Mo Zhou · Rong Ge 🔗 |
-
|
Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP
(
Poster
)
>
link
Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources. Recently, CLIP has emerged as an effective approach that employs vision-language contrastive pretraining to learn joint image and text representations and exhibits remarkable performance in zero-shot learning and text-guided natural image generation. Despite the huge practical success of CLIP, its theoretical understanding remains elusive. In this paper, we formally study transferable representation learning underlying CLIP and demonstrate how features from different modalities get aligned. We also analyze its zero-shot transfer performance on the downstream tasks. Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets. |
Zixiang Chen · Yihe Deng · Yuanzhi Li · Quanquan Gu 🔗 |
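The contrastive pretraining objective analyzed above can be sketched as a symmetric InfoNCE loss over paired image and text embeddings (a standard formulation of the CLIP loss; the temperature value and embedding sizes here are illustrative):

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temp=0.07):
    """Symmetric contrastive (InfoNCE) loss used in CLIP-style pretraining.
    Rows are assumed L2-normalized; matching pairs share a row index."""
    logits = img_emb @ txt_emb.T / temp
    n = logits.shape[0]

    def ce(lg):                      # cross-entropy with the diagonal as labels
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    return 0.5 * (ce(logits) + ce(logits.T))   # image->text and text->image

rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 16))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
aligned = clip_loss(Z, Z)            # perfectly aligned modalities
shuffled = clip_loss(Z, Z[::-1])     # mismatched pairs
print(aligned, shuffled)
```

As expected, the loss is far lower when the two modalities' embeddings are aligned pair-by-pair, which is the alignment property the paper studies.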
-
|
Provably Efficient CVaR RL in Low-rank MDPs
(
Poster
)
>
link
We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision Processes (MDPs) setting. To extend CVaR RL to settings where the state space is large, function approximation must be deployed. We study CVaR RL in low-rank MDPs with nonlinear function approximation. Low-rank MDPs assume the underlying transition kernel admits a low-rank decomposition, but unlike prior linear models, low-rank MDPs do not assume the feature or state-action representation is known. We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to carefully balance the interplay between exploration, exploitation, and representation learning in CVaR RL. We prove that our algorithm achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of the action space, and $d$ is the dimension of representations. Computationally, we design a novel discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle and show that we can find the near-optimal policy in polynomial running time with a Maximum Likelihood Estimation oracle. To our knowledge, this is the first provably efficient CVaR RL algorithm in low-rank MDPs.
|
Yulai Zhao · Wenhao Zhan · Xiaoyan Hu · Ho-fung Leung · Farzan Farnia · Wen Sun · Jason Lee 🔗 |
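For readers unfamiliar with the objective: the CVaR at tolerance $\tau$ is the expected return over the worst $\tau$-fraction of outcomes. A minimal empirical estimator (toy data, for illustration only):

```python
import numpy as np

def cvar(returns, tau):
    """Empirical CVaR at tolerance tau: the mean of the worst tau-fraction
    of outcomes (lower tail of the return distribution)."""
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(tau * len(r))))
    return r[:k].mean()

rets = [1.0, 2.0, -3.0, 0.5, 4.0, -1.0, 2.5, 0.0]
print(cvar(rets, 0.25))   # mean of the 2 worst returns: (-3.0 + -1.0)/2 = -2.0
```

At $\tau = 1$ the CVaR reduces to the ordinary expected return, recovering risk-neutral RL as a special case.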
-
|
Analysis of Task Transferability in Large Pre-trained Classifiers
(
Poster
)
>
link
Transfer learning is a cornerstone of modern machine learning, enabling models to transfer the knowledge acquired from a source task to downstream target tasks with minimal fine-tuning. However, the relationship between the source task performance and the downstream target task performance (i.e., transferability) is poorly understood. In this work, we rigorously analyze the transferability of large pre-trained models on downstream classification tasks after linear fine-tuning. We use a novel Task Transfer Analysis approach that transforms the distribution (and classifier) of the source task to produce a new distribution (and classifier) similar to that of the target task. Using this, we propose an upper bound on transferability composed of the Wasserstein distance between the transformed source and the target distributions, the conditional entropy between the label distributions of the two tasks, and the weighted loss of the source classifier on the source task. We propose an optimization problem that minimizes the proposed bound to estimate transferability. Using state-of-the-art pre-trained models, we show that the proposed upper bound accurately estimates transferability on various datasets and demonstrates the importance of high relatedness between the source and target tasks for achieving high transferability. |
Akshay Mehra · Yunbei Zhang · Jihun Hamm 🔗 |
-
|
On Scale-Invariant Sharpness Measures
(
Poster
)
>
link
Recently, there has been a substantial surge of interest in the development of optimization algorithms tailored for overparameterized models. This interest centers around the objective of minimizing a notion of sharpness in conjunction with the original loss function, e.g., via the Sharpness-Aware Minimization (SAM) algorithm, which has been shown to be effective in practice. Nevertheless, the majority of sharpness measures exhibit sensitivity to parameter scaling in neural networks, and they may even experience significant magnification when subjected to rescaling operations. Motivated by this issue, in this paper we introduce a new class of scale-invariant sharpness measures, which gives rise to a new class of scale-invariant sharpness-aware objective functions. Furthermore, we prove that the newly introduced objective functions are explicitly biased towards the minimization of our scale-invariant sharpness measures. |
Behrooz Tahmasebi · Ashkan Soleymani · Stefanie Jegelka · Patrick Jaillet 🔗 |
-
|
Gibbs-Based Information Criteria and the Over-Parameterized Regime
(
Poster
)
>
link
Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations of the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide a Bayesian Information Criterion (BIC) for models trained by the Gibbs algorithm. Notably, the BIC penalty term for the Gibbs algorithm corresponds to a specific information measure, i.e., KL divergence. We extend this information-theoretic analysis to over-parameterized models by characterizing the Gibbs-based BIC for the random feature model in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity, with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal the mismatch between marginal likelihood and population risk in the over-parameterized regime, providing new insights for understanding the double-descent phenomenon.
|
Haobo Chen · Yuheng Bu · Gregory Wornell 🔗 |
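For orientation, the classical BIC that the paper generalizes is easy to state and use; the sketch below applies it to a toy polynomial model-selection problem (the Gibbs-based, KL-penalty variant from the abstract is not reproduced here, and all data and degrees are our own choices):

```python
import numpy as np

def bic(log_likelihood, num_params, n):
    """Classical BIC = -2 log L + p log n; lower is better. The paper's
    Gibbs-based variant replaces the p*log(n) penalty with a KL-divergence
    information measure (not implemented here)."""
    return -2.0 * log_likelihood + num_params * np.log(n)

# Toy model selection: fit polynomial degrees to noisy linear data and
# score each fit with BIC under a Gaussian likelihood.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
y = 2 * x + 0.1 * rng.standard_normal(n)

scores = {}
for deg in (1, 2, 5, 10):
    coef = np.polyfit(x, y, deg)
    sigma2 = (y - np.polyval(coef, x)).var()
    ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)   # Gaussian MLE log-lik.
    scores[deg] = float(bic(ll, deg + 1, n))

print(min(scores, key=scores.get))   # typically the true degree, 1
```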
-
|
Grokking modular arithmetic can be explained by margin maximization
(
Poster
)
>
link
We present a margin-based generalization theory explaining the “grokking” phenomenon (Power et al., 2022), where the model generalizes long after overfitting to arithmetic datasets. Specifically, we study two-layer quadratic networks on mod-$p$ arithmetic problems, and show that solutions with maximal margin normalized by $\ell_\infty$ norm generalize with $\tilde O(p^{5/3})$ samples. To the best of our knowledge, this is the first sample complexity bound strictly better than the trivial $O(p^2)$ complexity for modular addition. Empirically, we find that GD on unregularized $\ell_2$ or cross-entropy loss tends to maximize the margin. In contrast, we show that kernel-based models, such as networks that are well-approximated by their neural tangent kernel, need $\Omega(p^2)$ samples to achieve non-trivial $\ell_2$ loss. Our theory suggests that grokking might be caused by overfitting in the kernel regime of early training, followed by generalization as gradient descent eventually leaves the kernel regime and maximizes the normalized margin.
|
Mohamad Amin Mohamadi · Zhiyuan Li · Lei Wu · Danica J. Sutherland 🔗 |
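A concrete view of the setting: the full mod-$p$ addition dataset has $p^2$ examples, and the quantity being maximized is a normalized classification margin. The helper below is our own illustrative implementation of such a margin (shown on idealized logits), not the paper's code:

```python
import numpy as np

p = 7
# All (a, b) -> (a + b) mod p examples: p^2 in total, of which the paper
# shows on the order of p^{5/3} suffice for max-margin solutions.
pairs = np.array([(a, b, (a + b) % p) for a in range(p) for b in range(p)])

def normalized_margin(logits, labels, weight_norm):
    """Minimum classification margin divided by a weight norm
    (hypothetical helper mirroring the paper's normalization idea)."""
    n = len(labels)
    correct = logits[np.arange(n), labels]
    rest = logits.copy()
    rest[np.arange(n), labels] = -np.inf
    return float((correct - rest.max(axis=1)).min() / weight_norm)

labels = pairs[:, 2]
logits = 2.0 * np.eye(p)[labels]       # idealized logits: 2 on the true class
m = normalized_margin(logits, labels, weight_norm=1.0)
print(len(pairs), m)
```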
-
|
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
(
Poster
)
>
link
We consider gradient-based optimisation of wide, shallow neural networks with hidden-node outputs scaled by positive scale parameters. The scale parameters are non-identical, differing from classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large networks, with high probability, gradient flow converges to a global minimum and can learn features, unlike in the NTK regime. |
Fadhel Ayed · Francois Caron · Paul Jung · Juho Lee · Hoil Lee · Hongseok Yang 🔗 |
-
|
On the Computational Complexity of Inverting Generative Models
(
Poster
)
>
link
The objective of generative model inversion is to identify a size-$n$ latent vector that produces a generative model output that closely matches a given target. This operation is a core computational primitive in numerous modern applications involving computer vision and NLP. However, the problem is known to be computationally challenging and NP-hard in the worst case. This paper aims to provide a fine-grained view of the landscape of computational hardness for this problem. We establish several new hardness lower bounds for both exact and approximate model inversion. In exact inversion, the goal is to determine whether a target is contained within the range of a given generative model. Under the strong exponential time hypothesis (SETH), we demonstrate that the computational complexity of exact inversion is lower bounded by $\Omega(2^n)$ via a reduction from $k$-SAT; this is a strengthening of known results. For the more practically relevant problem of approximate inversion, the goal is to determine whether a point in the model range is close to a given target with respect to the $\ell_p$-norm. When $p$ is a positive odd integer, under SETH, we provide an $\Omega(2^n)$ complexity lower bound via a reduction from the closest vector problem (CVP). Finally, when $p$ is even, under the exponential time hypothesis (ETH), we provide a lower bound of $2^{\Omega (n)}$ via a reduction from Half-Clique and Vertex-Cover.
|
Feyza Duman Keles · Chinmay Hegde 🔗 |
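In practice, inversion is usually attempted heuristically by gradient descent on the reconstruction error, which the worst-case bounds above say cannot always succeed efficiently. A dependency-free sketch on a tiny ReLU generator (all sizes and step sizes ours, finite differences standing in for autodiff):

```python
import numpy as np

rng = np.random.default_rng(0)
n_lat, n_hid, d_out = 8, 16, 32
W1 = rng.standard_normal((n_hid, n_lat)) / np.sqrt(n_lat)
W2 = rng.standard_normal((d_out, n_hid)) / np.sqrt(n_hid)

def G(z):
    """A tiny two-layer ReLU generator; even such networks are NP-hard to
    invert in the worst case."""
    return W2 @ np.maximum(W1 @ z, 0.0)

target = G(rng.standard_normal(n_lat))

def loss(z):
    return float(np.sum((G(z) - target) ** 2))

# Heuristic approximate inversion: finite-difference gradient descent on
# the squared error. It can get stuck; no worst-case guarantee exists.
z = np.zeros(n_lat)
f0 = loss(z)
for _ in range(300):
    g = np.array([(loss(z + 1e-5 * np.eye(n_lat)[i]) - loss(z)) / 1e-5
                  for i in range(n_lat)])
    z = z - 1e-2 * g
print(f0, loss(z))
```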
-
|
Flow-Based High-Dimensionally Distributional Robust Optimization
(
Poster
)
>
link
Flow-based models establish a continuous-time invertible transport map between a data distribution and a pre-specified target distribution, such as the standard Gaussian in normalizing flows. In this work, we move beyond the constraint of known target distributions. We specifically aim to find the worst-case distribution in distributionally robust optimization (DRO), which is an infinite-dimensional problem that becomes particularly challenging in high-dimensional settings. To this end, we introduce a computational tool called FlowDRO. Specifically, we reformulate the difficult task of identifying the worst-case distribution within a Wasserstein-2 uncertainty set into a more manageable form, i.e., training the parameters of a corresponding flow-based neural network. Notably, the proposed FlowDRO is applicable to general risk functions and data distributions in DRO. We demonstrate the effectiveness of the proposed approach in various high-dimensional problems that can be viewed as DRO, including adversarial attack and differential privacy. |
Chen Xu · Jonghyeok Lee · Xiuyuan Cheng · Yao Xie 🔗 |
-
|
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
(
Poster
)
>
link
Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods --- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories. |
Licong Lin · Yu Bai · Song Mei 🔗 |
-
|
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
(
Poster
)
>
link
While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understanding of such capabilities is still at an early stage, where existing theory and mechanistic understanding focus mostly on simple scenarios such as learning simple function classes. This paper takes initial steps on understanding ICL in more complex scenarios, by studying learning with \emph{representations}. Concretely, we construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but \emph{fixed} representation function, composed with a linear function that \emph{differs} in each instance. By construction, the optimal ICL algorithm first transforms the inputs by the representation function, and then performs linear ICL on top of the transformed dataset. We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size. Empirically, we find trained transformers consistently achieve near-optimal ICL performance in this setting, and exhibit the desired dissection where lower layers transform the dataset and upper layers perform linear ICL. Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting. These observed mechanisms align well with our theory and may shed light on how transformers perform ICL in more realistic scenarios. |
Tianyu Guo · Wei Hu · Song Mei · Huan Wang · Caiming Xiong · Silvio Savarese · Yu Bai 🔗 |
-
|
A Theoretical Explanation of Deep RL Performance in Stochastic Environments
(
Poster
)
>
link
Reinforcement learning (RL) theory has largely focused on proving minimax sample complexity bounds. These require strategic exploration algorithms that use relatively limited function classes for representing the policy or value function. Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks. Our work arrives at an explanation by showing that many stochastic MDPs can be solved by performing only a few steps of value iteration on the random policy's Q function and then acting greedily. When this is true, we find that it is possible to separate the exploration and learning components of RL, making it much easier to analyze. We introduce a new RL algorithm, SQIRL, that iteratively learns a near-optimal policy by exploring randomly to collect rollouts and then performing a limited number of steps of fitted-Q iteration over those rollouts. We find that any regression algorithm that satisfies basic in-distribution generalization properties can be used in SQIRL to efficiently solve common MDPs. This can explain why deep RL works with complex function approximators like neural networks, since it is empirically established that neural networks generalize well in-distribution. Furthermore, SQIRL explains why random exploration works well in practice, since we show many environments can be solved by effectively estimating the random policy's Q-function and then applying zero or a few steps of value iteration. We leverage SQIRL to derive instance-dependent sample complexity bounds for RL that are exponential only in an "effective horizon" of lookahead—which is typically much smaller than the full horizon—and in the complexity of the class used for function approximation. 
Empirically, we also find that SQIRL performance strongly correlates with PPO and DQN performance in a variety of stochastic environments, supporting that our theoretical analysis is predictive of practical performance. |
Cassidy Laidlaw · Banghua Zhu · Stuart J Russell · Anca Dragan 🔗 |
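The core idea (estimate the random policy's Q-function from random rollouts, then act greedily with zero extra steps of value iteration) can be illustrated on a tabular toy. The chain MDP below and the crude Monte-Carlo regression standing in for fitted-Q iteration are our own constructions, not the paper's SQIRL implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 4, 2, 8            # chain states 0..3, actions 0=left / 1=right

def step(s, a):
    s2 = min(S - 1, s + 1) if a == 1 else max(0, s - 1)
    return s2, float(s2 == S - 1)   # reward 1 whenever we end at the goal

# Estimate the *random* policy's Q-function from purely random rollouts.
Q = np.zeros((S, A)); counts = np.zeros((S, A))
for _ in range(3000):
    s, hist = 0, []
    for _ in range(H):
        a = int(rng.integers(A))
        s2, r = step(s, a)
        hist.append((s, a, r))
        s = s2
    g = 0.0
    for (si, ai, ri) in reversed(hist):
        g = ri + g                   # undiscounted return-to-go
        counts[si, ai] += 1
        Q[si, ai] += g
Q /= np.maximum(counts, 1)

# Acting greedily w.r.t. Q^random already solves this easy chain:
# "move right" wins in every state near the goal.
greedy = Q.argmax(axis=1)
print(greedy)
```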
-
|
Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models
(
Poster
)
>
link
We investigate the efficiency of deep neural networks for approximating scoring functions in diffusion-based generative modeling. While existing approximation theories leverage the smoothness of score functions, they suffer from the curse of dimensionality for intrinsically high-dimensional data. This limitation is pronounced in graphical models such as Markov random fields, where the approximation efficiency of score functions remains unestablished. To address this, we note score functions can often be well-approximated in graphical models through variational inference denoising algorithms. Furthermore, these algorithms can be efficiently represented by neural networks. We demonstrate this through examples, including Ising models, conditional Ising models, restricted Boltzmann machines, and sparse encoding models. Combined with off-the-shelf discretization error bounds for diffusion-based sampling, we provide an efficient sample complexity bound for diffusion-based generative modeling when the score function is learned by deep neural networks. |
Song Mei · Yuchen Wu 🔗 |
-
|
Under-Parameterized Double Descent for Ridge Regularized Least Squares Denoising of Data on a Line
(
Poster
)
>
link
In this paper, we present a simple example that provably exhibits double descent in the under-parameterized regime. For simplicity, we look at the ridge regularized least squares denoising problem with data on a line embedded in a high-dimensional space. By deriving an asymptotically accurate formula for the generalization error, we observe sample-wise and parameter-wise double descent with the peak in the under-parameterized regime rather than at the interpolation point or in the over-parameterized regime. Further, the peak of the sample-wise double descent curve corresponds to a peak in the curve for the norm of the estimator, and adjusting $\mu$, the strength of the ridge regularization, shifts the location of the peak. We observe that parameter-wise double descent occurs for this model for small $\mu$. For larger values of $\mu$, we observe that the curve for the norm of the estimator has a peak but that this no longer translates to a peak in the generalization error.
|
Rishi Sonthalia · Xinyue (Serena) Li · Bochao Gu 🔗 |
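The model class above is simple enough to simulate directly. The sketch below (with our own dimensions and noise level, and illustrative values of $\mu$) sets up the ridge-regularized least-squares denoiser in closed form for data on a line, and records the estimator norm that the paper links to the generalization-error peak; as expected, the norm shrinks monotonically as $\mu$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 30
theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
X = np.outer(rng.standard_normal(n), theta)     # clean data on a line in R^d
A = X + 0.3 * rng.standard_normal((n, d))       # noisy observations

def ridge_denoiser(A, X, mu):
    """W minimizing ||A W^T - X||_F^2 + mu ||W||_F^2 (closed form)."""
    return np.linalg.solve(A.T @ A + mu * np.eye(A.shape[1]), A.T @ X).T

# Held-out risk and estimator norm across ridge strengths.
X2 = np.outer(rng.standard_normal(1000), theta)
A2 = X2 + 0.3 * rng.standard_normal((1000, d))
mus = (1e-6, 1.0, 10.0)
norms, risks = [], []
for mu in mus:
    W = ridge_denoiser(A, X, mu)
    norms.append(float(np.linalg.norm(W)))
    risks.append(float(np.mean((A2 @ W.T - X2) ** 2)))
print(norms, risks)
```

Reproducing the paper's sample-wise double-descent peak would require sweeping $n$ against $d$; the snippet only sets up the estimator and its $\mu$-dependence.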
-
|
Continual Learning for Long-Tailed Recognition: Bridging the Gap in Theory and Practice
(
Poster
)
>
link
The Long-Tailed Recognition (LTR) problem arises in imbalanced datasets. This paper bridges the theory-practice gap in this context, providing mathematical insights into the training dynamics of LTR scenarios by proposing a theorem stating that, under strong convexity, the learner's weights trained on the full dataset are bounded by those trained only on the Head. We extend this theorem for multiple subsets and introduce a novel perspective of using Continual Learning (CL) for LTR. We sequentially learn the Head and Tail by updating the learner's weights without forgetting the Head using CL methods. We prove that CL reduces loss compared to fine-tuning on the Tail. Our experiments on MNIST-LT and standard LTR benchmarks (CIFAR100-LT, CIFAR10-LT, and ImageNet-LT) validate our theory and demonstrate the effectiveness of CL solutions. We also show the efficacy of CL on real-world data, specifically the Caltech256 dataset, outperforming state-of-the-art classifiers. Our work unifies LTR and CL and paves the way for leveraging advances in CL to tackle the LTR challenge effectively. |
Mahdiyar Molahasani · Ali Etemad · Michael Greenspan 🔗 |
-
|
SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning
(
Poster
)
>
link
Self-supervised representation learning is a powerful paradigm that leverages the relationship between semantically similar data, such as augmentations, extracts of an image or sound clip, or multiple views/modalities. Recent methods, e.g. SimCLR, CLIP and DINO, have made significant strides, yielding representations that achieve state-of-the-art results on multiple downstream tasks. A number of self-supervised discriminative approaches have been proposed, e.g. instance discrimination, latent clustering and contrastive methods. Though often intuitive, a comprehensive theoretical understanding of their underlying mechanisms and of what they learn remains elusive. Meanwhile, generative approaches, such as variational autoencoders (VAEs), fit a specific latent variable model and have principled appeal, but lag significantly in terms of performance. We present a theoretical analysis of self-supervised discriminative methods and a graphical model that reflects the assumptions they implicitly make and unifies these methods. We show that fitting this model under an ELBO objective improves representations over previous VAE methods on several common benchmarks, narrowing the gap to discriminative methods, and can also preserve information lost by discriminative approaches. This work brings new theoretical insight to modern machine learning practice. |
Alice Bizeul · Carl Allen 🔗 |
-
|
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
(
Poster
)
>
link
Weight decay can significantly impact the optimization dynamics of deep neural networks. In certain situations the effects of weight decay and gradient updates on the magnitude of a parameter vector cancel out on average, forming a state known as equilibrium. This causes the expected rotation of the vector in each update to remain constant along with its magnitude. Importantly, equilibrium can arise independently for the weight vectors of different layers and neurons. These equilibria are highly homogeneous for some optimizer and normalization configurations, effectively balancing the average rotation—a proxy for the effective learning rate—across network components. In this work we explore the equilibrium states of multiple optimizers including AdamW and SGD with momentum, providing insights into interactions between the learning rate, weight decay, initialization, normalization and learning rate schedule. We show how rotational equilibrium can be enforced throughout training, eliminating the chaotic transient phase corresponding to the transition towards equilibrium, thus simplifying the training dynamics. Finally, we show that rotational behavior may play a key role in the effectiveness of AdamW compared to Adam with L2-regularization, the performance of different normalization layers, and the need for learning rate warmup. |
Atli Kosson · Bettina Messmer · Martin Jaggi 🔗 |
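The equilibrium described above can be observed in a toy simulation (all constants are ours): a weight vector receives stochastic gradients orthogonal to itself, as for scale-invariant layers under normalization, plus weight decay. Its norm and its per-step rotation angle both settle to constants:

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr, wd = 256, 0.1, 0.1
w = rng.standard_normal(d); w /= np.linalg.norm(w)

angles, norms = [], []
for t in range(2000):
    g = rng.standard_normal(d)               # stand-in stochastic gradient
    g -= (g @ w) / (w @ w) * w               # scale-invariance => grad ⟂ w
    w_new = w - lr * g / np.sqrt(d) - lr * wd * w
    c = (w @ w_new) / (np.linalg.norm(w) * np.linalg.norm(w_new))
    angles.append(float(np.arccos(np.clip(c, -1.0, 1.0))))
    norms.append(float(np.linalg.norm(w_new)))
    w = w_new

# In equilibrium, norm growth from the orthogonal gradient balances decay,
# so both the norm and the per-step rotation stabilize.
eq_norm = float(np.mean(norms[-200:]))
eq_angle = float(np.mean(angles[-200:]))
print(eq_norm, eq_angle)
```

The fixed point here is predictable: the squared norm satisfies $\|w\|^2 = \big((1-\eta\lambda)^2\|w\|^2 + \eta^2\big)$ at equilibrium, giving $\|w\| \approx 0.71$ and a rotation of roughly $0.14$ rad per step for these constants.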
-
|
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rate
(
Poster
)
>
link
In this work, we theoretically investigate the generalization properties of neural networks (NN) trained by stochastic gradient descent (SGD) with \emph{large learning rates}. Under such a training regime, our finding is that the \emph{oscillation} of the NN weights caused by SGD with large learning rates turns out to be beneficial to the generalization of the NN, potentially improving over the same NN trained by SGD with small learning rates that converges more smoothly. In view of this finding, we call such a phenomenon ``\emph{benign oscillation}''. |
Miao Lu · Beining Wu · Xiaodong Yang · Difan Zou 🔗 |
-
|
On Compositionality and Emergence in Physical Systems Generative Modeling
(
Poster
)
>
link
The principle of compositionality plays a pivotal role in both machine learning and physical sciences but remains under-explored, particularly in the context of synthetic data derived from physical energy potentials. This study aims to bridge this gap by examining the compositional nature of synthetic datasets generated using composite energy potentials. By combining established Lennard-Jones and Morse potentials into a composite potential, we generate synthetic datasets using Markov Chain Monte Carlo (MCMC) techniques. These datasets serve as training grounds for machine learning models, specifically Neural Ordinary Differential Equations (ODEs). Our primary focus is to investigate whether the properties of the composite datasets retain the characteristics of their individual components, effectively testing the principle of compositionality. The findings not only shed light on the compositional integrity of synthetic physical datasets but also lay the groundwork for more robust and interpretable machine learning models applied to complex physical systems by using the formalism of Category Theory. |
Justin Diamond 🔗 |
-
|
Escaping Random Teacher Initialization Enhances Signal Propagation and Representations
(
Poster
)
>
link
Recent work shows that, by learning to mimic a random teacher network, student networks land in regions of the loss landscape that lead to better representations (as seen through linear-probing performance). In this paper, we investigate how this phenomenon translates into some concrete properties of the representations. To do so, we first provide a minimal setup that preserves the essence of this phenomenon. Then, we investigate key signal propagation and representation separability properties during random distillation. Our analysis reveals a two-stage process: the network first undergoes a form of collapse in its representations, then it is steered to a landscape region that not only allows for better propagation of input signals but also gives rise to well-conditioned representations. |
Felix Sarnthein · Sidak Pal Singh · Antonio Orvieto · Thomas Hofmann 🔗 |
-
|
The Expressive Power of Transformers with Chain of Thought
(
Poster
)
>
link
Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad", i.e., generate and condition on a sequence of intermediate tokens before answering. Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer? We show that the answer is yes, but the amount of increase depends crucially on the amount of intermediate generation. For instance, we find that transformer decoders with a logarithmic number of decoding steps (w.r.t. the input length) push the limits of standard transformers only slightly, while a linear number of decoding steps adds a clear new ability (under standard complexity conjectures): recognizing all regular languages. Our results also imply that linear steps keep transformer decoders within context-sensitive languages, and polynomial steps make them recognize exactly the class of polynomial-time solvable problems---the first exact characterization of a type of transformers in terms of standard complexity classes. Together, our results provide a nuanced framework for understanding how the length of a transformer’s chain of thought or scratchpad impacts its reasoning power. |
William Merrill · Ashish Sabharwal 🔗 |
-
|
Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning
(
Poster
)
>
link
Transformer-based large language models have displayed impressive capabilities in the domain of in-context learning, wherein they use multiple input-output pairs to make predictions on unlabeled test data. To lay the theoretical groundwork for in-context learning, we delve into optimization and generalization of a single-head, one-layer Transformer in the context of multi-task learning for classification. Our investigation uncovers that lower sample complexity is associated with increased training-relevant features and reduced noise in prompts, resulting in improved learning performance. The trained model exhibits the mechanism to first attend to demonstrations of training-relevant features and then decode the corresponding label embedding. Furthermore, we delineate the conditions necessary for successful out-of-domain generalization for in-context learning, specifically regarding the relationship between training and testing prompts. |
Hongkang Li · Meng Wang · Songtao Lu · Hui Wan · Xiaodong Cui · Pin-Yu Chen 🔗 |
-
|
Fit Like You Sample: Sample-Efficient Score Matching From Fast Mixing Diffusions
(
Poster
)
>
link
Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution (i.e. $\nabla_x \log p(x)$), rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there's a clear algorithmic benefit, the statistical cost can be steep: recent work by Koehler et al. '22 showed that for distributions that have poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially statistically less efficient than maximum likelihood. However, many natural realistic distributions---e.g., multimodal distributions as simple as a mixture of two Gaussians in one dimension---have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of a broad class of Markov processes with generator $\mathcal{L}$ and stationary distribution $p$, and an appropriately chosen generalized score matching loss that tries to fit $\frac{\mathcal{O} p}{p}$. In the special case of $\mathcal{O} = \nabla_x$, and $\mathcal{L}$ being the generator of Langevin diffusion, this generalizes and recovers the results of Koehler et al. '22. This allows us to adapt techniques to speed up Markov chains to construct better score-matching losses. In particular, "preconditioning" the diffusion can be translated to an appropriate "preconditioning" of the score loss. Lifting the chain by adding a temperature, as in simulated tempering, can be shown to result in a Gaussian-convolution annealed score matching loss, similar to Song and Ermon '19. 
Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance---obviating the Poincaré-constant-based lower bounds of the basic score matching loss shown by Koehler et al. '22.
|
Yilong Qin · Andrej Risteski 🔗 |
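The score being fit here has a simple closed form for the two-Gaussian example mentioned in the abstract. As an illustrative sketch (ours, not the authors' code): the mixture score $\nabla_x \log p(x)$ is a responsibility-weighted average of the component scores, which we can sanity-check against a finite difference of $\log p$:

```python
import numpy as np

def mixture_logpdf(x, mus=(-4.0, 4.0), sigma=1.0):
    """log p(x) for an equal-weight mixture of two 1D Gaussians."""
    logs = np.stack([-0.5 * ((x - m) / sigma) ** 2
                     - 0.5 * np.log(2 * np.pi * sigma ** 2) for m in mus])
    return np.logaddexp(logs[0], logs[1]) + np.log(0.5)

def mixture_score(x, mus=(-4.0, 4.0), sigma=1.0):
    """Analytic score d/dx log p(x): posterior-weighted component scores."""
    logs = np.stack([-0.5 * ((x - m) / sigma) ** 2 for m in mus])
    resp = np.exp(logs - np.logaddexp(logs[0], logs[1]))  # component posteriors
    comp_scores = np.stack([(m - x) / sigma ** 2 for m in mus])
    return (resp * comp_scores).sum(axis=0)
```

Between the two modes the responsibilities switch sharply, which is the same multimodal structure that makes the Poincaré constant of this distribution poor.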
-
|
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
(
Poster
)
>
link
We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal{S}$ over labels $\mathcal{A}$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|\mathcal{S}||\mathcal{A}|}/{n}}$. The second level additionally has the teacher probabilities of sampled labels available, which turns out to boost the convergence rate lower bound to ${{|\mathcal{S}||\mathcal{A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on $\mathcal{A}$ given every sampled input, thereby provably enabling the student to enjoy a rate ${|\mathcal{S}|}/{n}$ free of $|\mathcal{A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.
|
Qingyue Zhao · Banghua Zhu 🔗 |
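A minimal simulation (ours, with a made-up random teacher) of the first acquisition level above: the maximum likelihood estimator from hard labels is just the empirical conditional frequency table, and its error shrinks as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, n = 5, 4, 20000                        # |S| inputs, |A| labels, n samples
P = rng.dirichlet(np.ones(A), size=S)        # hypothetical teacher p(a|s)

s = rng.integers(0, S, size=n)                        # sampled inputs
a = np.array([rng.choice(A, p=P[si]) for si in s])    # teacher's hard labels

# MLE from hard labels = empirical conditional frequencies
counts = np.zeros((S, A))
np.add.at(counts, (s, a), 1.0)
P_hat = counts / counts.sum(axis=1, keepdims=True)

avg_tv = np.abs(P_hat - P).sum() / (2 * S)   # average total-variation error
```

The paper's finer distinctions (rates in $|\mathcal{S}||\mathcal{A}|/n$ vs. $|\mathcal{S}|/n$, and the squared error logit loss) concern the richer acquisition levels not modeled here.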
-
|
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
(
Poster
)
>
link
We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics, including a conceptually new cause for progressive sharpening and the edge of stability. It also provides a new lens through which to theoretically study and improve modern stochastic optimization on neural nets. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong \emph{opposing signals}: consistent, large magnitude features which dominate the network output and occur in both groups with similar frequency. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high on one group and then the other, until the overall loss spikes. We complement these experiments with a theoretical analysis of a two-layer linear network on a simple model of opposing signals. Our finding enables new predictions of training behavior which we confirm experimentally. |
Elan Rosenfeld · Andrej Risteski 🔗 |
-
|
MoXCo: How I learned to stop exploring and love my local minima?
(
Poster
)
>
link
Deep Neural Networks (DNNs) are well-known for their generalization capabilities despite overparameterization. This is commonly attributed to the optimizer’s ability to find “good” solutions within high-dimensional loss landscapes. However, widely employed adaptive optimizers, such as Adam, may suffer from subpar generalization. This paper presents an innovative methodology, $\textit{MoXCo}$, to address these concerns by designing adaptive optimizers that not only expedite exploration with faster convergence speeds but also ensure the avoidance of over-exploitation in specific parameter regimes, ultimately leading to convergence to good solutions.
|
Esha Singh · Shoham Sabach · Yu-Xiang Wang 🔗 |
-
|
First-order ANIL provably learns representations despite overparametrisation
(
Poster
)
>
link
Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods learn shared representations during pretraining, there is limited theoretical evidence of such behavior. In this direction, this work shows that, in the limit of infinite tasks, first-order ANIL with a linear two-layer network successfully learns linear shared representations. This result holds even under overparametrisation; having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. |
Oguz Kaan Yuksel · Etienne Boursier · Nicolas Flammarion 🔗 |
-
|
A Data-Driven Measure of Relative Uncertainty for Misclassification Detection
(
Poster
)
>
link
Misclassification detection is an important problem in machine learning, as it allows for the identification of instances where the model's predictions are unreliable. However, conventional uncertainty measures such as Shannon entropy do not provide an effective way to infer the real uncertainty associated with the model's predictions. In this paper, we introduce a novel data-driven measure of relative uncertainty to an observer for misclassification detection. By learning patterns in the distribution of soft-predictions, our uncertainty measure can identify misclassified samples based on the predicted class probabilities. Interestingly, according to the proposed measure, soft-predictions that correspond to misclassified instances can carry a large amount of uncertainty, even though they may have low Shannon entropy. We demonstrate empirical improvements over multiple image classification tasks, outperforming state-of-the-art misclassification detection methods. |
Eduardo Dadalto Câmara Gomes · Marco Romanelli · Georg Pichler · Pablo Piantanida 🔗 |
-
|
Non-Vacuous Generalization Bounds for Large Language Models
(
Poster
)
>
link
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models. |
Sanae Lotfi · Marc Finzi · Yilun Kuang · Tim G. J. Rudner · Micah Goldblum · Andrew Wilson 🔗 |
-
|
Learning from setbacks: the impact of adversarial initialization on generalization performance
(
Poster
)
>
link
The loss landscape of state-of-the-art neural networks is far from simple. Understanding how optimization algorithms initialized differently navigate such high-dimensional non-convex profiles is a key problem in machine learning. [Liu et al. 2020] use pre-training on random labels to produce adversarial initializations that lead stochastic gradient descent into global minima with poor generalization. This result contrasts with other literature arguing that pre-training on random labels produces positive effects (see, e.g., [Maennel et al. 2020]). We ask under which conditions this initialization results in solutions that generalize poorly. Our goal is to build a theoretical understanding of the properties of good solutions by isolating this phenomenon in some minimal models. To this end, we posit and study several hypotheses for why the phenomenon might arise in models of varying levels of simplicity, including representation quality and complex structure in data. |
Yatin Dandi · Stefani Karp · Francesca Mignacco · Kavya Ravichandran 🔗 |
-
|
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
(
Poster
)
>
link
We study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
|
Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan 🔗 |
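As a toy illustration (ours; it uses a linear residual branch and models none of the $\mu$P details), one can see numerically why the $1/\sqrt{\text{depth}}$ branch scale gives a well-behaved depth limit: forward activations stay $O(1)$ as depth grows, while the unscaled stack blows up:

```python
import numpy as np

def output_scale(depth, width=256, scaled=True, seed=0):
    """RMS entry size of the output of a random linear residual stack."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    for _ in range(depth):
        W = rng.normal(size=(width, width)) / np.sqrt(width)  # O(1) branch gain
        branch = W @ x
        # with scaling, each squared-norm increment is ~1/depth, so the total
        # growth stays bounded as depth -> infinity; without it, norms double
        x = x + (branch / np.sqrt(depth) if scaled else branch)
    return np.linalg.norm(x) / np.sqrt(width)
```

With the scaling, the output RMS is roughly the same at depth 64 and depth 256; without it, the squared norm roughly doubles per layer.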
-
|
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
(
Poster
)
>
link
An important yet underexplored question in the PAC-Bayes literature is how much tightness we lose by restricting the posterior family to factorized Gaussian distributions when optimizing a PAC-Bayes bound. We investigate this issue by estimating data-independent PAC-Bayes bounds using the optimal posteriors, comparing them to bounds obtained using mean-field variational inference (MFVI). Concretely, we (1) sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, (2) estimate its KL divergence from the prior with thermodynamic integration, and (3) propose three methods to obtain high-probability bounds under different assumptions. Our experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6% in some cases. |
Szilvia Ujváry · Gergely Flamich · Vincent Fortuin · José Miguel Hernández-Lobato 🔗 |
-
|
Divergence at the Interpolation Threshold: Identifying, Interpreting \& Ablating the Sources of a Deep Learning Puzzle
(
Poster
)
>
link
Machine learning models misbehave, often in unexpected ways. One prominent misbehavior is when the test loss diverges at the interpolation threshold, perhaps best known from its distinctive appearance in double descent. While considerable theoretical effort has gone into understanding generalization of overparameterized models, less effort has been devoted to understanding why the test loss misbehaves at the interpolation threshold. Moreover, analytically solvable models in this area employ a range of assumptions and use complex techniques from random matrix theory, statistical mechanics, and kernel methods, making it difficult to assess when and why test error might diverge. In this work, we analytically study the simplest supervised model - ordinary linear regression - and show intuitively and rigorously when and why a divergence occurs at the interpolation threshold using basic linear algebra. We identify three interpretable factors that, when all present, cause the divergence. We demonstrate on real data that linear models' test losses diverge at the interpolation threshold and that the divergence disappears when we ablate any one of the three identified factors. |
Rylan Schaeffer · Zachary Robertson · Akhilan Boopathy · Mikail Khona · Ila Fiete · Andrey Gromov · Sanmi Koyejo 🔗 |
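The divergence in ordinary linear regression is easy to reproduce. A minimal simulation (ours, not the authors') of the minimum-norm least-squares fit shows the test error spiking at the interpolation threshold $n = d$ and dropping on either side:

```python
import numpy as np

rng = np.random.default_rng(0)
d, noise = 40, 0.5
w_true = rng.normal(size=d)

def avg_test_mse(n, trials=30):
    """Mean test MSE of the minimum-norm least-squares fit at sample size n."""
    errs = []
    for _ in range(trials):
        X = rng.normal(size=(n, d))
        y = X @ w_true + noise * rng.normal(size=n)
        beta = np.linalg.pinv(X) @ y      # least-squares / min-norm in both regimes
        X_test = rng.normal(size=(500, d))
        errs.append(np.mean((X_test @ (beta - w_true)) ** 2))
    return float(np.mean(errs))
```

At $n = d$ the design matrix is square and typically ill-conditioned, so fitting the noise requires huge coefficients; away from the threshold the small singular values responsible for the spike disappear.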
-
|
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
(
Poster
)
>
link
Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases including edge of stability (Cohen et al., 2021), balancing (Wang et al., 2022), and catapult (Lewkowycz et al., 2020). These phenomena cannot be well explained by classical optimization theory. Significant theoretical progress has been made to understand these implicit biases, but it remains unclear for which objective functions they occur. This paper provides an initial step in answering this question, showing that these implicit biases are different tips of the same iceberg. Specifically, they occur when the optimization objective function has certain regularity. This regularity, together with gradient descent using a large learning rate that favors flatter regions, results in these nontrivial dynamical behaviors. To demonstrate this claim, we develop new global convergence theory under large learning rates for two examples of nonconvex functions without global smoothness, departing from typical assumptions in traditional analyses. We also discuss the implications on training neural networks, where different losses and activations can affect regularity and lead to highly varied training dynamics. |
Yuqing Wang · Zhenghao Xu · Tuo Zhao · Molei Tao 🔗 |
-
|
Toward Student-oriented Teacher Network Training for Knowledge Distillation
(
Poster
)
>
link
How to conduct teacher training for knowledge distillation is still an open problem. It has been widely observed that a best-performing teacher does not necessarily yield the best-performing student, suggesting a fundamental discrepancy between the current teacher training practice and the ideal teacher training strategy. To fill this gap, we explore the feasibility of training a teacher that is oriented toward student performance with empirical risk minimization (ERM). Our analyses are inspired by the recent findings that the effectiveness of knowledge distillation hinges on the teacher’s capability to approximate the true label distribution of training inputs. We theoretically establish that ERM minimizer can approximate the true label distribution of training data as long as the feature extractor of the learner network is Lipschitz continuous and is robust to feature transformations. In light of our theory, we propose a teacher training method SoTeacher which incorporates Lipschitz regularization and consistency regularization into ERM. Experiments on benchmark datasets using various knowledge distillation algorithms and teacher-student pairs confirm that SoTeacher can improve student accuracy consistently. |
Chengyu Dong · Liyuan Liu · Jingbo Shang 🔗 |
-
|
Adaptive Sharpness-Aware Pruning for Robust Sparse Networks
(
Poster
)
>
link
Robustness and compactness are two essential attributes of deep learning models that are deployed in the real world. The goals of robustness and compactness may seem to be at odds, since robustness requires generalization across domains, while the process of compression exploits specificity in one domain. We introduce \textit{Adaptive Sharpness-Aware Pruning (AdaSAP)}, which unifies these goals through the lens of network sharpness. The AdaSAP method produces sparse networks that are robust to input variations which are \textit{unseen at training time}. We achieve this by strategically incorporating weight perturbations in order to optimize the loss landscape. This allows the model to be both primed for pruning and regularized for improved robustness. AdaSAP improves the robust accuracy of pruned models on classification and detection over recent methods by up to +6\% on OOD datasets, over a wide range of compression ratios, pruning criteria, and architectures. |
Anna Bair · Hongxu Yin · Maying Shen · Pavlo Molchanov · Jose M. Alvarez 🔗 |
-
|
Invariant Low-Dimensional Subspaces in Gradient Descent for Learning Deep Matrix Factorizations
(
Poster
)
>
link
An extensively studied phenomenon of the past few years in training deep networks is the implicit bias of gradient descent towards parsimonious solutions. In this work, we further investigate this phenomenon by narrowing our focus to deep matrix factorization, where we reveal surprising low-dimensional structures in the learning dynamics when the target matrix is low-rank. Specifically, we show that the evolution of gradient descent starting from arbitrary orthogonal initialization only affects a minimal portion of singular vector spaces across all weight matrices. In other words, the learning process happens only within a small invariant subspace of each weight matrix, despite the fact that all parameters are updated throughout training. From this, we provide rigorous justification for low-rank training in a specific, yet practical setting. In particular, we demonstrate that we can construct compressed factorizations that are equivalent to full-width, deep factorizations throughout training for solving low-rank matrix completion problems efficiently. |
Can Yaras · Peng Wang · Wei Hu · Zhihui Zhu · Laura Balzano · Qing Qu 🔗 |
-
|
How Structured Data Guides Feature Learning: A Case Study of the Parity Problem
(
Poster
)
>
link
Recent works have shown that neural networks optimized by gradient-based methods can adapt to sparse or low-dimensional target functions through feature learning; an often studied target is classification of the sparse parity function on the unit hypercube. However, such an isotropic data setting does not capture the anisotropy and low intrinsic dimensionality exhibited in realistic datasets. In this work, we address this shortcoming by studying how feature learning interacts with structured (anisotropic) input data: we consider the classification of sparse parity on a high-dimensional orthotope where the feature coordinates have varying magnitudes. Specifically, we analyze the learning complexity of the mean-field Langevin dynamics (MFLD), which describes the noisy gradient descent update on a two-layer neural network, and show that the statistical complexity (i.e. sample size) and computational complexity (i.e. network width) of MFLD can both be improved when prominent directions of the anisotropic input data align with the support of the target function. Moreover, we demonstrate the benefit of feature learning by establishing a kernel lower bound on the classification error, which applies to neural networks in the lazy regime. |
Atsushi Nitanda · Kazusato Oko · Taiji Suzuki · Denny Wu 🔗 |
-
|
The Next Symbol Prediction Problem: PAC-learning and its relation to Language Models
(
Poster
)
>
link
The next symbol prediction (NSP) problem has been widely used to empirically evaluate the performance of neural sequence models on formal language tasks. We formalize the setting so as to make it amenable to PAC-learning analysis. In the NSP setting, a learning algorithm receives valid sequences (positive examples) from the underlying language, along with rich labels indicating, for every prefix, whether the prefix is in the language and what symbols could appear subsequently that lead to an accepting string. In the conventional classification setting where learning occurs with only positive and negative examples, the problem of learning regular languages, or even subclasses represented by acyclic DFAs, is known to be computationally hard based on cryptographic assumptions. In contrast, our main result shows that regular languages are efficiently PAC-learnable in the next symbol prediction setting. Further, we provide a more efficient learning algorithm for the case where the target DFA is known to be acyclic. Given the rich labels required in the NSP setting, one may wonder whether this setting is applicable to non-artificial tasks. We explain how language models can act as a source of such labeled data, and consequently, our algorithm can be applied to fit a finite-state model (DFA) that learns the (truncated) support of the language model. |
Satwik Bhattamishra · Phil Blunsom · Varun Kanade 🔗 |
-
|
Why Do We Need Weight Decay for Overparameterized Deep Networks?
(
Poster
)
>
link
Weight decay is a broadly used technique for training state-of-the-art deep networks. Despite its widespread usage, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics enhancing the ever-present implicit regularization of SGD via loss stabilization. |
Maksym Andriushchenko · Francesco D'Angelo · Aditya Vardhan Varre · Nicolas Flammarion 🔗 |
-
|
The Double-Edged Sword: Perception and Uncertainty in Inverse Problems
(
Poster
)
>
link
Inverse problems pose significant challenges due to their inherent ambiguity in mapping observed data back to its original state. While recent advances have yielded impressive results in restoring degraded data, attaining high perceptual quality comes at the cost of increased hallucinations. This paper investigates this phenomenon to reveal a fundamental tradeoff between perception and uncertainty in solving inverse problems. Using error entropy as a measure of uncertainty, we demonstrate that higher perceptual quality in restoration algorithms is accompanied by a surge in uncertainty. Leveraging Rényi divergence as a perception metric, we derive bounds for this tradeoff, allowing for categorization of different inverse methods based on their performance. Additionally, we connect estimation distortion with uncertainty, offering novel insights into the traditional perception-distortion tradeoff. Our work provides a rigorous framework for analyzing uncertainty in the context of solving inverse problems, highlighting its interplay with perception and distortion, while underscoring the limitations of current approaches to achieving both high perceptual quality and low uncertainty simultaneously. |
Regev Cohen · Ehud Rivlin · Daniel Freedman 🔗 |
-
|
Near-Interpolators: Fast Norm Growth and Tempered Near-Overfitting
(
Poster
)
>
link
We study linear regression when the input data population covariance matrix has eigenvalues $\lambda_i \sim i^{-\alpha}$ where $\alpha > 1$. Under a generic random matrix theory assumption, we prove that any near-interpolator, i.e., ${\beta}$ whose training error is below the noise floor, must have its squared $\ell_2$-norm growing super-linearly with the number of samples $n$: $\|{\beta}\|_{2}^{2} = \Omega(n^{\alpha})$. This implies that existing norm-based generalization bounds increase as the number of samples increases, matching the empirical observations from prior work. On the other hand, such near-interpolators, when properly tuned, achieve good generalization, with test errors approaching arbitrarily close to the noise floor. Our work demonstrates that existing norm-based generalization bounds are vacuous for explaining the generalization capability of \emph{any} near-interpolator. Moreover, we show that the trade-off between train and test accuracy is better when the norm growth exponent is smaller.
|
Yutong Wang · Rishi Sonthalia · Wei Hu 🔗 |
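A small simulation (ours, not the authors') of the norm-growth claim: with population eigenvalues $\lambda_i = i^{-\alpha}$ and $\alpha = 2$, the squared norm of the minimum-norm interpolator of noisy targets grows super-linearly in $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, alpha, sigma = 2000, 2.0, 1.0
lam = np.arange(1, d + 1, dtype=float) ** (-alpha)   # power-law spectrum

def interp_sq_norm(n, trials=20):
    """Average ||beta||^2 of the min-norm interpolator of pure-noise targets."""
    out = []
    for _ in range(trials):
        X = rng.normal(size=(n, d)) * np.sqrt(lam)   # covariance = diag(lam)
        y = sigma * rng.normal(size=n)               # labels at the noise floor
        beta = np.linalg.pinv(X) @ y                 # min-norm interpolator
        out.append(float(beta @ beta))
    return float(np.mean(out))
```

Doubling $n$ should multiply $\|\beta\|_2^2$ by roughly $2^{\alpha} = 4$ here, i.e. clearly faster than the linear factor of 2.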
-
|
On robust overfitting: adversarial training induced distribution matters
(
Poster
)
>
link
Robust overfitting has been observed to arise in adversarial training. We hypothesize that this phenomenon may be related to the evolution of the data distribution along the training trajectory. To investigate this, we select a set of checkpoints in adversarial training and perform standard training on the distributions induced by adversarial perturbation w.r.t. the checkpoints. We observe that the obtained models generalize increasingly poorly as robust overfitting occurs, thereby validating the hypothesis. We show the hardness of generalization on the induced distributions is related to a certain local property of the perturbation operator at each checkpoint. The connection between this local property and the generalization on the induced distribution is proved by establishing an upper bound on the generalization error. Other interesting phenomena related to the adversarial training trajectory are also observed. |
Runzhi Tian · Yongyi Mao 🔗 |
-
|
Are Graph Neural Networks Optimal Approximation Algorithms?
(
Poster
)
>
link
In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max Cut and Minimum Vertex Cover. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing dual certificates of optimality (bounds on the optimal solution) from the learned embeddings of OptGNN. |
Morris Yau · Eric Lu · Nikolaos Karalias · Jessica Xu · Stefanie Jegelka 🔗 |
-
|
JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
(
Poster
)
>
link
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformer architectures. This is achieved by integrating out the self-attention layer in Transformers, producing a modified dynamics of MLP layers only. JoMA removes unrealistic assumptions in previous analyses (e.g., lack of residual connection), and predicts that the attention first becomes sparse (to learn salient tokens), then dense (to learn less salient tokens) in the presence of nonlinear activations, while in the linear case it is consistent with existing works. We leverage JoMA to qualitatively explain how tokens are combined to form hierarchies in multilayer Transformers, when the input tokens are generated by a latent hierarchical generative model. Experiments on models trained on real-world datasets (Wikitext2/Wikitext103) and various pre-trained models (OPT, Pythia) verify our theoretical findings. |
Yuandong Tian · Yiping Wang · Zhenyu Zhang · Beidi Chen · Simon Du 🔗 |