Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization
Julia Gusak · Jean Kossaifi · Alena Shilova · Rocco Sedona · Cristiana Bentes · Animashree Anandkumar · Olivier Beaumont
Room 243 - 245
Sat 16 Dec, 6:15 a.m. PST
Unlock the potential of neural network training for good and for science! Enhance computational efficiency, scalability, and resource optimization. Join HPC and AI experts to tackle challenges in both theory and applications.
Timezone: America/Los_Angeles
Schedule
Sat 6:15 a.m. - 6:50 a.m. | Poster Placement
Sat 6:50 a.m. - 7:00 a.m. | Opening Remarks (Talk) | Julia Gusak
Sat 7:00 a.m. - 7:30 a.m. | A Data-Centric View on Workflows that Couple HPC with Large-Scale Models (Invited Talk) | Ana Gainaru
Sat 7:30 a.m. - 8:00 a.m. | Rematerialization Algorithms for Memory-efficient Learning (Invited Talk) | Lionel Eyraud-Dubois
Sat 8:00 a.m. - 8:30 a.m. | Coffee Break
Sat 8:30 a.m. - 9:00 a.m. | Navigating the Landscape of Enormous AI Model Training (Invited Talk) | Yang You
Sat 9:00 a.m. - 9:30 a.m. | Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models (Invited Talk) | Olatunji Ruwase
Sat 9:30 a.m. - 10:00 a.m. | Contributed Talks (Talk)
Sat 9:31 a.m. - 9:36 a.m. | Training and inference of large language models using 8-bit floating point (Contributed Talk & Poster) | Sergio Perez · Yan Zhang · James Briggs · Charles Blake · Josh Levy-Kramer · Paul Balanca · Carlo Luschi · Stephen Barlow · Andrew Fitzgibbon
Sat 9:37 a.m. - 9:42 a.m. | MatFormer: Nested Transformer for Elastic Inference (Contributed Talk & Poster) | Fnu Devvrit · Sneha Kudugunta · Aditya Kusupati · Tim Dettmers · Kaifeng Chen · Inderjit Dhillon · Yulia Tsvetkov · Hannaneh Hajishirzi · Sham Kakade · Ali Farhadi · Prateek Jain
Sat 9:43 a.m. - 9:48 a.m. | Sparse Backpropagation for MoE Training (Contributed Talk & Poster) | Liyuan Liu · Jianfeng Gao · Weizhu Chen
Sat 9:49 a.m. - 9:54 a.m. | Efficient Parallelization Layouts for Large-Scale Distributed Model Training (Contributed Talk & Poster) | Johannes Hagemann · Samuel Weinbach · Konstantin Dobler · Maximilian Schall · Gerard de Melo
Sat 9:55 a.m. - 10:00 a.m. | CoTFormer: More Tokens With Attention Make Up For Less Depth (Contributed Talk & Poster) | Amirkeivan Mohtashami · Matteo Pagliardini · Martin Jaggi
Sat 10:00 a.m. - 11:30 a.m. | Lunch
Sat 11:30 a.m. - 12:00 p.m. | Poster Session
Sat 12:00 p.m. - 12:30 p.m. | Crafting Computational Efficiency for Large Models: Training Recipes, Scaling Strategies and Sparsity Sorcery with Specialized Hardware (Invited Talk) | Natalia Vassilieva
Sat 12:30 p.m. - 1:00 p.m. | Invited Talk by Databricks (Invited Talk)
Sat 1:00 p.m. - 1:30 p.m. | Coffee Break
Sat 1:30 p.m. - 2:00 p.m. | Efficient LLM Training and Inference on GPUs (Invited Talk) | Mohammad Shoeybi · Bryan Catanzaro
Sat 2:00 p.m. - 2:50 p.m. | Panel Discussion (Panel) | Yang You · Olatunji Ruwase · Natalia Vassilieva · Mohammad Shoeybi · Ana Gainaru · Lionel Eyraud-Dubois · Jean Kossaifi
Sat 2:50 p.m. - 3:00 p.m. | Closing Remarks (Talk) | Jean Kossaifi
Sat 3:00 p.m. - 3:30 p.m. | Poster Session
- | AI4HPC: Library to Train AI Models on HPC Systems using CFD Datasets (Poster) | Eray Inanc · Rakesh Sarma · Marcel Aach · Rocco Sedona · Andreas Lintermann
- | Efficient and Approximate Per-Example Gradient Norms for Gradient Noise Scale (Poster) | Gavia Gray · Anshul Samar · Joel Hestness
- | Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators (Poster) | Yaniv Blumenfeld · Itay Hubara · Daniel Soudry
- | ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation (Poster) | Divyang Doshi · Jung-Eun Kim
- | Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search (Poster) | Lei Chen
- | Remaining-Useful-Life Prediction and Uncertainty Quantification using LSTM Ensembles for Aircraft Engines (Poster) | Oishi Deb · Emmanouil Benetos · Philip Torr
- | LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (Poster) | Dacheng Li · Rulin Shao · Anze Xie · Eric Xing · Joseph Gonzalez · Ion Stoica · Xuezhe Ma · Hao Zhang
- | FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments (Poster) | Mert Unsal · Ali Maatouk · Antonio De Domenico · Nicola Piovesan · Fadhel Ayed
- | DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers (Poster) | Sarin Chandy · Varun Prashant Gangal · Yi Yang · Gabriel Maggiotti
- | Improving Deep Ensembles without Communication (Poster) | Konstantinos Pitas · Michael Arbel · Julyan Arbel
- | ConcatPlexer: Additional Dim1 Batching for Faster ViTs (Contributed Talk & Poster) | Donghoon Han · Seunghyeon Seo · Donghyeon Jeon · Jiho Jang · Chaerin Kong · Nojun Kwak
- | InstaTune: Instantaneous Neural Architecture Search During Fine-Tuning (Poster) | Sharath Nittur Sridhar · Souvik Kundu · Sairam Sundaresan · Maciej Szankin · Anthony Sarah
- | ReLoRA: High-Rank Training Through Low-Rank Updates (Poster) | Vladislav Lialin · Sherin Muckatira · Namrata Shivagunde · Anna Rumshisky
- | Sparse Iso-FLOP Transformations for Maximizing Training Efficiency (Poster) | Vithursan Thangarasa · Shreyas Saxena · Abhay Gupta · Sean Lie
- | Embarrassingly Simple Dataset Distillation (Poster) | Yunzhen Feng · Shanmukha Ramakrishna Vedantam · Julia Kempe
- | Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs (Poster) | Suyu Ge · Yunan Zhang · Liyuan Liu · Minjia Zhang · Jiawei Han · Jianfeng Gao
- | A Quadratic Synchronization Rule for Distributed Deep Learning (Poster) | Xinran Gu · Kaifeng Lyu · Sanjeev Arora · Jingzhao Zhang · Longbo Huang
- | DAREL: Data Reduction with Losses for Training Acceleration of Real and Hypercomplex Neural Networks (Poster) | Alexander Demidovskij · Aleksei Trutnev · Artyom Tugaryov · Igor Salnikov · Stanislav Pavlov
- | Accelerating Deep Learning using Ivy (Poster) | Guillermo Sanchez-Brizuela · Ved Patwardhan · Matthew Barrett · Paul Anderson · Mustafa Hani · Daniel Lenton
- | Something for (almost) nothing: improving deep ensemble calibration using unlabeled data (Poster) | Konstantinos Pitas · Julyan Arbel
- | LeanFlex-GKP: Advancing Hassle-Free Structured Pruning with Simple Flexible Group Count (Poster) | Jiamu Zhang · Shaochen (Henry) Zhong · Andrew Ye · Zirui Liu · Kaixiong Zhou · Xia Hu · Shuai Xu · Vipin Chaudhary
- | Patch Gradient Descent: Training Neural Networks on Very Large Images (Poster) | Deepak Gupta · Gowreesh Mago · Arnav Chavan · Dilip K. Prasad · Rajat Thomas
- | Batched Low-Rank Adaptation of Foundation Models (Poster) | Yeming Wen · Swarat Chaudhuri
- | Local LoRA: Memory-Efficient Fine-Tuning of Large Language Models (Poster) | Oscar Key · Jean Kaddour · Pasquale Minervini
- | Early Weight Averaging meets High Learning Rates for LLM Pre-training (Poster) | Sunny Sanyal · Atula Neerkaje · Jean Kaddour · Abhishek Kumar · Sujay Sanghavi
- | Bandit-Driven Batch Selection for Robust Learning under Label Noise (Poster) | Michal Lisicki · Graham Taylor · Mihai Nica
- | Maestro: Uncovering Low-Rank Structures via Trainable Decomposition (Poster) | Samuel Horváth · Stefanos Laskaridis · Shashank Rajput · Hongyi Wang
- | Tiny Graph Convolutional Networks with Topologically Consistent Magnitude Pruning (Poster) | Hichem Sahbi
- | DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency (Poster) | Azhar Shaikh · Michael Cochez · Denis Diachkov · Michiel de Rijcke · Sahar Yousefi
- | Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning (Poster) | Mengzhou Xia · Tianyu Gao · Zhiyuan Zeng · Danqi Chen
- | A foundation for exact binarized morphological neural networks (Poster) | Theodore Aouad · Hugues Talbot
- | Training Bayesian Neural Networks with Sparse Subspace Variational Inference (Poster) | Junbo Li · Zichen Miao · Qiang Qiu · Ruqi Zhang
- | Task Arithmetic with LoRA for Continual Learning (Poster) | Rajas Chitale · Ankit Vaidya · Aditya Kane · Archana Ghotkar
- | Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning (Poster) | Colin Bellinger · Mark Crowley · Isaac Tamblyn
- | Cooperative Learning for Cost-Adaptive Inference (Poster) | Xingli Fang · Richard Bradford · Jung-Eun Kim
- | Generalisable Agents for Neural Network Optimisation (Poster) | Kale-ab Tessera · Callum R. Tilbury · Sasha Abramowitz · Ruan John de Kock · Omayma Mahjoub · Benjamin Rosman · Sara Hooker · Arnu Pretorius