Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradient and AdamW

Di Zhang ⋅ Yihang Zhang ⋅ Suvrajeet Sen

Abstract
