

Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradient and AdamW

Di Zhang · Yihang Zhang · Suvrajeet Sen

Abstract
