Poster
On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression
Denny Wu · Ji Xu
We consider the linear model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}_{\star}+\boldsymbol{\varepsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\boldsymbol{\beta}_{\star}$ via generalized (weighted) ridge regression: $\hat{\boldsymbol{\beta}}_{\lambda}=\left(\mathbf{X}^{\top}\mathbf{X}+\lambda\boldsymbol{\Sigma}_w\right)^{\dagger}\mathbf{X}^{\top}\mathbf{y}$, where $\boldsymbol{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\boldsymbol{\Sigma}_x$ and an anisotropic prior on the true coefficients, $\mathbb{E}\,\boldsymbol{\beta}_{\star}\boldsymbol{\beta}_{\star}^{\top}=\boldsymbol{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^{\top}\hat{\boldsymbol{\beta}}_{\lambda})^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma \in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal ridge parameter $\lambda_{\mathrm{opt}}$; these conditions suggest an implicit $\ell_2$ regularization effect of overparameterization and theoretically justify the surprising empirical observation that $\lambda_{\mathrm{opt}}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when $\mathbf{X}$ and $\boldsymbol{\beta}_{\star}$ are non-isotropic. Finally, we determine the optimal $\boldsymbol{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and the optimally regularized ($\lambda = \lambda_{\mathrm{opt}}$) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
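The estimator and the prediction risk defined above can be made concrete with a small finite-sample simulation. This is a hypothetical sketch, not the authors' code: the helper names `weighted_ridge`, `prediction_risk` and `simulate` are ours, and the simulation assumes a Gaussian design with $\boldsymbol{\Sigma}_x = \mathbf{I}$, prior $\boldsymbol{\Sigma}_\beta = \mathbf{I}/p$, weighting $\boldsymbol{\Sigma}_w = \mathbf{I}$ (plain ridge), and noise variance $\sigma^2 = 0.25$. For a fixed estimate it uses the identity $\mathbb{E}(y-\mathbf{x}^{\top}\hat{\boldsymbol{\beta}})^2 = (\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}_\star)^{\top}\boldsymbol{\Sigma}_x(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}_\star)+\sigma^2$, valid for a fresh zero-mean $\mathbf{x}$ with covariance $\boldsymbol{\Sigma}_x$ and independent noise. It is a Monte Carlo check, not the paper's asymptotic risk formula.

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w, rcond=1e-10):
    """Generalized (weighted) ridge estimate (X^T X + lam * Sigma_w)^+ X^T y.

    np.linalg.pinv plays the role of the dagger, so lam may be zero or negative.
    rcond is an assumed relative cutoff, looser than NumPy's 1e-15 default, so
    that the numerically-zero eigenvalues of X^T X at lam = 0 are discarded.
    """
    A = X.T @ X + lam * Sigma_w
    return np.linalg.pinv(A, rcond=rcond) @ (X.T @ y)


def prediction_risk(beta_hat, beta_star, Sigma_x, sigma2):
    """E(y - x^T beta_hat)^2 for a fresh x with covariance Sigma_x and
    independent zero-mean noise of variance sigma2, beta_hat held fixed."""
    d = beta_hat - beta_star
    return float(d @ Sigma_x @ d) + sigma2


def simulate(n=100, gamma=2.0, sigma2=0.25, lams=(-2.0, 0.0, 2.0, 10.0),
             trials=20, seed=0):
    """Average test risk over random draws of (X, beta_star, eps) for each lam.

    Illustrative assumptions: Sigma_x = I, Sigma_beta = I / p, Sigma_w = I,
    Gaussian design and noise.
    """
    rng = np.random.default_rng(seed)
    p = int(gamma * n)                      # overparameterized: p / n = gamma > 1
    Sigma_x = np.eye(p)
    Sigma_w = np.eye(p)
    risks = np.zeros(len(lams))
    for _ in range(trials):
        X = rng.standard_normal((n, p))                   # rows x_i ~ N(0, I_p)
        beta_star = rng.standard_normal(p) / np.sqrt(p)   # E beta beta^T = I / p
        y = X @ beta_star + np.sqrt(sigma2) * rng.standard_normal(n)
        for k, lam in enumerate(lams):
            b = weighted_ridge(X, y, lam, Sigma_w)
            risks[k] += prediction_risk(b, beta_star, Sigma_x, sigma2)
    return dict(zip(lams, risks / trials))
```

For example, `simulate(gamma=2.0)` returns a dictionary mapping each $\lambda$ in the grid to its average test risk. The grid includes a negative value; the pseudoinverse keeps the formula defined, although with $\boldsymbol{\Sigma}_w=\mathbf{I}$ the risk blows up as $\lambda$ approaches minus a nonzero eigenvalue of $\mathbf{X}^{\top}\mathbf{X}$. Note also that plugging in $\lambda = 0$ exactly gives the minimum $\ell_2$-norm interpolator for any $\boldsymbol{\Sigma}_w$, whereas the limit $\lambda \to 0$ with a positive-definite $\boldsymbol{\Sigma}_w$ gives the interpolator minimizing $\boldsymbol{\beta}^{\top}\boldsymbol{\Sigma}_w\boldsymbol{\beta}$.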
Author Information
Denny Wu (University of Toronto & Vector Institute)
Ji Xu (Columbia University)
More from the Same Authors
2022 Poster: High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba · Murat Erdogdu · Taiji Suzuki · Zhichao Wang · Denny Wu · Greg Yang
2022 Poster: Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime
Naoki Nishikawa · Taiji Suzuki · Atsushi Nitanda · Denny Wu
2021 Poster: Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis
Atsushi Nitanda · Denny Wu · Taiji Suzuki
2020: Poster Session 3 (gather.town)
Denny Wu · Chengrun Yang · Tolga Ergen · sanae lotfi · Charles Guille-Escuret · Boris Ginsburg · Hanbake Lyu · Cong Xie · David Newton · Debraj Basu · Yewen Wang · James Lucas · MAOJIA LI · Lijun Ding · Jose Javier Gonzalez Ortiz · Reyhane Askari Hemmat · Zhiqi Bu · Neal Lawton · Kiran Thekumparampil · Jiaming Liang · Lindon Roberts · Jingyi Zhu · Dongruo Zhou
2020: Contributed talks in Session 3 (Zoom)
Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang
2020: Contributed Video: When Does Preconditioning Help or Hurt Generalization?, Denny Wu
Denny Wu
2019 Poster: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu
2019 Spotlight: Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond
Xuechen (Chen) Li · Denny Wu · Lester Mackey · Murat Erdogdu
2018: Poster Session I
Aniruddh Raghu · Daniel Jarrett · Kathleen Lewis · Elias Chaibub Neto · Nicholas Mastronarde · Shazia Akbar · Chun-Hung Chao · Henghui Zhu · Seth Stafford · Luna Zhang · Jen-Tang Lu · Changhee Lee · Adityanarayanan Radhakrishnan · Fabian Falck · Liyue Shen · Daniel Neil · Yusuf Roohani · Aparna Balagopalan · Brett Marinelli · Hagai Rossman · Sven Giesselbach · Jose Javier Gonzalez Ortiz · Edward De Brouwer · Byung-Hoon Kim · Rafid Mahmood · Tzu Ming Hsu · Antonio Ribeiro · Rumi Chunara · Agni Orfanoudaki · Kristen Severson · Mingjie Mai · Sonali Parbhoo · Albert Haque · Viraj Prabhu · Di Jin · Alena Harley · Geoffroy Dubourg-Felonneau · Xiaodan Hu · Maithra Raghu · Jonathan Warrell · Nelson Johansen · Wenyuan Li · Marko Järvenpää · Satya Narayan Shukla · Sarah Tan · Vincent Fortuin · Beau Norgeot · Yi-Te Hsu · Joel H Saltz · Veronica Tozzo · Andrew Miller · Guillaume Ausset · Azin Asgarian · Francesco Paolo Casale · Antoine Neuraz · Bhanu Pratap Singh Rawat · Turgay Ayer · Xinyu Li · Mehul Motani · Nathaniel Braman · Laetitia M Shao · Adrian Dalca · Hyunkwang Lee · Emma Pierson · Sandesh Ghimire · Yuji Kawai · Owen Lahav · Anna Goldenberg · Denny Wu · Pavitra Krishnaswamy · Colin Pawlowski · Arijit Ukil · Yuhui Zhang
2018 Poster: Benefits of over-parameterization with EM
Ji Xu · Daniel Hsu · Arian Maleki
2016 Poster: Global Analysis of Expectation Maximization for Mixtures of Two Gaussians
Ji Xu · Daniel Hsu · Arian Maleki
2016 Oral: Global Analysis of Expectation Maximization for Mixtures of Two Gaussians
Ji Xu · Daniel Hsu · Arian Maleki