NeurIPS Poster On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Denny Wu · Ji Xu

Poster Session 6 #1726

Abstract: We consider the linear model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}_{\star}+\boldsymbol{\varepsilon}$ with $\mathbf{X}\in\mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\boldsymbol{\beta}_{\star}$ via generalized (weighted) ridge regression: $\hat{\boldsymbol{\beta}}_{\lambda}=\left(\mathbf{X}^{\top}\mathbf{X}+\lambda\boldsymbol{\Sigma}_w\right)^{\dagger}\mathbf{X}^{\top}\mathbf{y}$, where $\boldsymbol{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\boldsymbol{\Sigma}_x$ and an anisotropic prior on the true coefficients, $\mathbb{E}\boldsymbol{\beta}_{\star}\boldsymbol{\beta}_{\star}^{\top}=\boldsymbol{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^{\top}\hat{\boldsymbol{\beta}}_{\lambda})^2$ in the proportional asymptotic limit $p/n\rightarrow\gamma\in(1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\mathrm{opt}}$ for the ridge parameter $\lambda$, which suggests an implicit $\ell_2$ regularization effect of overparameterization and theoretically justifies the surprising empirical observation that $\lambda_{\mathrm{opt}}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when $\mathbf{X}$ and $\boldsymbol{\beta}_{\star}$ are non-isotropic. Finally, we determine the optimal $\boldsymbol{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda=\lambda_{\mathrm{opt}}$) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
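
The generalized ridge estimator above can be computed directly. The NumPy sketch below is not the authors' code: the function names `weighted_ridge` and `weighted_ridgeless`, the problem sizes and the diagonal $\boldsymbol{\Sigma}_w$ are hypothetical choices for illustration. One subtlety worth noting: plugging $\lambda=0$ into the pseudoinverse formula returns the minimum $\ell_2$-norm interpolator whatever $\boldsymbol{\Sigma}_w$ is, whereas the ridgeless limit $\lambda\to 0^{+}$ is the interpolator that minimizes $\boldsymbol{\beta}^{\top}\boldsymbol{\Sigma}_w\boldsymbol{\beta}$. The second function computes that limit, assuming $\mathbf{X}$ has full row rank and $\boldsymbol{\Sigma}_w$ is positive definite.

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized ridge estimate (X^T X + lam * Sigma_w)^dagger X^T y.

    np.linalg.pinv stands in for the Moore-Penrose inverse (the dagger).
    """
    A = X.T @ X + lam * Sigma_w
    return np.linalg.pinv(A) @ (X.T @ y)


def weighted_ridgeless(X, y, Sigma_w):
    """Limit of weighted_ridge as lam -> 0+ (assumes rank(X) = n < p, Sigma_w > 0).

    Equals Sigma_w^{-1} X^T (X Sigma_w^{-1} X^T)^{-1} y: among all b with
    X b = y, the one with the smallest b^T Sigma_w b.
    """
    W_inv_Xt = np.linalg.solve(Sigma_w, X.T)   # Sigma_w^{-1} X^T, shape (p, n)
    K = X @ W_inv_Xt                           # X Sigma_w^{-1} X^T, shape (n, n)
    return W_inv_Xt @ np.linalg.solve(K, y)


# Hypothetical sizes and weighting in the overparameterized regime p > n.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.5 * rng.standard_normal(n)
Sigma_w = np.diag(np.linspace(0.5, 2.0, p))

beta_ridge = weighted_ridge(X, y, lam=1.0, Sigma_w=Sigma_w)
beta_ridgeless = weighted_ridgeless(X, y, Sigma_w)
# The ridgeless fit interpolates the training data up to floating-point error.
print(np.max(np.abs(X @ beta_ridgeless - y)))
```

Using `np.linalg.pinv` keeps the sketch close to the $\dagger$ in the formula; for large $p$ it is usually cheaper to solve the equivalent $n\times n$ system $\boldsymbol{\Sigma}_w^{-1}\mathbf{X}^{\top}(\mathbf{X}\boldsymbol{\Sigma}_w^{-1}\mathbf{X}^{\top}+\lambda\mathbf{I})^{-1}\mathbf{y}$ instead.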

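To probe the sign of $\lambda_{\mathrm{opt}}$ numerically, the sketch below estimates the prediction risk by Monte Carlo at finite $n$ and $p$ and scans a grid of $\lambda$ values that includes negative ones. It uses the standard identity $\mathbb{E}_{\mathbf{x},\varepsilon}(y-\mathbf{x}^{\top}\hat{\boldsymbol{\beta}})^2=(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}_{\star})^{\top}\boldsymbol{\Sigma}_x(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}_{\star})+\sigma^2$ for a fresh test point with zero-mean features and independent zero-mean noise of variance $\sigma^2$. This is an empirical check under assumptions, not the paper's exact asymptotic characterization: Gaussian data, the diagonal spectra of $\boldsymbol{\Sigma}_x$ and $\boldsymbol{\Sigma}_\beta$, the noise level and the names `weighted_ridge_kernel`, `prediction_risk` and `mc_risk` are all hypothetical. Whether the empirical minimizer lands below zero depends on these choices.

```python
import numpy as np


def weighted_ridge_kernel(X, y, lam, Sigma_w):
    """Weighted ridge in its n x n form: Sigma_w^{-1} X^T (X Sigma_w^{-1} X^T + lam I)^{-1} y.

    Matches (X^T X + lam Sigma_w)^dagger X^T y whenever X^T X + lam Sigma_w is
    invertible, is continuous at lam = 0 (the ridgeless limit), and stays
    defined for negative lam above -lambda_min(X Sigma_w^{-1} X^T).
    """
    n = X.shape[0]
    W_inv_Xt = np.linalg.solve(Sigma_w, X.T)
    K = X @ W_inv_Xt
    return W_inv_Xt @ np.linalg.solve(K + lam * np.eye(n), y)


def prediction_risk(beta_hat, beta_star, Sigma_x, sigma2):
    """E (y - x^T beta_hat)^2 for a fresh zero-mean x with covariance Sigma_x."""
    d = beta_hat - beta_star
    return float(d @ Sigma_x @ d + sigma2)


def mc_risk(lam, Sigma_x, Sigma_beta, Sigma_w, n, sigma2, trials=20, seed=0):
    """Monte Carlo average of prediction_risk over draws of X, beta_star and noise.

    The same seed is reused for every lam (common random numbers), so
    differences across the grid are not swamped by sampling noise.
    """
    rng = np.random.default_rng(seed)
    p = Sigma_x.shape[0]
    Lx = np.linalg.cholesky(Sigma_x)
    Lb = np.linalg.cholesky(Sigma_beta)
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, p)) @ Lx.T       # rows ~ N(0, Sigma_x)
        beta_star = Lb @ rng.standard_normal(p)      # E beta beta^T = Sigma_beta
        y = X @ beta_star + np.sqrt(sigma2) * rng.standard_normal(n)
        beta_hat = weighted_ridge_kernel(X, y, lam, Sigma_w)
        risks.append(prediction_risk(beta_hat, beta_star, Sigma_x, sigma2))
    return float(np.mean(risks))


# Hypothetical anisotropic setting with gamma = p / n = 2 and Sigma_w = I (standard ridge).
n, p = 100, 200
Sigma_x = np.diag(np.linspace(0.2, 2.0, p))
Sigma_beta = np.diag(np.linspace(0.2, 2.0, p)) / p
Sigma_w = np.eye(p)
sigma2 = 0.1

lams = np.linspace(-1.0, 2.0, 13)
risks = [mc_risk(lam, Sigma_x, Sigma_beta, Sigma_w, n, sigma2) for lam in lams]
lam_best = lams[int(np.argmin(risks))]
print(f"empirical argmin over the grid: lam = {lam_best:.2f}")
```

The lower end of the grid must stay above $-\lambda_{\min}(\mathbf{X}\boldsymbol{\Sigma}_w^{-1}\mathbf{X}^{\top})$ on every draw; with $\boldsymbol{\Sigma}_x\succeq 0.2\,\mathbf{I}$ and these sizes that should hold, but it is worth rechecking if the spectra or $n, p$ are changed.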
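
For PCR, a minimal sketch under stated assumptions: the estimator regresses $\mathbf{y}$ on the projections of $\mathbf{X}$ onto the top-$k$ eigenvectors $\mathbf{V}_k$ of the population covariance $\boldsymbol{\Sigma}_x$, i.e. $\hat{\boldsymbol{\beta}}=\mathbf{V}_k(\mathbf{X}\mathbf{V}_k)^{\dagger}\mathbf{y}$. The abstract does not say whether population or sample principal components are used, so this choice, the diagonal spectra and the names `pcr` and `pcr_risk` are hypothetical. Sweeping $k$ past $n$ traces a risk curve in which $k=n$ is the interpolation threshold, where a double-descent peak typically appears.

```python
import numpy as np


def pcr(X, y, V_k):
    """PCR estimate V_k (X V_k)^dagger y for an orthonormal basis V_k (p x k).

    For k <= n this is least squares on the k projected features; for k > n
    the pseudoinverse returns the minimum-norm interpolating fit in span(V_k).
    """
    return V_k @ (np.linalg.pinv(X @ V_k) @ y)


def pcr_risk(k, n, sx, sb, sigma2, trials=20, seed=0):
    """Monte Carlo prediction risk of PCR with the top-k population components.

    sx and sb are the diagonals of Sigma_x and Sigma_beta (hypothetical
    diagonal setting); sx is sorted in decreasing order, so the top-k
    eigenvectors of Sigma_x are the first k standard basis vectors.
    """
    rng = np.random.default_rng(seed)
    p = sx.size
    V_k = np.eye(p)[:, :k]
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, p)) * np.sqrt(sx)        # rows ~ N(0, diag(sx))
        beta_star = rng.standard_normal(p) * np.sqrt(sb)     # E beta beta^T = diag(sb)
        y = X @ beta_star + np.sqrt(sigma2) * rng.standard_normal(n)
        d = pcr(X, y, V_k) - beta_star
        risks.append(float(d @ (sx * d)) + sigma2)           # d^T Sigma_x d + sigma^2
    return float(np.mean(risks))


n, p, sigma2 = 100, 200, 0.1
sx = np.linspace(2.0, 0.2, p)
sb = np.linspace(2.0, 0.2, p) / p
ks = [10, 50, 90, 100, 110, 150, 200]
curve = [pcr_risk(k, n, sx, sb, sigma2) for k in ks]
for k, r in zip(ks, curve):
    print(f"k = {k:3d}  risk = {r:.3f}")
```

Because $\boldsymbol{\Sigma}_x$ is diagonal here, $\mathbf{V}_k$ is just the first $k$ columns of the identity; a non-diagonal $\boldsymbol{\Sigma}_x$ would need `np.linalg.eigh` (which returns eigenvalues in ascending order) followed by a reordering by decreasing eigenvalue.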
