

Poster

Online Robust Regression via SGD on the l1 loss

Scott Pesme · Nicolas Flammarion

Poster Session 3 #821

Abstract: We consider the robust linear regression problem in the online setting where we have access to the data in a streaming manner, one data point after the other. More specifically, for a true parameter $\theta^*$, we consider the corrupted Gaussian linear model $y = \langle x, \theta^* \rangle + \varepsilon + b$ where the adversarial noise $b$ can take any value with probability $\eta$ and equals zero otherwise. We consider this adversary to be oblivious (i.e., $b$ independent of the data) since this is the only contamination model under which consistency is possible. Current algorithms rely on having the whole data at hand in order to identify and remove the outliers. In contrast, we show in this work that stochastic gradient descent on the $\ell_1$ loss converges to the true parameter vector at a $\tilde{O}(1 / ((1 - \eta)^2 n))$ rate which is independent of the values of the contaminated measurements. Our proof relies on the elegant smoothing of the $\ell_1$ loss by the Gaussian data and a classical non-asymptotic analysis of Polyak-Ruppert averaged SGD. In addition, we provide experimental evidence of the efficiency of this simple and highly scalable algorithm.
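The algorithm the abstract describes is simple to state: run SGD on the $\ell_1$ loss over the streaming data and return the Polyak-Ruppert average of the iterates. Below is a minimal NumPy sketch under the abstract's contamination model; the dimensions, noise levels, corruption magnitude, and step-size schedule are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Corrupted Gaussian linear model: y = <x, theta*> + eps + b,
# where the oblivious adversarial noise b takes an arbitrary value
# with probability eta and equals zero otherwise.
d, n, eta = 5, 20000, 0.3          # illustrative problem sizes
theta_star = rng.normal(size=d)     # true parameter

X = rng.normal(size=(n, d))                    # Gaussian design
eps = 0.1 * rng.normal(size=n)                 # inlier noise
b = np.where(rng.random(n) < eta, 50.0, 0.0)   # corruptions, independent of X
y = X @ theta_star + eps + b

# One pass of SGD on the l1 loss |<x, theta> - y|; its subgradient at
# theta is sign(<x, theta> - y) * x.  theta_bar is the running
# Polyak-Ruppert average of the iterates.
theta = np.zeros(d)
theta_bar = np.zeros(d)
for t in range(n):
    x_t, y_t = X[t], y[t]
    g = np.sign(x_t @ theta - y_t) * x_t
    step = 1.0 / np.sqrt(t + 1)     # decaying step size (a common choice)
    theta -= step * g
    theta_bar += (theta - theta_bar) / (t + 1)

# The averaged iterate approaches theta* despite 30% gross corruptions.
print(np.linalg.norm(theta_bar - theta_star))
```

Note how the update never inspects or removes outliers: the sign in the $\ell_1$ subgradient bounds each sample's influence regardless of how large the corruption $b$ is, which is what makes the method suitable for the streaming setting.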
