Poster
Efficient Second-Order Online Kernel Learning with Adaptive Embedding
Daniele Calandriello · Alessandro Lazaric · Michal Valko

Wed Dec 6th 06:30 -- 10:30 PM @ Pacific Ballroom #50 #None
Online kernel learning (OKL) is a flexible framework to approach prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces can contain an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive. Not only first order methods accumulate $\O(\sqrt{T})$ more loss than the optimal function, but the curse of kernelization results in a $\O(t)$ per step complexity. Second-order methods get closer to the optimum much faster, suffering only $\O(\log(T))$ regret, but second-order updates are even more expensive, with a $\O(t^2)$ per-step cost. Existing approximate OKL methods try to reduce this complexity either by limiting the Support Vectors (SV) introduced in the predictor, or by avoiding the kernelization process altogether using embedding. Nonetheless, as long as the size of the approximation space or the number of SV does not grow over time, an adversary can always exploit the approximation process. In this paper, we propose PROS-N-KONS, a method that combines Nystrom sketching to project the input point in a small, accurate embedded space, and performs efficient second-order updates in this space. The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$. Moreover, the second-order updated allows us to achieve the logarithmic regret. We empirically compare our algorithm on recent large-scales benchmarks and show it performs favorably.

#### Author Information

##### Michal Valko (DeepMind Paris and Inria Lille - Nord Europe)

Michal is a research scientist in DeepMind Paris and SequeL team at Inria Lille - Nord Europe, France, lead by Philippe Preux and Rémi Munos. He also teaches the course Graphs in Machine Learning at l'ENS Cachan. Michal is primarily interested in designing algorithms that would require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimising the data that humans need spend inspecting, classifying, or “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he is working in domains that are able to deal with minimal feedback, such as semi-supervised learning, bandit algorithms, and anomaly detection. The common thread of Michal's work has been adaptive graph-based learning and its application to the real world applications such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Intel, Technicolor, and Microsoft Research. He received his PhD in 2011 from University of Pittsburgh under the supervision of Miloš Hauskrecht and after was a postdoc of Rémi Munos.