Timezone: »

Workshop
Learning Faster From Easy Data
Peter Grünwald · Wouter M Koolen · Sasha Rakhlin · Nati Srebro · Alekh Agarwal · Karthik Sridharan · Tim van Erven · Sebastien Bubeck

Tue Dec 10 07:30 AM -- 06:30 PM (PST) @ Harrah's Tahoe A

Most existing theory in both online and statistical learning is centered around a worst-case analysis. For instance, in online learning data are assumed to be generated by an adversary and the goal is to minimize regret. In statistical learning the majority of theoretical results consider risk bounds for the worst-case i.i.d. data generating distribution. In both cases the worst case convergence rates (for regret/n and risk) for 0/1-type and absolute loss functions are O(1/sqrt{n}). Yet in practice simple heuristics like Follow-the-Leader (FTL) often empirically exhibit faster rates.

It has long been known that under Vovk's (1990) mixability condition on the loss function, faster rates are possible. Even without mixability or the closely related exp-concavity (Cesa-Bianchi and Lugosi 2006), in the statistical setting there exist conditions on the distribution under which faster learning rates can be obtained; the main example being Tsybakov's (2004) margin condition, which was recently shown to be intimately connected to mixability (Van Erven et al., 2012).

In practice, even if the loss is not mixable and no distributional assumptions apply, the data are nevertheless often easy enough to allow accelerated learning. Initial promising steps in this direction have been made recently, including parameterless algorithms that combine worst-case O(1/sqrt{n}) regret guarantees for the adversarial setting with
- fast rates in the stochastic bandit setting (Bubeck and Slivkins, COLT 2012)
- exploitation of observably sub-adversarial data (Rakhlin, Shamir and Sridharan, AISTATS 2013)
- learning as fast as FTL whenever FTL works well (De Rooij, Van Erven, Grünwald and Koolen, JMLR 2013)

It remains a huge challenge however, to characterize the types of data for which faster learning is possible, to define easy data' in a generic way, let alone to design algorithms that automatically adapt to exploit it.

The aim of this day-long workshop is threefold

1) to map, by means of a series of invited and contributed talks, the existing landscape of "easiness criteria" in relation to the efficiency of their corresponding algorithms,

2) to identify, by means of a panel discussion led by the organizers, obstacles and promising directions,

3) and through interaction foster partnerships for future research.

Discussion will be centered around the so-far elusive concept of easy data. Can the existing characterizations based on variances, mixability gaps, FTL etc. be brought under a common umbrella? Can ideas and approaches from statistical learning theory be transported to online learning (and vice versa)?