


Session 3: Theory and Sequential Decision Making

Sanjoy Dasgupta


Tue 4 Dec. 16:00 - 16:20 PST

FilterBoost: Regression and Classification on Large Datasets

Joseph K Bradley · Robert E Schapire

We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so can train efficiently on very large datasets. Our algorithm, which is based on a logistic regression technique proposed by Collins, Schapire, & Singer, is the first boosting-by-filtering algorithm that is truly adaptive and does not require the unrealistic assumptions needed by previous work. Moreover, we give the first proof that the algorithm of Collins et al. is a strong PAC learner, albeit within the filtering setting. Our proofs establish the algorithm's strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm is more robust to noise and overfitting than batch boosters in conditional probability estimation, and is competitive in classification.
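
To make the filtering step concrete, here is a minimal sketch of the oracle-based rejection filter such a booster runs each round, assuming an oracle() that returns labeled pairs and a callable ensemble F giving the current combined hypothesis. The logistic acceptance weight mirrors the Collins, Schapire, & Singer weighting the abstract mentions; the function names themselves are illustrative, not the paper's implementation.

import math
import random

def filter_draw(oracle, F, max_tries=100000):
    # Rejection-sample one training example from the oracle. An example
    # (x, y) with y in {-1, +1} is accepted with the logistic weight
    # 1 / (1 + exp(y * F(x))), so examples the current ensemble F gets
    # wrong (negative margin) are kept far more often than easy ones.
    for _ in range(max_tries):
        x, y = oracle()
        accept_prob = 1.0 / (1.0 + math.exp(y * F(x)))
        if random.random() < accept_prob:
            return x, y
    # If almost nothing is accepted, the ensemble already fits the source well.
    raise RuntimeError("filter rejected every example; consider stopping")

Each boosting round would call filter_draw repeatedly to assemble a small training set for the weak learner, so the cost per round stays independent of the full dataset size.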

Tue 4 Dec. 16:20 - 16:40 PST

The Price of Bandit Information for Online Optimization

Varsha Dani · Thomas P Hayes · Sham M Kakade

In the online linear optimization problem, a learner must choose, in each round, a decision from a set $D \subset \mathbb{R}^n$ in order to minimize an (unknown and changing) linear cost function. We present sharp rates of convergence (with respect to additive regret) for both the full information setting (where the cost function is revealed at the end of each round) and the bandit setting (where only the scalar cost incurred is revealed). In particular, this paper is concerned with the \emph{price of bandit information}: how much worse the regret is in the bandit case as compared to the full information case. For the full information case, the upper bound on the regret is $O^*(\sqrt{nT})$, where $n$ is the ambient dimension and $T$ is the time horizon. For the bandit case, we present an algorithm which achieves $O^*(n^{3/2}\sqrt{T})$ regret; all previous (nontrivial) bounds here were $O(\textrm{poly}(n)T^{2/3})$ or worse. It is striking that the convergence rate for the bandit setting is only a factor of $n$ worse than in the full information case, in stark contrast to the $K$-arm bandit setting, where the gap in the dependence on $K$ is exponential ($\sqrt{TK}$ vs. $\sqrt{T\log K}$). We also present lower bounds showing that this gap is at least $\sqrt{n}$, which we conjecture to be the correct order. The bandit algorithm we present can be implemented efficiently in special cases of particular interest, such as path planning and Markov Decision Problems.
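
As a point of reference for the full-information rates quoted above, the following sketch runs exponential weights over a finite decision set $D$, where the round-$t$ loss of a point $x$ is $\langle c_t, x \rangle$. The learning rate eta and all names are illustrative; the paper's bandit algorithm, which must estimate the cost vector from a single scalar per round, is deliberately not reproduced here.

import numpy as np

def exp_weights_full_info(decision_set, cost_vectors, eta=0.1):
    # Full-information baseline: maintain a distribution w over the finite
    # decision set D and update it multiplicatively after each cost vector
    # c_t is revealed. Returns the additive regret against the best fixed
    # decision in hindsight.
    D = np.asarray(decision_set, dtype=float)     # shape (|D|, n)
    w = np.ones(len(D)) / len(D)
    expected_loss = 0.0
    cumulative = np.zeros(len(D))
    for c in cost_vectors:
        losses = D @ np.asarray(c, dtype=float)   # <c_t, x> for every x in D
        expected_loss += float(w @ losses)        # loss of the randomized play
        cumulative += losses
        w *= np.exp(-eta * losses)                # exponential-weights update
        w /= w.sum()
    return expected_loss - float(cumulative.min())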

Tue 4 Dec. 16:40 - 17:00 PST

A Multiplicative Weights Algorithm for Apprenticeship Learning

Umar Syed · Robert E Schapire

We study the problem of an apprentice learning to behave in an environment with an unknown reward function by observing the behavior of an expert. We build on the work of Abbeel and Ng, who considered a framework in which the true reward function is assumed to be a linear combination of a set of known and observable features. We give a new algorithm that, like theirs, is guaranteed to learn a policy that is nearly as good as the expert's, given enough examples. However, we show that, unlike their algorithm, ours may produce a policy that is substantially better than the expert's. Moreover, our algorithm is computationally much faster, is easy to implement, and can be applied even in the absence of an expert. The method is based on a game-theoretic view of the problem, which leads naturally to a direct application of the multiplicative-weights algorithm of Freund and Schapire for playing repeated matrix games. In addition to our formal presentation and analysis of the new algorithm, we sketch how the method can be applied when the MDP itself is unknown, and we provide an experimental demonstration of the algorithm on a toy video-game environment.
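
The game-theoretic reduction lends itself to a compact sketch. Below, the min player maintains a distribution over the k reward features and shrinks weight on features where the current best-response policy already beats the expert. Here feature_gap is a hypothetical helper that solves the MDP for the reward implied by w and returns the per-feature margin over the expert; the learning rate follows Freund and Schapire's generic multiplicative-weights scheme rather than the paper's exact constants.

import math
import numpy as np

def multiplicative_weights_apprentice(feature_gap, k, T):
    # feature_gap(w) -> length-k vector g with g[i] in [-1, 1]: the amount by
    # which the policy optimal for reward weights w exceeds the expert's
    # expected value on feature i (hypothetical helper; computing it means
    # solving the MDP for that reward).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(k) / T))
    w = np.full(k, 1.0 / k)
    for _ in range(T):
        g = np.asarray(feature_gap(w), dtype=float)
        w *= beta ** ((g + 1.0) / 2.0)   # rescale gains to [0, 1], then shrink
        w /= w.sum()                     # renormalize to a distribution
    return w

The returned weights concentrate on the features where matching the expert is hardest, which is what forces the learned policy to do at least as well as the expert on every feature.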

Tue 4 Dec. 17:00 - 17:20 PST

Cluster Stability for Finite Samples

Ohad Shamir · Naftali Tishby

Cluster stability has recently received growing attention as a cluster validation criterion in a sample-based framework. However, recent work [2] has shown that as the sample size increases to infinity, any clustering model will usually become asymptotically stable. This led to the conclusion that stability is lacking as a theoretical and practical tool. The discrepancy between this conclusion and the success of stability in practice has remained an open question, which we attempt to address. Our theoretical approach is that stability, as used by cluster validation algorithms, is similar in certain respects to measures of generalization in a model-selection framework. In such cases, the model chosen governs the convergence rate of generalization bounds. By arguing that these rates are more important than the sample size, we are led to the prediction that stability-based cluster validation algorithms should not degrade with increasing sample size, despite the asymptotic universal stability. This prediction is substantiated by a theoretical analysis as well as several empirical results. We conclude that stability remains a meaningful theoretical and practical criterion for cluster validity over finite samples.
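
For readers who want to see what a sample-based stability criterion looks like operationally, here is an illustrative sketch (not the paper's protocol): cluster two independent subsamples, then measure how consistently the two fitted models co-cluster a common evaluation set. k-means stands in for a generic clustering model, and all parameter names are assumptions of this sketch.

import numpy as np
from sklearn.cluster import KMeans

def stability_score(X, k, n_pairs=20, m=200, seed=0):
    # Higher scores mean the clustering model is more stable on samples of
    # size m. Agreement is computed on co-clustering indicators so that the
    # score is invariant to permutations of the cluster labels.
    rng = np.random.default_rng(seed)
    eval_set = X[rng.choice(len(X), size=m, replace=False)]
    agreements = []
    for _ in range(n_pairs):
        a = KMeans(n_clusters=k, n_init=10).fit(X[rng.choice(len(X), size=m)])
        b = KMeans(n_clusters=k, n_init=10).fit(X[rng.choice(len(X), size=m)])
        la, lb = a.predict(eval_set), b.predict(eval_set)
        same_a = la[:, None] == la[None, :]
        same_b = lb[:, None] == lb[None, :]
        agreements.append(float((same_a == same_b).mean()))
    return float(np.mean(agreements))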