Timezone: »

Computational Trade-offs in Statistical Learning
Alekh Agarwal · Sasha Rakhlin

Thu Dec 15 10:30 PM -- 11:00 AM (PST) @ Montebajo: Basketball Court
Event URL: https://sites.google.com/site/costnips/ »

Since its early days, the field of Machine Learning has focused on
developing computationally tractable algorithms with good learning
guarantees. The vast literature on statistical learning theory has led
to a good understanding of how the predictive performance of different
algorithms improves as a function of the number of training
samples. By the same token, the well-developed theories of
optimization and sampling methods have yielded efficient computational
techniques at the core of most modern learning methods. The separate
developements in these fields mean that given an algorithm we have a
sound understanding of its statistical and computational
beahvior. However, there hasn't been much joint study of the
computational and statistical complexities of learinng, as a
consequence of which, little is known about the interaction and
trade-offs between statistical accuracy and computational
complexity. Indeed a systematic joint treatment can answer some very
interesting questions: what is the best attainable statistical error
given a finite computational budget? What is the best learning method
to use given different computational constraints and desired
statistical yardsticks? Is it the case that simple methods outperform
complex ones in computationally impoverished scenarios?

At its core, the PAC framework aims to study learning through the lens
of computation. However, the thrust is on separating polynomial-time
from computationally intractable algorithms. However all
polynomial-time computations are hardly equivalent, and the difference
between linear vs quadratic dependence on problem parameters can have
a profound effect on the applicability of an algorithm. Understanding
the trade-offs between statistical accuracy and computational demands
in this situation is of paramount importance.

The need for such a theory is more compelling now than ever before
since we routinely face training corpuses with billions of examples,
and often, an even larger number of parameters to be estimated. The
emergence of web and mechanical turk as sources of training data often
stretches learning algorithms to the point that the bottleneck is no
longer the number of examples, but the amount of computation available
to process the examples. A theory to principally choose from a
multitude of learning methods based on the properties of training
examples as well as the computational resources available would be of
clear interest. Another way to pose the same problem would be to
design algorithms that can take as input a computational constraint
and try to learn the best hypothesis they can based on the available
budget and data.

There have been some works that try to address different facets of the above problem. Researchers working on massive datasets in the CS theory community look at streaming methods that aim to impose constraints on both the computational and storage requirements of the algorithms. Online learning presents one particular way of dealing with a computational budget, by processing as many samples as possible with the computational budget.

There have been some more relevant works in the machine learning community in the last few years. Bottou and Bousquet (2008)
compare the amount of computation needed to attain a certain
statistical error for a few routinely used optimization
algorithms. Shalev-Shwartz and Srebro (2009) show how stochastic gradient descent applied to SVM optimization can experience an inverse dependence on number of training sample in the regime of large
datasets. In some more recent works, Shalev-Shwartz and co-authors have also used cryptographic conjectures to establish the computational hardness of certain learning problems. On the algorithmic front, coarse-to-fine learning provides a nice framework to systematically incorporate computational considerations, using computational cost as a regularization term in the objective of the learning method. Other budgeted algorithms such as budgeted SVMs and budgeted perceptrons try to admit hard budget constraints on the running time and storage of the algorithm.

The goals of our workshop are:

* To draw the attention of machine learning researchers to this rich and emerging area of problems and to establish a community of researchers that are interested in understanding these tradeoffs.

* To define a number of common problems in this area and to encourage future research that is comparable and compatible.

* To expose the learning community to relevant work in fields such as CS theory and convex optimization.

We will call for papers on the following topics:

* Fundamental statistical limits with bounded computation

Trade-offs between statistical accuracy and computational costs

Algorithms to learn under budget constraints

* Budget constraints on other resources (like bounded memory)

* Computationally aware approaches such as coarse-to-fine learning

Author Information

Alekh Agarwal (Microsoft Research)
Sasha Rakhlin (University of Pennsylvania)

More from the Same Authors