Esteban Arcaute · Mohammad Ghavamzadeh · Shie Mannor · Georgios Theocharous

[ 512 e ]

The goal of this workshop is to study the challenges in learning, evaluation, and mining in e-commerce and more classical commerce domains. As the largest commerce and e-commerce companies on the planet adopt machine learning technologies, it becomes increasingly clear that these domains present different challenges than classical machine learning problems.

In this workshop we plan to focus on the problems more than on solutions. We will consider problems such as identifying dysfunctional items or collections in a website, off-policy evaluation of marketing strategies, personalization of the e-commerce experience, validation, sequential decisions, dynamic pricing, and others. Our main goal is to portray the main challenges of the field and to propose a collection of benchmark problems, agreed upon by industry and academia, for theoretical study and experimental work.

Dmitry Storcheus · Sanjiv Kumar · Afshin Rostamizadeh

[ 513 ef ]

UPDATE: The workshop proceedings will be published in a special issue of the Journal of Machine Learning Research prior to the workshop date. For that reason, the page limit for submissions is extended to 10 pages (excluding references and appendix) in JMLR format. The authors of accepted submissions will be asked to provide a camera-ready version within 7 days of the acceptance notification.

The problem of extracting features from given data is of critical importance for the successful application of machine learning. Feature extraction, as usually understood, seeks an optimal transformation from raw data into features that can be used as an input for a learning algorithm. In recent times this problem has been attacked using a growing number of diverse techniques that originated in separate research communities: from PCA and LDA to manifold and metric learning. It is the goal of this workshop to provide a platform to exchange ideas and compare results across these techniques.

The workshop will consist of three sessions, each dedicated to a specific open problem in the area of feature extraction. The sessions will start with invited talks and conclude with panel discussions, where the audience will engage in debates with speakers and organizers.

We welcome submissions from …

Andrew G Wilson · Alexander Smola · Eric Xing

[ 511 c ]

In 2015, every minute of the day, users share hundreds of thousands of pictures, videos, tweets, reviews, and blog posts. More than ever before, we have access to massive datasets in almost every area of science and engineering, including genomics, robotics, and climate science. This wealth of information provides an unprecedented opportunity to automatically learn rich representations of data, which not only allows us to greatly improve performance on predictive tasks, but also provides a mechanism for scientific discovery. That is, by automatically learning expressive representations of data, rather than carefully hand-crafting features, we can obtain a new theoretical understanding of our modelling problems. Recently, deep learning architectures have had success in such representation learning, particularly in computer vision and natural language processing.

Expressive non-parametric methods also have great potential for large-scale structure discovery; indeed, these methods can be highly flexible, and have an information capacity that grows with the amount of available data. However, there are practical challenges involved in developing non-parametric methods for large scale representation learning.

Consider, for example, kernel methods. A kernel controls the generalisation properties of these methods. A well-chosen kernel leads to impressive empirical performance. Difficulties arise when the kernel is a priori unknown and …

Theofanis Karaletsos · Rajesh Ranganath · Suchi Saria · David Sontag

[ 510 bd ]

Recent years have seen an unprecedented rise in the availability and size of collections of clinical data such as electronic health records. These rich data sources present opportunities to apply and develop machine learning methods to solve problems faced by clinicians and to usher in new forms of medical practice that would otherwise be infeasible. The aim of this workshop is to foster discussions between machine learning researchers and clinicians about how machine learning can be used to address fundamental problems in health care.

Of particular interest to this year’s workshop is statistical modeling. The role of modeling in healthcare is two-fold. First, it provides clinicians with a tool to aid exploration of hypotheses in a data-driven way. Second, it furnishes evidence-based clinically actionable predictions. Examples include machine learning of disease progression models, where patients and diseases are characterized by states that evolve over time, or dose-response models, where the treatment details involving complex and often combinatorial therapies can be inferred in a data-driven way to optimally treat individual patients. Such methods face many statistical challenges, such as accounting for confounding effects like socioeconomic backgrounds or genetic alterations in subpopulations. Causal models learned from large collections of patient records, …

Pieter Abbeel · John Schulman · Satinder Singh · David Silver

[ 513 cd ]

Although the theory of reinforcement learning addresses an extremely general class of learning problems with a common mathematical formulation, its power has been limited by the need to develop task-specific feature representations. A paradigm shift is occurring as researchers figure out how to use deep neural networks as function approximators in reinforcement learning algorithms; this line of work has yielded remarkable empirical results in recent years. This workshop will bring together researchers working at the intersection of deep learning and reinforcement learning, and it will help researchers with expertise in one of these fields to learn about the other.

Adam Smith · Aaron Roth · Vitaly Feldman · Moritz Hardt

[ 514 a ]

Adaptive data analysis is the increasingly common practice by which insights gathered from data are used to inform further analysis of the same data sets. This practice is common both in machine learning and in scientific research, in which data sets are shared and re-used across multiple studies. Unfortunately, most of the statistical inference theory used in empirical sciences to control false discovery rates, and in machine learning to avoid overfitting, assumes a fixed class of hypotheses to test, or family of functions to optimize over, selected independently of the data. If the set of analyses run is itself a function of the data, much of this theory becomes invalid; indeed, such adaptivity has been blamed as one of the causes of the crisis of reproducibility in empirical science.
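The mechanism is easy to see in a toy simulation (a hedged sketch, not part of the workshop materials): even when labels are pure noise, selecting the best of many hypotheses evaluated on the same sample makes it look far better than chance, while its accuracy on fresh data collapses back to 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 1000  # sample size and number of adaptively tried hypotheses

# Labels are pure coin flips, so no hypothesis is truly better than chance.
y = rng.integers(0, 2, size=n)

# Evaluate k random "hypotheses" on the SAME sample and keep the best one:
# this selection is the adaptive step.
preds = rng.integers(0, 2, size=(k, n))
train_acc = (preds == y).mean(axis=1)
best = train_acc.argmax()

# The selected hypothesis looks far better than chance on the data used to
# select it, but is back at chance on fresh data from the same distribution.
y_fresh = rng.integers(0, 2, size=n)
fresh_acc = (preds[best] == y_fresh).mean()

print(f"apparent accuracy of selected hypothesis: {train_acc[best]:.2f}")
print(f"accuracy on fresh data: {fresh_acc:.2f}")
```

Classical generalization bounds cover any single, pre-specified hypothesis; it is the data-dependent choice of `best` that invalidates them, which is exactly the setting this workshop addresses.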

Recently, there have been several exciting proposals for how to avoid overfitting and guarantee statistical validity even in general adaptive data analysis settings. The problem is important, and ripe for further advances. The goal of this workshop is to bring together members of different communities (from machine learning, statistics, and theoretical computer science) interested in solving this problem, to share recent results, to discuss promising directions for future research, and to foster collaborations. …

Asli Celikyilmaz · Milica Gasic · Dilek Hakkani-Tur

[ 511 b ]

The emergence of virtual personal assistants such as Siri, Cortana, Echo, and Google Now, is generating increasing interest in research in speech understanding and spoken interaction. However, whilst the ability of these agents to recognise conversational speech is maturing rapidly, their ability to understand and interact is still limited to a few specific domains, such as weather information, local businesses, and some simple chit-chat. Their conversational capabilities are not necessarily apparent to users. Interaction typically depends on handcrafted scripts and is often guided by simple commands. Deployed dialogue models do not fully make use of the large amount of data that these agents generate. Promising approaches that involve statistical models, big data analysis, representation of knowledge (hierarchies, relations, etc.), utilising and enriching semantic graphs with natural language components, multi-modality, etc. are being explored in multiple communities, such as natural language processing (NLP), speech processing, machine learning (ML), and information retrieval. However, we are still only scratching the surface in this field. The aim of this workshop, therefore, is to bring together researchers interested in understanding and interaction in conversational agents, to discuss the challenges and new and emerging topics in machine learning which might lead to richer and more …

Oren Anava · Azadeh Khaleghi · Vitaly Kuznetsov · Alexander Rakhlin

[ 514 bc ]

Data in the form of time-dependent sequential observations emerge in many key real-world problems, ranging from biological data to financial markets to weather forecasting and audio/video processing. However, despite the ubiquity of such data, the vast majority of learning algorithms have been primarily developed for the setting in which sample points are drawn i.i.d. from some possibly unknown fixed distribution. While there exist algorithms designed to handle non-i.i.d. data, these typically assume a specific parametric form of the data-generating distribution. Such assumptions may undermine the possibly complex nature of modern data, which can possess long-range dependency patterns that we now have the computing power to discern. On the other extreme, some online learning algorithms consider a non-stochastic framework without any distributional assumptions. However, such methods may fail to fully address the stochastic aspect of real-world time-series data.
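As a small illustration of the gap (a sketch under synthetic assumptions, not from the workshop), the naive i.i.d. standard error of a sample mean badly understates the true sampling variability when the data come from a strongly autocorrelated AR(1) process:

```python
import numpy as np

rng = np.random.default_rng(0)
n, phi, reps = 200, 0.9, 2000  # series length, AR coefficient, replications

# AR(1) process x_t = phi * x_{t-1} + eps_t: mean zero, but samples are
# strongly dependent, violating the i.i.d. assumption.
def ar1(n):
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

naive_se, means = [], []
for _ in range(reps):
    x = ar1(n)
    means.append(x.mean())
    naive_se.append(x.std(ddof=1) / np.sqrt(n))  # i.i.d. formula

actual_sd = np.std(means)  # true sampling sd of the mean, across replications
print(f"naive i.i.d. standard error: {np.mean(naive_se):.3f}")
print(f"actual sd of the sample mean: {actual_sd:.3f}")
```

The i.i.d. formula is several times too optimistic here; any confidence interval or test built on it would be badly miscalibrated, which is one concrete reason dependent data need their own theory.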

The goal of this workshop is to bring together theoretical and applied researchers interested in the analysis of time series, and the development of new algorithms to process sequential data. This includes algorithms for time series prediction, classification, clustering, anomaly and change point detection, correlation discovery, dimensionality reduction as well as a general theory for learning and comparing stochastic processes. We invite researchers from the …

Louis-Philippe Morency · Tadas Baltrusaitis · Aaron Courville · Kyunghyun Cho

[ 512 dh ]

Workshop Overview
Multimodal machine learning aims at building models that can process and relate information from multiple modalities. From the early research on audio-visual speech recognition to the recent explosion of interest in models mapping images to natural language, multimodal machine learning is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential.
Learning from paired multimodal sources offers the possibility of capturing correspondences between modalities and gaining an in-depth understanding of natural phenomena. Thus, multimodal data provide a means of reducing our dependence on the more standard supervised learning paradigm that is inherently limited by the availability of labeled examples.

This research field brings some unique challenges for machine learning researchers, given the heterogeneity of the data and the complementarity often found between modalities. This workshop will facilitate progress in multimodal machine learning by bringing together researchers from natural language processing, multimedia, computer vision, speech processing and machine learning to discuss the current challenges and identify the research infrastructure needed to enable a stronger multidisciplinary collaboration.

For keynote talk abstracts and MMML 2015 workshop proceedings:

Oral presentation
- Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences - Hongyuan Mei, Mohit Bansal, Matthew …

Ted Meeds · Michael Gutmann · Dennis Prangle · Jean-Michel Marin · Richard Everitt

[ 511 a ]

Approximate Bayesian computation (ABC) or likelihood-free (LF) methods have developed mostly under the radar of the machine learning community, but are important tools for a large and diverse segment of the scientific community. This is particularly true for systems and population biology, computational neuroscience, computer vision, and the healthcare sciences, among many others.

Interaction between the ABC and machine learning communities has recently started and contributed to important advances. In general, however, there is still significant room for more intense interaction and collaboration. Our workshop aims to be a place for this to happen.

The workshop will consist of invited and contributed talks, poster spotlights, and a poster session. Rather than a panel discussion we will encourage open discussion between the speakers and the audience.

Artur Garcez · Tarek R. Besold · Risto Miikkulainen · Gary Marcus

[ 512 cg ]

While early work on knowledge representation and inference was primarily symbolic, the corresponding approaches subsequently fell out of favor, and were largely supplanted by connectionist methods. In this workshop, we will work to close the gap between the two paradigms, and aim to formulate a new unified approach that is inspired by our current understanding of human cognitive processing. This is important to help improve our understanding of Neural Information Processing and build better Machine Learning systems, including the reuse of knowledge learned in one application domain in analogous domains.

The workshop brings together world leaders in the fields of neural computation, logic and artificial intelligence, natural language understanding, cognitive science, and computational neuroscience. Over the two workshop days, their invited lectures will be complemented with presentations based on contributed papers and poster sessions, giving ample opportunity to interact and discuss the different perspectives and emerging approaches.

The workshop targets a single broad theme of general interest to the vast majority of the NIPS community, namely the study of translations and ways of integration between neural models and knowledge representation for the purpose of achieving an effective integration of learning and reasoning. Neural-symbolic computing is now an established topic of …

Irina Rish · Leila Wehbe · Brian Murphy · Georg Langs · Guillermo Cecchi · Moritz Grosse-Wentrup

[ 515 a ]


Modern multivariate statistical methods have been increasingly applied to various problems in neuroimaging, including “mind reading”, “brain mapping”, clinical diagnosis and prognosis. Multivariate pattern analysis (MVPA) methods are designed to examine complex relationships between high-dimensional signals, such as brain MRI images, and an outcome of interest, such as the category of a stimulus, with a limited amount of data. The MVPA approach contrasts with the classical mass-univariate (MUV) approach, which treats each individual imaging measurement in isolation.

Recent work in neuroscience has started to move away from conventional lab-based studies, towards more naturalistic behavioral tasks (e.g. normal reading, movie watching), with mobile neuroimaging technologies (EEG, NIRS), and real-world applications (e.g. in psychiatry, or education) that make use of other available data sources.

This trend presents challenges and opportunities for machine learning. Real-world applications typically involve much larger quantities of data, which can be continuously recorded in natural environments like the classroom, home or workplace. But these data are noisier, due to lower-spec hardware and less controlled environments. And gathering data from much broader swathes of the population, whether healthy or dealing with a condition, results in more uncontrolled variation.

ML techniques have already …

Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei

[ 513 ab ]

The ever-increasing size of data sets has resulted in an immense effort in Bayesian statistics to develop more expressive and scalable probabilistic models. Inference remains a challenge and limits the use of these models in large-scale scientific and industrial applications. Asymptotically exact schemes such as Markov chain Monte Carlo (MCMC) are often slow to run and difficult to evaluate in finite time. Thus we must resort to approximate inference, which allows for more efficient run times and more reliable convergence diagnostics on large-scale and streaming data—without compromising on the complexity of these models. This workshop aims to bring together researchers and practitioners in order to discuss recent advances in approximate inference; we also aim to discuss the methodological and foundational issues in such techniques in order to consider future improvements.

The resurgence of interest in approximate inference has furthered development in many techniques: for example, scalability, variance reduction, and preserving dependency in variational inference; divide and conquer techniques in expectation propagation; dimensionality reduction using random projections; and stochastic variants of Laplace approximation-based methods. Approximate inference techniques have clearly emerged as the preferred way to perform tractable Bayesian inference. Despite this interest, there remain significant trade-offs in speed, accuracy, generalizability, and …
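A minimal concrete example of the exact-versus-approximate trade-off (a sketch with illustrative numbers, not drawn from the workshop): the Laplace approximation replaces an exact conjugate posterior with a Gaussian centered at the posterior mode, trading a small bias for a closed form that scales.

```python
from math import sqrt

# Coin-flip model with a uniform prior: after k successes in n trials the
# exact posterior is Beta(k + 1, n - k + 1). The Laplace approximation
# replaces it with a Gaussian at the posterior mode. n and k are illustrative.
n, k = 50, 36

mode = k / n                     # MAP estimate (posterior mode)
var = mode * (1 - mode) / n      # inverse negative Hessian of the
                                 # log-posterior, evaluated at the mode
exact_mean = (k + 1) / (n + 2)   # mean of Beta(k + 1, n - k + 1)

print(f"Laplace approximation: N({mode:.3f}, {sqrt(var):.3f}^2)")
print(f"exact posterior mean:  {exact_mean:.3f}")
```

Even in this tiny conjugate example the mode and the exact mean differ slightly; quantifying and controlling such gaps at scale is the kind of trade-off the workshop is concerned with.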

Pavel Serdyukov · Andrey Ustyuzhanin · Marcin Chrząszcz · Francesco Dettori · Marc-Olivier Bettler

[ 515 bc ]

Experimental physics actively develops the frontiers of our knowledge of the Universe, ranging from macroscopic objects observed through telescopes to the micro-world of particle interactions. In each field of study, scientists go from raw measurements (celestial object spectra or energies of particles detected inside collider detectors) to higher-level representations that are more suitable for further analysis and for human perception. Each measurement can be used to support or refute theories that compete for predictive power and completeness.

Many areas of experimental physics assimilated computational paradigms a long time ago: both simulators and semi-automatic data analysis techniques have been applied widely for decades. In particular, nonparametric classification and regression are now routinely used as parts of the reconstruction (inference) chain. More recently, state-of-the-art budgeted learning techniques have also started to be used for real-time event selection at the LHC. Nevertheless, most of these applications went largely unnoticed by the machine learning (ML) community.

Our primary goal is to bring the Physics and ML communities together to initiate discussions on Physics-motivated problems and applications in ML. The ML community, however, is still largely untouched by the numerous learning challenges coming from Physics. We hope …

Suvrit Sra · Alekh Agarwal · Leon Bottou · Sashank J. Reddi

[ 510 ac ]

OPT2015: Optimization for Machine Learning is the eighth workshop in its series, building on significant precedent established by OPT 2008 through OPT 2014, all of which have been remarkably well-received NIPS workshops.

The previous OPT workshops enjoyed packed (to overpacked) attendance, and this enthusiastic reception is an attestation to the great importance of optimization within machine learning.
The intersection of OPT and ML has grown monotonically over the years, to the extent that now many cutting edge advances in optimization are arising from the ML community. The driving feature is the departure of algorithms from textbook approaches, in particular by paying attention to problem specific structure and to deployability in practical (even industrial) big-data settings.

This intimate relation of optimization with ML is the key motivation for our workshop. We wish to use OPT2015 as a platform to foster discussion, discovery, and dissemination of the state-of-the-art in optimization as relevant to machine learning.

As in the past years, the workshop will continue to bring luminaries from the field of optimization to share classical perspectives, as well as give a platform for thought leaders from machine learning to share exciting recent advances. …

Michael A Osborne · Philipp Hennig

[ 512 a ]

Integration is the central numerical operation required for Bayesian machine learning (in the form of marginalization and conditioning). Sampling algorithms still abound in this area, although it has long been known that Monte Carlo methods are fundamentally sub-optimal. The challenges for the development of better-performing integration methods are mostly algorithmic; indeed, recent algorithms have begun to outperform MCMC and its siblings, in wall-clock time, on realistic problems from machine learning.

The workshop will review the existing, by now quite strong, theoretical case against the use of random numbers for integration, discuss recent algorithmic developments and relationships between conceptual approaches, and highlight central research challenges going forward.
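The rate gap driving this case is easy to reproduce (a sketch with an illustrative integrand, not workshop code): on a smooth one-dimensional integrand, even the humble trapezoidal rule converges at rate n^-2, while the Monte Carlo error shrinks only like n^-1/2.

```python
import numpy as np

f = lambda x: np.exp(x)   # a smooth integrand on [0, 1]
exact = np.e - 1.0        # its exact integral

rng = np.random.default_rng(0)
mc_err, trap_err = {}, {}
for n in (10, 100, 1000):
    # Monte Carlo: error shrinks like n**-0.5, regardless of smoothness.
    mc = f(rng.uniform(0.0, 1.0, size=n)).mean()
    mc_err[n] = abs(mc - exact)

    # Trapezoidal rule on an evenly spaced grid: error shrinks like n**-2
    # for smooth integrands, and model-based (Bayesian) quadrature with a
    # matched smoothness prior can converge faster still.
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    trap_err[n] = abs(h * (f(x).sum() - 0.5 * (f(x[0]) + f(x[-1]))) - exact)

    print(f"n={n:5d}  MC error={mc_err[n]:.1e}  trapezoid error={trap_err[n]:.1e}")
```

At n = 1000 the deterministic rule is already several orders of magnitude more accurate than Monte Carlo with the same number of function evaluations, which is the practical face of the theoretical case above.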

Among the questions to be addressed by the workshop are
* How fast can a practical integral estimate on a deterministic function converge (polynomially, super-polynomially, not just “better than sqrt(N)”)?
* How are these rates related, precisely, to prior assumptions about the integrand, and to the design rules of the integrator?
* To which degree can the source code of an integration problem be parsed to choose informative priors?
* Are random numbers necessary and helpful for efficient multivariate integration, or are they a conceptual crutch that causes inefficiencies?
* What are the practical …

Samuel J Gershman · Falk Lieder · Tom Griffiths · Noah Goodman

[ 512 bf ]

Formal definitions of rationality are instrumental for understanding and designing intelligent systems. By specifying the optimal way to reason under the constraint of limited information, Bayesian rationality has enabled tremendous advances in machine learning and artificial intelligence together with deep insights into human cognition and brain function. Bounded optimality (Horvitz, 1989; Russell & Wefald, 1991a) extends Bayesian rationality by taking into account two additional constraints: limited time and finite computational resources. Bounded optimality is a practical framework for designing the best AI system possible given the constraints of its limited-performance hardware (Russell & Subramanian, 1995), and provides a way to capture the time and resource constraints on human cognition. To adaptively allocate their finite computational resources, bounded agents may have to perform rational metareasoning (Russell & Wefald, 1991b), which corresponds to topics like cognitive control and metacognition studied in cognitive neuroscience and psychology.

Current research in cognitive science is leveraging bounded optimality and rational metareasoning to understand how the human mind can achieve so much with so little computation (Gershman, Horvitz, & Tenenbaum, in press; Vul, Griffiths, Goodman, & Tenenbaum, 2014), to develop and constrain process models of cognition (Griffiths, Lieder, & Goodman, 2015; Lewis, Howes, & Singh, 2014), to reevaluate the …

Alyson Fletcher · Jakob H Macke · Ryan Adams · Jascha Sohl-Dickstein

[ 511 f ]

8:15 Opening remarks and welcome
8:30 Surya Ganguli   Towards a theory of high dimensional, single trial neural data analysis:
On the role of random projections and phase transitions
9:00 Katherine Heller   Translating between human & animal studies via
Bayesian multi-task learning
9:30 Mitya Chklovskii   Similarity matching: A new theory of neural computation
10:00 Coffee break 1
10:30 Poster Session 1
11:00 Matthias Bethge   Let's compete—benchmarking models in neuroscience
11:30 Yoshua Bengio   Small Steps Towards Biologically Plausible Deep Learning

12:00 Lunch
2:30 Pulkit Agrawal The Human Visual Hierarchy is Isomorphic to the Hierarchy learned
by a Deep Convolutional Neural Network Trained for Object Recognition
3:00 Yann LeCun   Unsupervised Learning
3:30 Poster Session 2
4:00 Coffee break 2
4:30 Neil Lawrence   The Mechanistic Fallacy and Modelling how we Think
5:00 Panel: Deep Learning and neuroscience:
What can brains tell us about massive computing and vice versa?
Yoshua Bengio, Matthias Bethge, Surya Ganguli, Konrad Kording, Yann LeCun, Neil Lawrence
6:00 Wrap up

Pulkit Agrawal, Mark D. Lescroart, Dustin E. Stansbury, Jitendra Malik, & Jack L. Gallant  : The Human Visual Hierarchy is Isomorphic to the Hierarchy learned by a Deep Convolutional Neural Network Trained for Object Recognition
Christian Donner and Hideaki Shimazaki: …

Tim van Erven · Wouter Koolen

[ 511 d ]

In both stochastic and online learning we have a good theoretical understanding of the most difficult learning tasks through worst-case or minimax analysis, and we have algorithms to match. Yet there are commonly occurring cases that are much easier than the worst case, where these methods are overly conservative, showing a large gap between the performance predicted by theory and observed in practice. Recent work has refined our theoretical understanding of the wide spectrum of easy cases, leading to the development of algorithms that are robust to the worst case, but can also automatically adapt to easier data and achieve faster rates whenever possible.

Examples of easier cases include (Tsybakov) margin conditions, low noise or variance, probabilistic Lipschitzness and empirical curvature of the loss (strong convexity, exp-concavity, mixability), as well as low-complexity decision boundaries and comparators, quantile bounds, and cases with few switches among few leaders. Adapting to such easy data often involves data-dependent bias-variance trade-offs through hyper-parameter learning, adaptive regularisation or exploration, or hypothesis testing to distinguish between easy and hard cases.

The last two years have seen many exciting new developments in the form of new desirable adaptivity targets, new algorithms and new analysis techniques. In this workshop …

Manfred Opper · Yasser Roudi · Peter Sollich

[ 511 e ]

Invited speakers
Jose Bento Ayres Pereira, Boston College
Alfredo Braunstein, Politecnico di Torino
Ramon Grima, University of Edinburgh
Jakob Macke, MPI Biological Cybernetics Tuebingen
Andrea Montanari, Stanford University
Graham Taylor, University of Guelph

This workshop is co-sponsored by the European Network "NETADIS" (Statistical Physics Approaches to Networks Across Disciplines). See the NETADIS website for further information and workshop details (NIPS 2015 tab).

Workshop overview
Inference and learning on large graphical models, i.e. large systems of simple probabilistic units linked by a complex network of interactions, is a classical topic in machine learning. Such systems are also an active research topic in the field of statistical physics.

The main interaction between statistical physics and machine learning has so far been in the area of analysing data sets without explicit temporal structure. Here methods of equilibrium statistical physics, developed for studying Boltzmann distributions on networks of nodes with e.g. pairwise interactions, are closely related to graphical model inference techniques; accordingly there has been much cross-fertilization leading to both conceptual insights and more efficient algorithms. Models can be learned from recorded experimental or other empirical data, but even when samples come from e.g. a time series this aspect of the data is typically ignored.

More …

Manik Varma · Moustapha M Cisse

[ 511 f ]

Extreme classification, where one needs to deal with multi-class and multi-label problems involving an extremely large number of labels, has opened up a new research frontier in machine learning. Many challenging applications, such as photo, video and tweet annotation and web page categorization, can benefit from being formulated as supervised learning tasks with millions of labels. Extreme classification can also lead to a fresh perspective on other learning problems such as ranking and recommendation by reformulating them as multi-class/label tasks where each item to be ranked or recommended is a separate label.

Extreme classification raises a number of interesting research questions including those related to:

* Large scale learning and distributed and parallel training
* Log-time and log-space prediction and prediction on a test-time budget
* Label embedding and tree approaches
* Crowd sourcing, preference elicitation and other data gathering techniques
* Bandits, semi-supervised learning and other approaches for dealing with training set biases and label noise
* Bandits with an extremely large number of arms
* Fine-grained classification
* Zero shot learning and extensible output spaces
* Tackling label polysemy, synonymy and correlations
* Structured output prediction and multi-task learning
* Learning from highly imbalanced data
* Dealing with …

Jason E Weston · Sumit Chopra · Antoine Bordes

[ 510 ac ]

Motivation and Objective of the Workshop

A key component of solving AI is the use of long-term dependencies as well as short-term context during inference, i.e., the interplay of reasoning, attention and memory. The machine learning community has had great success in the last decades at solving basic prediction tasks such as text classification, image annotation and speech recognition. However, solutions to deeper reasoning tasks have remained elusive. Until recently, most existing machine learning models have lacked an easy way to read and write to part of a (potentially very large) long-term memory component, and to combine this seamlessly with inference. To combine memory with reasoning, a model must learn how to access it, i.e. to perform attention over its memory. Within the last year or so, in part inspired by some earlier works [8, 9, 14, 15, 16, 18, 19], there has been some notable progress in these areas which this workshop addresses. Models developing notions of attention [12, 5, 6, 7, 20, 21] have shown positive results on a number of real-world tasks such as machine translation and image captioning. There has also been a surge in building models of computation which explore differing …

Nathan Wiebe · Seth Lloyd

[ 512 a ]

Recent strides in quantum computing have raised the prospects that near-term quantum devices can expediently solve computationally intractable problems in simulation, optimization and machine learning. The opportunities that quantum computing raises for machine learning are hard to overstate. It opens the possibility of dramatic speedups for machine learning tasks, richer models for data sets, and more natural settings for learning and inference than classical computing affords.

The goal of this workshop is to survey, through a series of invited and contributed talks, the major results in this new area and to facilitate increased dialog between researchers within this field and the greater machine learning community. Our hope is that such discussion will not only help researchers to fully leverage the promise of quantum machine learning but also address deep fundamental issues such as the question of what learning means in a quantum environment or whether quantum phenomena like entanglement may play a role in modeling complex data sets.

Anima Anandkumar · Niranjan Uma Naresh · Kamalika Chaudhuri · Percy Liang · Sewoong Oh

[ 513 cd ]

Non-convex optimization is ubiquitous in machine learning. In general, reaching the global optimum of these problems is NP-hard, and in practice local search methods such as gradient descent can get stuck in spurious local optima and suffer from poor convergence.
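The failure mode is easy to exhibit (a toy sketch with an illustrative quartic objective, not drawn from the workshop): plain gradient descent lands in different optima depending on where it starts.

```python
# A toy quartic with a spurious local minimum near x = -1.22 and the
# global minimum near x = 1.74 (coefficients chosen for illustration).
f  = lambda x: x**4 - x**3 - 4 * x**2 + 2 * x
df = lambda x: 4 * x**3 - 3 * x**2 - 8 * x + 2

def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent on f; returns the point it converges to."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

# The same algorithm reaches different optima depending on initialization.
x_left = gradient_descent(-2.0)    # ends in the spurious local minimum
x_right = gradient_descent(+2.0)   # ends in the global minimum
print(f"from -2.0: x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
print(f"from +2.0: x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
```

Both runs converge, but to points with different objective values; the theoretical question is when problem structure rules such spurious minima out or makes them benign.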

Over the last few years, tremendous progress has been made in establishing theoretical guarantees for many non-convex optimization problems. While there are worst-case instances which are computationally hard to solve, the focus has shifted to characterizing transparent conditions under which these problems become tractable. In many instances, these conditions turn out to be mild and natural for machine learning applications.

One area of non-convex optimization which has attracted extensive interest is spectral learning. This involves finding spectral decompositions of matrices and tensors which correspond to moments of a multivariate distribution. These algorithms are guaranteed to recover a consistent solution to the parameter estimation problem in many latent variable models such as topic admixture models, HMMs, ICA, and, most recently, even non-linear models such as neural networks. In contrast to traditional algorithms like expectation maximization (EM), these algorithms come with polynomial computational and sample complexity guarantees. Analysis of these methods involves understanding the optimization landscape for tensor algebraic structures.
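To make the tensor-algebraic picture concrete, here is a minimal sketch of the tensor power iteration at the heart of many spectral methods; the setup (a symmetric third-order tensor whose components are the standard basis, with weights 3, 2, 1) is our own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: a symmetric 3rd-order moment-like tensor with orthonormal
# components (here the standard basis) and positive weights -- the setting
# in which tensor power iteration provably recovers the components.
w = np.array([3.0, 2.0, 1.0])
A = np.eye(3)                                   # columns a_i are components
T = np.einsum('i,ji,ki,li->jkl', w, A, A, A)    # T = sum_i w_i a_i^{x3}

def tensor_power_iteration(T, n_iter=100):
    """One run of the power map u <- T(I,u,u)/||T(I,u,u)||."""
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, u, u)    # contract T(I, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # eigenvalue T(u, u, u)
    return u, lam

u, lam = tensor_power_iteration(T)
```

From a generic initialization the iteration converges (quadratically) to one of the components `a_i`, with `lam` recovering the corresponding weight `w_i`.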

As another example …

Edo M Airoldi · David S Choi · Aaron Clauset · Johan Ugander · Panagiotis Toulis

[ 512 bf ]

Problems involving networks and massive network datasets motivate some of the most difficult and exciting inferential challenges in the social and information sciences. Modern network datasets in these areas represent complex relationships with rich information on vertex attributes, edge weights, multiple types of vertices and characteristics, all of which may be changing over time. These datasets are often enormous in size, detail, and heterogeneity, pushing the limits of existing inferential frameworks, while also requiring detailed domain knowledge in order to support useful inferences or predictions. Much progress has been made on developing rigorous tools for analyzing and modeling some types of large real-world social and information network datasets, but often this progress is distributed across disparate applied and theoretical domains. Network analysis is still a young and highly cross-disciplinary field, and the goal of this workshop is to promote cross-pollination between its constituent research communities.

In particular, this workshop aims to bring together a diverse and cross-disciplinary set of researchers to discuss recent advances and future directions for developing new network methods in statistics and machine learning. By network methods, we broadly include those models and algorithms whose goal is to learn the patterns of interaction, flow of information, or …

Babak Shahbaba · Yee Whye Teh · Max Welling · Arnaud Doucet · Christophe Andrieu · Sebastian J. Vollmer · Pierre Jacob

[ 513 ab ]

In recent years, there have been ever-increasing demands for data-intensive scientific research. Routine use of digital sensors, high-throughput experiments, and intensive computer simulations have created a data deluge, imposing new challenges on scientific communities that attempt to process and analyze such data. This is especially challenging for scientific studies that involve Bayesian methods, which typically require computationally intensive Monte Carlo algorithms for their implementation. As a result, although Bayesian methods provide a robust and principled framework for analyzing data, their relatively high computational cost for Big Data problems has limited their application. The objective of this workshop is to discuss the advantages of Bayesian inference in the age of Big Data and to introduce new scalable Monte Carlo methods that address computational challenges in Bayesian analysis. This is a follow-up to our recent workshop on Bayesian Inference for Big Data (BIBiD 2015) at Oxford University. It will consist of invited and contributed talks, poster spotlights, and a poster session. There will be a panel discussion on "Bayesian inference for Big Data" at the end of the session. Topics of interest include (but are not limited to):
• Advantages of Bayesian methods in the age of Big Data …
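As one example of the scalable Monte Carlo methods in scope, stochastic gradient Langevin dynamics (Welling and Teh, 2011) replaces full-data gradients with minibatch estimates and injects Gaussian noise. A toy sketch on a conjugate Gaussian model (our own illustration; all constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# SGLD: minibatch gradients of the log posterior plus injected Gaussian
# noise give an approximate, scalable MCMC sampler.
# Toy model: unknown Gaussian mean, unit variance, flat prior.
data = rng.normal(2.0, 1.0, 10_000)
n = len(data)

def sgld(n_steps=5_000, batch=100, eps=1e-4):
    theta, samples = 0.0, []
    for _ in range(n_steps):
        xb = rng.choice(data, batch)
        grad = n * np.mean(xb - theta)   # unbiased grad of log posterior
        theta += 0.5 * eps * grad + np.sqrt(eps) * rng.standard_normal()
        samples.append(theta)
    return np.array(samples)

samples = sgld()
```

Each step touches only 100 of the 10,000 data points; after burn-in the samples concentrate around the posterior mean (essentially the data average), at a fraction of the cost of full-data MCMC. With a finite step size the chain is only approximately correct, which is precisely the kind of trade-off this workshop examines.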

Inderjit Dhillon · Risi Kondor · Rob Nowak · Michael O'Neil · Nedelina Teneva

[ 511 c ]

There is a surge of new work at the intersection of multiresolution/multiscale methods and machine learning:

- Multiresolution (wavelets) on graphs is one of the hottest topics in harmonic analysis, with important implications for learning on graphs and semi-supervised learning.
- Hierarchical matrices (HODLR, H, H2 and HSS matrices), a very active area in numerical analysis, have also been shown to be effective in Gaussian processes inference.
- Scattering networks are a major breakthrough, and combine ideas from wavelet analysis and deep learning.
- Multiscale graph models are ever more popular because they can capture important structures in real world networks.
- Multiscale matrix decompositions and multiresolution matrix factorizations, mirroring some features of algebraic multigrid methods, are gaining traction in large scale data applications.
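As a minimal reminder of the multiresolution idea in its classical one-dimensional setting (not specific to any of the graph or matrix constructions above; the example signal is our own), the orthonormal Haar transform splits a signal into coarse averages and fine details at every scale:

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform: averages + details."""
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)
    det = (x[0::2] - x[1::2]) / np.sqrt(2)
    return avg, det

def haar_transform(x):
    """Full multiresolution decomposition of a length-2^k signal."""
    coeffs = []
    avg = np.asarray(x, dtype=float)
    while len(avg) > 1:
        avg, det = haar_step(avg)
        coeffs.append(det)      # detail coefficients, finest scale first
    coeffs.append(avg)          # final coarse average
    return coeffs

coeffs = haar_transform([4.0, 2.0, 5.0, 5.0])
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the signal; the graph and matrix multiresolutions above generalize exactly this coarse/fine split to less regular domains.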

The goal of this workshop is to bring together leading researchers from Harmonic Analysis, Signal Processing, Numerical Analysis, and Machine Learning, to explore the synergies between all the above lines of work.

Nicolo Fusi · Anna Goldenberg · Sara Mostafavi · Gerald Quon · Oliver Stegle

[ 510 bd ]

The field of computational biology has seen dramatic growth over the past few years. A wide range of high-throughput technologies developed in the last decade now enable us to measure parts of a biological system at various resolutions—at the genome, epigenome, transcriptome, and proteome levels. These technologies are now being used to collect data for an increasingly diverse set of problems, ranging from classical problems such as predicting differentially regulated genes between time points and predicting subcellular localization of RNA and proteins, to models that explore complex mechanistic hypotheses bridging the gap between genetics and disease, population genetics and transcriptional regulation. Fully realizing the scientific and clinical potential of these data requires developing novel supervised and unsupervised learning methods that are scalable, can accommodate heterogeneity, are robust to systematic noise and confounding factors, and provide mechanistic insights.

The goals of this workshop are to i) present emerging problems and innovative machine learning techniques in computational biology, and ii) generate discussion on how to best model the intricacies of biological data and synthesize and interpret results in light of the current work in the field. We will invite several rising leaders from the biology/bioinformatics community who will present current research …

Alex Beutel · Tianqi Chen · Sameer Singh · Elaine Angelino · Markus Weimer · Joseph Gonzalez

[ 511 d ]

The broadening use of machine learning, the explosive growth in data, and the complexity of the large-scale learning systems required to analyze these data have together fueled interdisciplinary research at the intersection of Machine Learning and System design. Addressing these challenges demands a combination of the right abstractions -- for algorithms, data structures, and interfaces -- as well as scalable systems capable of addressing real world learning problems. At the same time, it is becoming increasingly clear that data-driven and learning-driven approaches provide natural and powerful solutions to building and managing complex modern systems. Overall, the flow of ideas between these two communities continues to offer promising opportunities toward solving even larger problems.

Designing systems for machine learning presents new challenges and opportunities over the design of traditional data processing systems. For example, what is the right abstraction for data consistency in the context of parallel, stochastic learning algorithms? What guarantees of fault tolerance are needed during distributed learning? The statistical nature of machine learning offers an opportunity for more efficient systems but requires revisiting many of the challenges addressed by the systems and database communities over the past few decades. Machine learning focused developments in distributed learning platforms, …

Anastasia Pentina · Christoph Lampert · Sinno Jialin Pan · Mingsheng Long · Judy Hoffman · Baochen Sun · Kate Saenko

[ 514 bc ]

This workshop aims to bring together researchers and practitioners from machine learning, computer vision, natural language processing and related fields to discuss and document recent advances in transfer and multi-task learning. This includes the main topics of transfer and multi-task learning, together with several related variants such as domain adaptation and dataset bias, and new discoveries and directions in deep learning based approaches.

Transfer and multi-task learning methods aim to better exploit the available data during training and adapt previously learned knowledge to new domains or tasks. This mitigates the burden of human labeling for emerging applications and enables learning from very few labeled examples.

In recent years there has been increasing activity in these areas, driven mainly by practical applications (e.g. object recognition, sentiment analysis) as well as by state-of-the-art deep learning frameworks (e.g. CNNs). Of the recently proposed solutions, most lack theoretical justification, especially the deep learning based approaches. On the other hand, most of the existing theoretically justified approaches are rarely used in practice.

This NIPS 2015 workshop will focus on closing the gap between theory and practice by providing an opportunity for researchers and practitioners to get together, to share ideas and debate current theories and …

Vicenç Gómez · Gerhard Neumann · Jonathan S Yedidia · Peter Stone

[ 511 a ]

In the next few years, traditional single-agent architectures will increasingly be replaced by true multi-agent systems with components that have growing autonomy and computational power. This transformation has already started, with prominent examples such as power networks, where each node is now an active energy generator; robotic swarms of unmanned aerial vehicles; software agents that trade and negotiate on the Internet; and robot assistants that need to interact with other robots or humans. The number of agents in these systems can range from a few complex agents up to several hundred if not thousands of typically much simpler entities.
Multi-agent systems show many beneficial properties such as robustness, scalability, parallelization and a larger number of tasks that can be achieved in comparison to centralized, single agent architectures. However, the use of multi-agent architectures represents a major paradigm shift for systems design. In order to use such systems efficiently, effective approaches for planning, learning, inference and communication are required. The agents need to plan with their local view on the world and to coordinate at multiple levels. They also need to reason about the knowledge, observations and intentions of other agents, which can in turn be cooperative or …

Giorgio Patrini · Tony Jebara · Richard Nock · Dimitrios Kotzias · Felix Xinnan Yu

[ 512 dh ]

Can we learn to locate objects in images, only from the list of objects those images contain? Or the sentiment of a phrase in a review from the overall score? Can we tell who voted for Obama in 2012? Or which population strata are more likely to be infected by Ebola, only looking at geographical incidence and census data? Are large corporations able to infer sensitive traits of their customers such as sexual preference, unemployment or ethnicity, only based on state-level statistics?

In contrast, how can we publicly release data containing personal information to the research community, while guaranteeing that individuals’ sensitive information will not be compromised? How realistic is the idea of outsourcing machine-learning tasks without sharing datasets but only a few statistics sufficient for training?

Despite their diversity, solutions to those problems can be surprisingly alike, as they all play with the same elements: variables without a clear one-to-one mapping, and the search for/the protection against models and statistics sufficient to recover the relevant variables.
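One concrete instance of this recovery is learning from label proportions via the "mean map" idea: given only each bag's label proportions and its average feature vector, the class-conditional means are identifiable by least squares. The following toy construction (bag sizes, means, and variable names are our own illustrative choices) sketches it:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two classes with different (unknown) feature means.
mu_true = np.array([[0.0, 0.0], [3.0, 1.0]])

# Build "bags": we observe each bag's label PROPORTIONS and its average
# feature vector -- never any individual label.
props, bag_means = [], []
for _ in range(20):
    p = rng.uniform(0.1, 0.9)
    n0 = int(500 * p)
    x = np.vstack([rng.normal(mu_true[0], 1.0, (n0, 2)),
                   rng.normal(mu_true[1], 1.0, (500 - n0, 2))])
    props.append([n0 / 500, 1 - n0 / 500])
    bag_means.append(x.mean(axis=0))

# Each bag mean satisfies bag_mean ~= props @ class_means, so the class
# means are recoverable from aggregate statistics alone.
mu_hat, *_ = np.linalg.lstsq(np.array(props), np.array(bag_means),
                             rcond=None)
```

The same linear relationship is what a privacy mechanism must obfuscate if it wants to release aggregates without leaking the individual-level structure.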

Aggregate statistics and obfuscated data are abundant, as they are released much more frequently than plain individual-level information; the latter are often too sensitive because of privacy constraints or business value, or too …

Isabelle Guyon · Evelyne Viegas · Ben Hamner · Balázs Kégl

[ 512 e ]

Challenges in Machine Learning have proven to be efficient and cost-effective ways to quickly bring to industry solutions that may otherwise have remained confined to research. In addition, the playful nature of challenges naturally attracts students, making challenges a great teaching resource. Challenge participants range from undergraduate students to retirees, joining forces in a rewarding environment that allows them to learn, perform research, and demonstrate excellence. Therefore challenges can be used as a means of directing research, advancing the state of the art, or venturing into completely new domains.

Because challenges have become mainstream in the execution of Machine Learning projects, it has become increasingly important to regularly bring together workshop organizers, platform providers, and participants to discuss best practices in challenge organization and new methods and application opportunities to design high impact challenges. Following the success of last year's workshop, in which a fruitful exchange led to many innovations, we propose to reconvene and discuss the new avenues that have been explored and lay the basis for further developments. We are particularly interested in following the progress made in two conceptually important directions:
1) Open innovation: Organization of contests in which data are made available and the contestants must both formalize and solve …

Joseph Jay Williams · Yasin Abbasi Yadkori · Finale Doshi-Velez

[ 514 a ]

The up-to-date schedule is available on the workshop website.
(MLAIHCI – Machine Learning, Artificial Intelligence, Human-Computer Interaction)


8:50 Introductions

Michael Littman, Brown University: "Reinforcement Learning from users: New algorithms and frameworks"

10:00-10:30 Coffee Break

Machine Teaching
Jerry Zhu, University of Wisconsin Madison: "Machine Teaching as a Framework for Personalized Education"

Hoang M. Le, Yisong Yue, & Peter Carr. "Smooth Imitation Learning." [PDF]

11:45-1:30 Lunch.

Embedding Algorithms in User Technologies
John Langford, Microsoft Research: "An Interactive Learning Platform for Making Decisions"
Neil Heffernan, Worcester Polytechnic Institute: "Enabling real-time evaluation of crowdsourced machine learning algorithms: Experimentation and Personalization in online math problems on"

3:00-4:00 Spotlights & Posters

4:00-4:30 Coffee Break
Ambuj Tewari, Huitian Lei, & Susan Murphy. University of Michigan. "From Ads to Interventions: Contextual Bandit Algorithms for Mobile Health". (NIH application to "Heartsteps")

5:30-6:30 Conclusions & Future Directions

Bobak Shahriari · Ryan Adams · Nando de Freitas · Amar Shah · Roberto Calandra

[ 511 b ]

Bayesian optimization has emerged as an exciting subfield of machine learning that is concerned with the global optimization of noisy, black-box functions using probabilistic methods. Systems implementing Bayesian optimization techniques have been successfully used to solve difficult problems in a diverse set of applications. There have been many recent advances in the methodologies and theory underpinning Bayesian optimization that have extended the framework to new applications as well as provided greater insights into the behaviour of these algorithms. Bayesian optimization is now increasingly being used in industrial settings, providing new and interesting challenges that require new algorithms and theoretical insights.

At last year’s NIPS workshop on Bayesian optimization the focus was on the intersection of “academia and industry”. Following up on this theme, the workshop this year will focus on scaling existing approaches to larger evaluation budgets, higher-dimensional search spaces, and more complex input spaces. While the computational complexity of the common probabilistic regression models used in Bayesian optimization has confined it to relatively low-dimensional problems and small evaluation budgets, there have, in recent years, been several advances in scaling these probabilistic models to more demanding application domains. Furthermore, many applications of Bayesian optimization only make sense when considering concurrent evaluations, …
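A minimal sketch of the basic loop the abstract refers to (a zero-mean GP with an RBF kernel and a UCB acquisition rule on a 1-D toy problem; the objective, kernel lengthscale, and constants are our own illustrative choices, not a production system):

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    """Toy black-box function to maximize (unknown to the optimizer)."""
    return -(x - 0.6) ** 2 + 0.05 * np.sin(15 * x)

def rbf(X1, X2, ls=0.15):
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Standard GP regression (zero mean, RBF kernel) at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mu, np.maximum(var, 0)

# Bayesian optimization: fit surrogate, maximize acquisition, evaluate.
grid = np.linspace(0, 1, 200)
X = rng.uniform(0, 1, 2)           # two random initial evaluations
y = objective(X)
for _ in range(10):
    mu, var = gp_posterior(X, y, grid)
    ucb = mu + 2.0 * np.sqrt(var)  # upper confidence bound acquisition
    x_next = grid[np.argmax(ucb)]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best = X[np.argmax(y)]
```

The scaling challenges the workshop targets show up directly here: the Cholesky factorization is cubic in the number of evaluations, and the grid search over the acquisition function does not survive high-dimensional search spaces.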

Josh Tenenbaum · Jan-Willem van de Meent · Tejas Kulkarni · S. M. Ali Eslami · Brooks Paige · Frank Wood · Zoubin Ghahramani

[ 513 ef ]

Probabilistic models have traditionally co-evolved with tailored algorithms for efficient learning and inference. One of the exciting developments of recent years has been the resurgence of black box methods, which make relatively few assumptions about the model structure, allowing application to broader model families.

In probabilistic programming systems, black box methods have greatly improved the capabilities of inference backends. Similarly, the design of connectionist models has been simplified by the development of black box frameworks for training arbitrary architectures. These innovations open up opportunities to design new classes of models that smoothly negotiate the transition from low-level features of the data to high-level structured representations that are interpretable and generalize well across examples.
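A core primitive behind many such black box inference backends is the score-function (likelihood-ratio, or REINFORCE) gradient estimator, which differentiates an expectation using only samples and the log-density gradient, never the internals of the integrand. A toy sketch (our own example, with an analytically checkable answer):

```python
import numpy as np

rng = np.random.default_rng(2)

# Score-function estimator:
#   d/dmu E_{q(z;mu)}[f(z)] = E_{q(z;mu)}[f(z) * d/dmu log q(z;mu)],
# which requires no gradient of f -- hence "black box".
def score_function_grad(f, mu, n_samples=200_000):
    z = rng.normal(mu, 1.0, n_samples)   # q(z; mu) = N(mu, 1)
    score = z - mu                       # d/dmu log N(z; mu, 1)
    return np.mean(f(z) * score)

mu = 0.7
g = score_function_grad(lambda z: z ** 2, mu)
# Analytic check: E[z^2] = mu^2 + 1, so the true gradient is 2*mu.
```

The estimator is unbiased for any evaluable `f`, at the cost of variance; much of the recent black box variational inference literature is about taming exactly that variance.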

This workshop brings together developers of black box inference technologies, probabilistic programming systems, and connectionist computing frameworks. The goal is to formulate a shared understanding of how black box methods can enable advances in the design of intelligent learning systems. Topics of discussion will include:

* Black box techniques for gradient ascent, variational inference, and Markov chain and sequential Monte Carlo.
* Implementation of black box techniques in probabilistic programming systems and computing frameworks for connectionist model families.
* Models that integrate top-down and bottom-up model representations to …

Eva Dyer · Joshua T Vogelstein · Konrad Koerding · Jeremy Freeman · Andreas S. Tolias

[ 511 e ]

Advances in optics, chemistry, and physics have revolutionized the development of experimental methods for measuring neural activity and structure. Some of the next generation methods for neural recording promise extremely large and detailed measurements of the brain’s architecture and function. The goal of this workshop is to provide an open forum for the discussion of a number of important questions related to how machine learning can aid in the analysis of these next generation neural datasets. What are some of the new machine learning and analysis problems that will arise as new experimental methods come online? What are the right distributed and/or parallel processing computational models to use for these different datasets? What are the computational bottlenecks/challenges in analyzing these next generation datasets?

In the morning, the goal will be to discuss new experimental techniques and the computational issues associated with analyzing the datasets generated by these techniques. The morning portion of the workshop will be organized into three hour-long sessions. Each session will start with a 30 minute overview of an experimental method, presented by a leading experimentalist in this area. Afterwards, we will have a 20 minute follow up from a computational scientist that will highlight the computational …

Tamara Broderick · Nick Foti · Aaron Schein · Alex Tank · Hanna Wallach · Sinead Williamson

[ 515 bc ]

In theory, Bayesian nonparametric (BNP) methods are perfectly suited to the modern-day, large data sets that arise in the physical, natural, and social sciences, as well as in technology and the humanities. By making use of infinite-dimensional mathematical structures, Bayesian nonparametric statistics allows the complexity of a learned model to grow as the size of a data set grows---exhibiting desirable Bayesian regularization properties for small data sets and allowing the practitioner to learn ever more from data sets as they become larger.
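The canonical example of this growing complexity is the Chinese restaurant process underlying the Dirichlet process: the number of clusters in a sample grows with the data set size (roughly like α log n). A small sketch (illustrative only; the concentration parameter and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def crp_partition(n, alpha=1.0):
    """Sample a partition of n items from a Chinese restaurant process,
    the combinatorial structure behind Dirichlet process mixtures."""
    counts = []                       # customers per table (cluster sizes)
    for i in range(n):
        # Join table t with prob counts[t]/(i+alpha), or open a new one
        # with prob alpha/(i+alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)          # new table = new cluster
        else:
            counts[table] += 1
    return counts

counts = crp_partition(200)
```

Unlike a finite mixture with a fixed number of components, the partition's complexity is not capped in advance, which is exactly the flexibility (and the computational burden) discussed here.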

This flexibility, however, presents both computational and modeling challenges. While there have been recent developments in accelerated inference for Bayesian nonparametric models, many approaches are not appropriate for large datasets. Further, while we have seen a growth in models for applied problems that move beyond the foundational Dirichlet and Gaussian processes, the widespread adoption of BNP methods has been limited in applied fields. In this workshop, we will address the modeling, theoretical, and computational challenges limiting adoption and how they can be circumvented. In particular, we will engage with applications specialists to better understand the best directions for BNP development as a tool for conducting applied research. We will explore computational tools for posterior inference algorithms that address …