Invited speakers
Jose Bento Ayres Pereira, Boston College
Alfredo Braunstein, Politecnico di Torino
Ramon Grima, University of Edinburgh
Jakob Macke, MPI Biological Cybernetics Tuebingen
Andrea Montanari, Stanford University
Graham Taylor, University of Guelph
This workshop is cosponsored by the European Network "NETADIS" (Statistical Physics Approaches to Networks Across Disciplines). See http://www.netadis.eu for further information and workshop details (NIPS 2015 tab).
Workshop overview
Inference and learning on large graphical models, i.e. large systems of simple probabilistic units linked by a complex network of interactions, is a classical topic in machine learning. Such systems are also an active research topic in the field of statistical physics.
The main interaction between statistical physics and machine learning has so far been in the area of analysing data sets without explicit temporal structure. Here methods of equilibrium statistical physics, developed for studying Boltzmann distributions on networks of nodes with e.g. pairwise interactions, are closely related to graphical model inference techniques; accordingly there has been much cross-fertilization leading to both conceptual insights and more efficient algorithms. Models can be learned from recorded experimental or other empirical data, but even when samples come from e.g. a time series this aspect of the data is typically ignored.
More recently, interest has shifted towards dynamical models. This shift has occurred for two main reasons:
(a) Most of the interesting systems for which statistical analysis techniques are required, e.g. networks of biological neurons, gene regulatory networks, protein-protein interaction networks and stock markets, exhibit very rich temporal or spatiotemporal dynamics; ignoring these dynamics by focusing on stationary distributions alone can lose a significant amount of interesting information and possibly even lead to qualitatively wrong conclusions.
(b) Current technological breakthroughs in collecting data from the complex systems referred to above are yielding ever increasing temporal resolution. Combined with strong theoretical methods, this in turn allows in-depth analyses of the fundamental temporal aspects of the function of the system. It is widely accepted that these dynamical aspects are crucial for understanding the function of biological and financial systems, warranting the development of techniques for studying them.
In the past, the fields of machine learning and statistical physics have cross-fertilised each other significantly. For example, the establishment of the relation between loopy belief propagation, message passing algorithms and the Bethe free energy formulation has stimulated a large amount of research in approximation techniques for inference and the corresponding equilibrium analysis of disordered systems in statistical physics.
It is the goal of the proposed workshop to bring together researchers from the fields of machine learning and statistical physics in order to discuss the new challenges originating from dynamical data. Such data are modelled using a variety of approaches such as dynamic belief networks, continuous time analogues of these – as often used for disordered spin systems in statistical physics –, coupled stochastic differential equations for continuous random variables etc. The workshop will provide a forum for exploring possible synergies between the inference and learning approaches developed for the various models. The experience from joint advances in the equilibrium domain suggests that there is much unexplored scope for progress on dynamical data.
Possible topics to be addressed will be:
Inference on state dynamics:
 efficient approximation of dynamics on a given network, filtering, smoothing
 inference with hidden nodes
 existing methods including dynamical belief propagation & expectation propagation, variational approximations, mean-field and Plefka approximations; relations between these, advantages, drawbacks
 alternative approaches
Learning model/network parameters:
 with/without hidden nodes
Learning network structure:
 going beyond correlation information
Abstracts of invited talks
Jose Bento: Learning Stochastic Differential Equations – Fundamental limits and efficient algorithms
Models based on stochastic differential equations (SDEs) play a crucial role in several domains of science and technology, ranging from chemistry to finance.
In this talk I consider the problem of learning the drift coefficient of a p-dimensional stochastic differential equation from a sample path of length T. I assume that the drift is parametrized by a high-dimensional vector, and study the support recovery problem in the case where p is allowed to grow with T.
In particular, I describe a general lower bound on the sample complexity T by using a characterization of mutual information as a time integral of conditional variance, due to Kadota, Zakai, and Ziv. For linear stochastic differential equations, the drift coefficient is parametrized by a p by p matrix which describes which degrees of freedom interact under the dynamics. In this case, I analyze an L1-regularized least-squares estimator and describe an upper bound on T that nearly matches the lower bound on specific classes of sparse matrices.
I describe how this same algorithm can be used to learn nonlinear SDEs and in addition show by means of a numerical experiment why one should expect the sample complexity to be of the same order as that for linear SDEs.
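To make the linear case concrete, here is a minimal sketch of an L1-regularized least-squares drift estimate. It is an illustration under stated assumptions, not the estimator or experiment from the talk: the path is generated by Euler-Maruyama discretisation, the 4-dimensional sparse drift matrix `A_true`, the step `dt`, the path length `T` and the penalty `lam` are all hypothetical choices, and the optimiser is a plain proximal-gradient (ISTA) loop.

```python
import numpy as np

def simulate_linear_sde(A, T, dt, rng):
    """Euler-Maruyama path of dx = A x dt + dW, started at the origin."""
    p = A.shape[0]
    n = int(T / dt)
    x = np.zeros((n + 1, p))
    for t in range(n):
        noise = np.sqrt(dt) * rng.standard_normal(p)
        x[t + 1] = x[t] + dt * (A @ x[t]) + noise
    return x

def soft_threshold(M, tau):
    """Proximal operator of tau * ||M||_1 (entrywise shrinkage)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lasso_drift(x, dt, lam, n_iter=500):
    """Minimise (1/2T) sum_t ||dx_t - A x_t dt||^2 / dt + lam * ||A||_1 by ISTA."""
    X = x[:-1]                    # states x_t
    dX = np.diff(x, axis=0)       # increments x_{t+1} - x_t
    T = dt * X.shape[0]
    G = (X.T @ X) * dt / T        # time-averaged second moment of the state
    C = (dX.T @ X) / T            # time-averaged increment/state cross term
    step = 1.0 / np.linalg.eigvalsh(G).max()  # 1 / Lipschitz constant of the gradient
    A = np.zeros_like(G)
    for _ in range(n_iter):
        grad = A @ G - C          # gradient of the smooth least-squares part
        A = soft_threshold(A - step * grad, step * lam)
    return A

# Hypothetical example: stable, sparse 4 x 4 drift matrix.
rng = np.random.default_rng(0)
A_true = -np.eye(4)
A_true[0, 1] = 0.8
A_true[2, 3] = -0.8

path = simulate_linear_sde(A_true, T=400.0, dt=0.01, rng=rng)
A_hat = lasso_drift(path, dt=0.01, lam=0.05)
rel_err = np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true)
print(np.round(A_hat, 2))
print("relative Frobenius error:", rel_err)
```

The penalty shrinks every entry towards zero, so the nonzero entries come out slightly smaller in magnitude than in `A_true`; entries whose evidence in the path is weaker than `lam` are set exactly to zero, which is what makes the estimator usable for support recovery.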
Alfredo Braunstein: Bayesian inference of cascades on networks
We present a method based on Belief Propagation to study a series of inference problems on discrete dynamical cascade models based on partial and/or noisy observations of the cascades. The problems include the identification of the source, the discovery of undetected infected nodes, prediction of features of the future evolution, and the inference of the supporting network.
Ramon Grima: Exact and approximate solutions for spatial stochastic models of chemical systems
Stochastic effects in chemical reaction systems have mostly been studied via the chemical master equation, a non-spatial discrete stochastic formulation of chemical kinetics which assumes well-mixing and point-like interactions between molecules. These assumptions are in direct contrast with what experiments tell us about the nature of the intracellular environment, namely that diffusion plays a fundamental role in intracellular dynamics and that the environment itself is highly non-dilute (or crowded). I will here describe our recent work on obtaining (i) exact expressions for the solution of the reaction-diffusion master equation (RDME) and its crowded counterpart (cRDME) in equilibrium conditions and (ii) approximate expressions for the moments in non-equilibrium conditions. The solutions portray an emerging picture of the combined influence of diffusion and crowding on the stochastic properties of chemical reaction networks.
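For orientation, the non-spatial chemical master equation that the abstract takes as its starting point can be sampled with Gillespie's stochastic simulation algorithm. The sketch below is only that well-mixed baseline, not the RDME or cRDME results of the talk; the birth-death reaction scheme and the rate constants `k_prod` and `k_deg` are hypothetical choices made for illustration.

```python
import numpy as np

def gillespie_birth_death(k_prod, k_deg, t_end, rng, n0=0):
    """Sample one trajectory of 0 -> X (rate k_prod), X -> 0 (rate k_deg * n)."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rate_birth = k_prod
        rate_death = k_deg * n
        total = rate_birth + rate_death
        dt = rng.exponential(1.0 / total)   # waiting time to the next reaction
        if t + dt > t_end:
            break
        t += dt
        if rng.random() < rate_birth / total:
            n += 1                          # production event
        else:
            n -= 1                          # degradation event
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)

rng = np.random.default_rng(1)
t_end = 500.0
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=t_end, rng=rng)

# Time-weighted average of the copy number over the whole trajectory.
holding = np.diff(np.append(times, t_end))
mean_count = np.sum(counts * holding) / t_end
print("time-averaged copy number:", mean_count)
```

For this scheme the stationary solution of the master equation is a Poisson distribution with mean `k_prod / k_deg`, so the time average should settle near 10 for the rates chosen here.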
Jakob Macke: Correlations and signatures of criticality in neural population models
Large-scale recording methods make it possible to measure the statistics of neural population activity, and thereby to gain insights into the principles that govern the collective activity of neural ensembles. One hypothesis that has emerged from this approach is that neural populations are poised at a ‘thermodynamic critical point’, and that this has important functional consequences (Tkacik et al 2014). Support for this hypothesis has come from studies that computed the specific heat, a measure of global population statistics, for groups of neurons subsampled from population recordings. These studies have found two effects which, in physical systems, indicate a critical point: First, specific heat diverges with population size N. Second, when manipulating population statistics by introducing a ‘temperature’ in analogy to statistical mechanics, the maximum heat moves towards unit temperature for large populations.
What mechanisms can explain these observations? We show that both effects arise in a simple simulation of retinal population activity. They robustly appear across a range of parameters including biologically implausible ones, and can be understood analytically in simple models. The specific heat grows with N whenever the (average) correlation is independent of N, which is always true when uniformly subsampling a large, correlated population. For weakly correlated populations, the rate of divergence of the specific heat is proportional to the correlation strength. Thus, if retinal population codes were optimized to maximize specific heat, then this would predict that they seek to increase correlations. This is incongruent with theories of efficient coding that make the opposite prediction. We find criticality in a simple and parsimonious model of retinal processing, and without the need for fine-tuning or adaptation. This suggests that signatures of criticality might not require an optimized coding strategy, but rather arise as a consequence of subsampling a stimulus-driven neural population (Aitchison et al 2014).
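The growth of the specific heat with N can be reproduced in a toy setting. The sketch below is not the retinal simulation from the talk: it assumes an exchangeable population of binary neurons whose spike count follows a beta-binomial distribution (parameters `a` and `b` are hypothetical), for which the pairwise correlation 1/(a + b + 1) does not depend on N. The specific heat at unit temperature is taken as Var[log P(x)] / N.

```python
import math
import numpy as np

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def specific_heat(N, a=1.0, b=1.0):
    """Var[log P(x)] / N for N exchangeable binary neurons, beta-binomial counts."""
    ks = range(N + 1)
    # Log-probability of one particular pattern with k active neurons.
    log_p_pattern = np.array(
        [log_beta(k + a, N - k + b) - log_beta(a, b) for k in ks]
    )
    # Log of the number of patterns sharing the same count k.
    log_mult = np.array(
        [math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1) for k in ks]
    )
    p_count = np.exp(log_p_pattern + log_mult)   # probability of count k
    mean = np.sum(p_count * log_p_pattern)
    var = np.sum(p_count * (log_p_pattern - mean) ** 2)
    return var / N

sizes = [20, 40, 80, 160]
heats = [specific_heat(N) for N in sizes]
for N, c in zip(sizes, heats):
    print(N, round(c, 3))
```

Because the correlation stays fixed as the population grows, the variance of log P(x) scales like N squared and the specific heat keeps increasing with N, with no tuning of `a` or `b` required.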
Andrea Montanari: Information-theoretic bounds on learning network dynamics
How long should we observe the trajectory of a system before being able to characterize its underlying network dynamics? I will present a brief review of information-theoretic tools to establish lower bounds on the required length of observation. I will illustrate the use of these tools with a few examples: linear and nonlinear stochastic differential equations, dynamical Bayesian networks and so on. For each of these examples, I will discuss whether the ultimate information limit has been achieved by practical algorithms or not.
Graham Taylor: Learning Multiscale Temporal Dynamics with Recurrent Neural Networks
The last three years have seen an explosion of activity studying recurrent neural networks (RNNs), a generalization of feedforward neural networks which can map sequences to sequences. Training RNNs using backpropagation through time can be difficult, and was thought up until recently to be hopeless due to vanishing and exploding gradients. Recent advances in optimization methods and architectures have led to impressive results in modeling speech, handwriting and language. Applications to other areas are emerging. In this talk, I will review some recent progress on RNNs and discuss our work on extending and improving the Clockwork RNN (Koutník et al.), a simple yet powerful model that partitions its hidden units to model specific temporal scales. Our “Dense clockworks” are a shift-invariant form of the architecture which we show to be more efficient and effective than their predecessor. I will also describe a recent collaboration with Google in which we apply Dense clockworks to authenticating mobile phone users based on the movement of the device as captured by the accelerometer and gyroscope.
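The idea of partitioning hidden units by temporal scale can be sketched as a forward pass in the style of the original Clockwork RNN. This is an illustration under assumptions, not the Dense clockworks variant described in the talk: the module periods, module size, random weights and the convention that time starts at t = 0 are all hypothetical choices.

```python
import numpy as np

def clockwork_forward(inputs, W_in, W_h, periods, module_size):
    """Hidden states of a Clockwork-RNN-style layer; inputs has shape (T, d_in)."""
    n_mod = len(periods)
    H = n_mod * module_size
    m = module_size
    # Module i receives recurrent input only from modules that are as slow or slower.
    mask = np.zeros((H, H))
    for i in range(n_mod):
        for j in range(n_mod):
            if periods[j] >= periods[i]:
                mask[i * m:(i + 1) * m, j * m:(j + 1) * m] = 1.0
    h = np.zeros(H)
    states = []
    for t, x in enumerate(inputs):
        candidate = np.tanh(W_in @ x + (W_h * mask) @ h)
        # A module is updated only when its clock period divides the time step.
        active = np.repeat([t % p == 0 for p in periods], m)
        h = np.where(active, candidate, h)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
periods = [1, 2, 4, 8]          # hypothetical clock periods, fastest first
module_size, d_in, T = 3, 2, 16
H = len(periods) * module_size
W_in = 0.5 * rng.standard_normal((H, d_in))
W_h = 0.5 * rng.standard_normal((H, H))
inputs = rng.standard_normal((T, d_in))

states = clockwork_forward(inputs, W_in, W_h, periods, module_size)
print(states.shape)
```

Units in the period-4 module (columns 6 to 8 here) hold their value for steps 1 to 3 and change again at step 4, while the period-1 module changes at every step, which is how the layer separates slow from fast structure in the sequence.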
Fri 6:00 a.m. - 6:30 a.m.
Learning Stochastic Differential Equations (Talk)
José Bento

Fri 7:30 a.m. - 8:00 a.m.
Correlations and Signatures of Criticality in Neural Population Models (Talk)
Jakob H Macke

Fri 8:00 a.m. - 8:45 a.m.
Spotlight
Caterina De Bacco, Ludovica Bachschmid-Romano, Barbara Bravi

Fri 8:00 a.m. - 8:45 a.m.
Spotlight Part II (Spotlight)
Bhaswar B Bhattacharya, Kenji Doya, Alex Gibberd, Sakya Dasgupta, Daniel Soudry

Fri 11:30 a.m. - 12:00 p.m.
Information-theoretic bounds on learning network dynamics (Talk)
Andrea Montanari

Fri 12:00 p.m. - 12:30 p.m.
Bayesian Inference of Cascades on Networks (Talk)
Alfredo Braunstein

Fri 2:00 p.m. - 2:30 p.m.
Exact and approximate solutions for spatial stochastic models of chemical systems (Talk)
Ramon Grima

Fri 2:30 p.m. - 3:00 p.m.
Learning Multiscale Temporal Dynamics with Recurrent Neural Networks (Talk)
Graham W Taylor
Author Information
Manfred Opper (TU Berlin)
Yasser Roudi (Kavli Inst For Systems Neuroscience)
Peter Sollich (King's College London)