
Advances in Modeling and Learning Interactions from Complex Data
Gautam Dasarathy · Mladen Kolar · Richard Baraniuk

Fri Dec 08 08:00 AM -- 06:30 PM (PST) @ 102 A+B
Event URL: https://sites.google.com/view/nips2017interactions/

Whether it is biological networks of proteins and genes or technological ones like sensor networks and the Internet, we are surrounded today by complex systems composed of entities that interact with and affect each other. An urgent need has therefore emerged for novel techniques for modeling, learning, and conducting inference in such networked systems. Consequently, we have seen progress from a variety of disciplines, both in fundamental methodology and in applications of such methods to practical problems. However, much work remains to be done, and a unifying and principled framework for dealing with these problems remains elusive. This workshop aims to bring together theoreticians and practitioners to chart recent advances and to discuss new directions in understanding interactions in large and complex systems. NIPS, with its attendance by a broad and cross-disciplinary set of researchers, offers the ideal venue for this exchange of ideas.

The workshop will feature a mix of contributed talks, contributed posters, and invited talks by leading researchers from diverse backgrounds working in these areas. A specific segment of the schedule is reserved for the presentation of open problems, and there will be ample time for discussions aimed explicitly at sparking collaborations among the attendees.

We encourage submissions on a variety of topics including, but not limited to:
* Computationally and statistically efficient techniques for learning graphical models from data including convex, greedy, and active approaches.
* New probabilistic models of interacting systems including nonparametric and exponential family graphical models.
* Community detection algorithms including semi-supervised and adaptive approaches.
* Techniques for modeling and learning causal relationships from data.
* Bayesian techniques for modeling complex data and causal relationships.
* Kernel methods for directed and undirected graphical models.
* Applications of these methods in various areas like sensor networks, computer networks, social networks, and biological networks like phylogenetic trees and graphs.

Successful submissions will emphasize the role of statistical and computational learning in the problem at hand. The authors of these submissions will be invited to present their work as either a poster or a contributed talk. Alongside these, we also solicit open problems that fit the theme of the workshop; the authors of selected open problems will present them to the attendees and solicit feedback and collaborations.

Fri 8:00 a.m. - 8:05 a.m.
Opening Remarks
Gautam Dasarathy
Fri 8:05 a.m. - 8:40 a.m.

We consider the problem of recovering a hidden community of size K from a graph where edges between members of the community have label X drawn i.i.d. according to P and all other edges have labels drawn i.i.d. according to Q. The information limits for this problem were characterized by Hajek-Wu-Xu in 2016 in terms of the KL-divergence between P and Q. We complement their work by showing that for a broad class of distributions P and Q the computational difficulty is also determined by the KL-divergence. We additionally show how to reduce general P and Q to the case P = Ber(p) and Q = Ber(q) and vice versa, giving a direct computational equivalence (up to polynomial time).

Guy Bresler
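
The quantity governing both the information and computational limits above can be computed directly in the Bernoulli reduction P = Ber(p), Q = Ber(q) that the abstract mentions. The snippet below is only an illustrative sketch of that KL divergence, not the speakers' code:

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence D(Ber(p) || Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Example: community edges appear with probability p = 0.5,
# background edges with probability q = 0.1.
d = bernoulli_kl(0.5, 0.1)
```

Larger divergence between the in-community and background edge distributions makes the hidden community easier to find, which is why the limits are naturally expressed in terms of this quantity.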
Fri 8:40 a.m. - 9:00 a.m.

We propose a dynamic edge-exchangeable network model that can capture the sparse connections observed in real temporal networks, in contrast to existing models, which are dense. The model achieves superior link-prediction accuracy compared to a dynamic variant of the blockmodel, and extracts interpretable time-varying community structures. In addition to sparsity, the model accounts for the effect of social influence on vertices' future behaviours. Compared to dynamic blockmodels, our model has a smaller latent space; this compact latent space requires fewer parameters to be estimated in variational inference and results in a computationally friendly inference algorithm.

Yin Cheng Ng
Fri 9:00 a.m. - 9:20 a.m.

In this paper, we propose an optimization-based sparse learning approach to identify the set of most influential reactions in a chemical reaction network. This reduced set of reactions is then employed to construct a reduced chemical reaction mechanism, which is relevant to chemical interaction network modeling. The problem of identifying influential reactions is first formulated as a mixed-integer quadratic program, and then a relaxation method is leveraged to reduce the computational complexity of our approach. Qualitative and quantitative validation of the sparse encoding approach demonstrates that the model captures important network structural properties with moderate computational load.
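
The relaxation step in abstracts like this one is often an L1 relaxation of the integer sparsity constraint. The sketch below illustrates that generic mechanism with an ISTA solver on synthetic data; the matrix A, the solver, and the toy setup are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def ista(A, b, lam, steps=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by iterative
    soft-thresholding (ISTA), a standard convex relaxation of
    integer-constrained sparse selection."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - A.T @ (A @ x - b) / L      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return x

# Toy "network": 10 candidate reactions, only #2 and #7 truly influential.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[2], x_true[7] = 3.0, -2.0
x_hat = ista(A, A @ x_true, lam=0.1)
```

The nonzero entries of `x_hat` play the role of the selected influential reactions; tightening `lam` trades off sparsity against fit.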

Fri 9:20 a.m. - 10:00 a.m.
Poster Spotlights
Fri 10:00 a.m. - 11:00 a.m.
Coffee Break + Posters (Break/Poster Setup)
Fri 11:00 a.m. - 11:35 a.m.

We study maximum likelihood estimation for exponential families that are multivariate totally positive of order two (MTP2). Such distributions appear in the context of ferromagnetism in the Ising model and various latent models, as for example Brownian motion tree models used in phylogenetics. We show that maximum likelihood estimation for MTP2 exponential families is a convex optimization problem. For quadratic exponential families such as Ising models and Gaussian graphical models, we show that MTP2 implies sparsity of the underlying graph without the need of a tuning parameter. In addition, we characterize a subgraph and a supergraph of Gaussian graphical models under MTP2. Moreover, we show that the MLE always exists even in the high-dimensional setting. These properties make MTP2 constraints an intriguing alternative to methods for learning sparse graphical models such as the graphical lasso.

Caroline Uhler
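
In the Gaussian case, MTP2 has a concrete characterization: the precision matrix must be an M-matrix, i.e. symmetric positive definite with nonpositive off-diagonal entries. A minimal check of that property (an illustrative sketch, not the speaker's code):

```python
import numpy as np

def is_mtp2_gaussian(K, tol=1e-10):
    """A Gaussian distribution is MTP2 iff its precision matrix K is an
    M-matrix: symmetric positive definite with nonpositive off-diagonals."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T):
        return False
    if np.any(np.linalg.eigvalsh(K) <= 0):
        return False
    off_diagonal = K - np.diag(np.diag(K))
    return bool(np.all(off_diagonal <= tol))

# The graph of the model is read off the nonzero off-diagonals of K;
# under MTP2 those entries are all nonpositive.
K = np.array([[ 2.0, -0.5,  0.0],
              [-0.5,  2.0, -0.3],
              [ 0.0, -0.3,  1.5]])
```

This sign constraint is what lets the MLE produce a sparse graph with no tuning parameter, unlike the graphical lasso.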
Fri 11:35 a.m. - 11:55 a.m.

Real world networks often have nodes belonging to multiple communities. We consider the detection of overlapping communities under the popular Mixed Membership Stochastic Blockmodel (MMSB). Using the inherent geometry of this model, we link the inference of overlapping communities to the problem of finding corners in a noisy rotated and scaled simplex, for which consistent algorithms exist. We use this as a building block for our algorithm to infer the community memberships of each node, and prove its consistency. As a byproduct of our analysis, we derive sharp row-wise eigenvector deviation bounds, and provide a cleaning step that improves the performance drastically for sparse networks. We also propose both necessary and sufficient conditions for identifiability of the model, while existing methods typically present sufficient conditions. The empirical performance of our method is shown using simulated and real datasets scaling up to 100,000 nodes.

Xueyu Mao
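
Corner-finding in a noisy simplex can be sketched with the classic Successive Projection Algorithm. This is a generic stand-in for the consistent corner-finding routines the abstract builds on, not the authors' method:

```python
import numpy as np

def successive_projection(P, k):
    """Pick k rows of P that sit at the corners of the (rotated, scaled)
    simplex the other rows lie in: repeatedly take the farthest row,
    then project its direction out of all rows."""
    R = np.array(P, dtype=float)
    corners = []
    for _ in range(k):
        i = int(np.argmax((R ** 2).sum(axis=1)))   # farthest remaining row
        corners.append(i)
        u = R[i] / np.linalg.norm(R[i])
        R -= np.outer(R @ u, u)                    # project direction out
    return corners

# Rows 1, 3, 4 are corners; rows 0 and 2 are convex combinations of them.
P = np.array([[3.0, 2.5, 0.0],
              [6.0, 0.0, 0.0],
              [0.0, 1.5, 2.8],
              [0.0, 5.0, 0.0],
              [0.0, 0.0, 4.0]])
```

In the MMSB setting, the recovered corner rows correspond to pure nodes, from which the community memberships of the remaining nodes can be inferred.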
Fri 12:00 p.m. - 2:00 p.m.
Lunch Break
Fri 2:00 p.m. - 2:35 p.m.

Discovering causal relationships from data is a challenging problem that is exacerbated when some of the variables of interests are latent. In this talk, we discuss the problem of learning the support of transition matrix between random processes in a Vector Autoregressive (VAR) model from samples when a subset of the processes are latent. It is well known that ignoring the effect of the latent processes may lead to very different estimates of the influences even among observed processes. We are not only interested in identifying the influences among the observed processes, but also aim at learning those between the latent ones, and those from the latent to the observed ones. We show that the support of transition matrix among the observed processes and lengths of all latent paths between any two observed processes can be identified successfully under some conditions on the VAR model. Our results apply to both non-Gaussian and Gaussian cases, and experimental results on various synthetic and real-world datasets validate our theoretical findings.

Negar Kiyavash
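
To make the caveat about latent processes concrete, here is the naive least-squares VAR(1) fit that works when all processes are observed but can be badly biased when some are latent. The simulation and estimator are an illustrative sketch under that standard model, not the speaker's method:

```python
import numpy as np

def fit_var1(X):
    """Least-squares estimate of the transition matrix A in the VAR(1)
    model x_{t+1} = A x_t + noise, from observations X (rows x_0..x_T).
    Fitting only the observed coordinates this way is what goes wrong
    when latent processes are dropped."""
    A_hat, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return A_hat.T

# Simulate a small, fully observed VAR(1) and recover its transition matrix.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2],
              [0.0, 0.4]])
X = np.zeros((2000, 2))
for t in range(1999):
    X[t + 1] = A @ X[t] + 0.1 * rng.standard_normal(2)
A_hat = fit_var1(X)
```

The support of `A_hat` (here, the zero in position (1, 0)) encodes the influence structure among the processes, which is exactly what becomes unreliable once latent processes are marginalized out.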
Fri 2:35 p.m. - 2:55 p.m.
Learning High-Dimensional DAGs: Provable Statistical Guarantees and Scalable Approximation (Contributed Talk)
Fri 4:00 p.m. - 4:35 p.m.

The exponential family is one of the most powerful and widely used classes of models in statistics. A method was recently developed to fit this model when the natural parameter and sufficient statistic are infinite dimensional, using a score matching approach. The infinite exponential family is a natural generalisation of the finite case, much like the Gaussian and Dirichlet processes generalise their respective finite models. In this talk, I'll describe two recent results which make this model more applicable in practice, by reducing the computational burden and improving performance for high-dimensional data. The first is a Nyström-like approximation to the full solution. We prove that this approximate solution has the same consistency and convergence rates as the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. The second result is a generalisation of the model family to the conditional case, again with consistency guarantees. In experiments, the conditional model generally outperforms a competing approach with consistency guarantees, and is competitive with a deep conditional density model on datasets that exhibit abrupt transitions and heteroscedasticity.

Arthur Gretton
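
The Nyström idea itself can be sketched generically on a kernel matrix. This toy version (RBF kernel, random landmarks) only illustrates the low-rank mechanism; it is not the talk's score-matching estimator:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, m, gamma=1.0, seed=0):
    """Rank-m Nystrom approximation K ~= C W^+ C^T using m random
    landmark points, cutting kernel computations from n^2 to n*m."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf(X, X[idx], gamma)        # n x m cross-kernel
    W = rbf(X[idx], X[idx], gamma)   # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 2))
K = rbf(X, X)
K_approx = nystrom(X, m=40)  # all points as landmarks: exact up to rounding
```

With m much smaller than n the approximation trades a controlled amount of accuracy for large cost and storage savings, which is the kind of guarantee the talk establishes for the infinite exponential family fit.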
Fri 4:35 p.m. - 4:55 p.m.

Non-parametric multivariate density estimation faces strong statistical and computational bottlenecks, and the more practical approaches impose near-parametric assumptions on the form of the density functions. In this paper, we leverage recent developments to propose a class of non-parametric models which have very attractive computational and statistical properties.

Arun Suggala
Fri 4:55 p.m. - 5:30 p.m.

Reconstructing evolutionary histories is a basic step in much of biological discovery, as well as in historical linguistics and other domains. Inference methods based on mathematical models of evolution have been used to make substantial advances, including understanding the early origins of life, predicting protein structures and functions, and addressing questions such as "Where did the Indo-European languages begin?" In this talk, I will describe the current state of the art in phylogeny estimation in these domains and what is understood from a mathematical perspective, and identify fascinating open problems where novel mathematical research - drawing from graph theory, algorithms, and probability theory - is needed. This talk will be accessible to mathematicians, computer scientists, and probabilists, and does not require any knowledge of biology.

Tandy Warnow

Author Information

Gautam Dasarathy (Rice University)
Mladen Kolar (University of Chicago)
Richard Baraniuk (Rice University)
