

Session

Spotlights

Stefan Schaal



COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Markus Weimer · Alexandros Karatzoglou · Quoc V Le · Alexander Smola

In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We use structured output prediction to optimize for specific non-uniform ranking scores. Experimental results show that our method achieves good ranking performance and scales well on collaborative filtering tasks.
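The paper's structured-output machinery is more involved, but the core idea, factorizing the rating matrix while penalizing violations of each user's rating order rather than rating error, can be sketched compactly. The Python toy below is an illustration only (the function name and parameters are assumptions, not the authors' COFI RANK); it uses a plain hinge loss on within-user item pairs in place of the structured ranking losses:

    import numpy as np

    def cofi_rank_sketch(R, rank=10, lam=0.1, lr=0.01, epochs=50, seed=0):
        """Toy pairwise-ranking matrix factorization (not the authors' method).

        R: user x item matrix with np.nan marking unobserved ratings. For every
        observed pair (i, j) a user rated differently, a hinge loss pushes the
        preferred item's score at least a margin of 1 above the other's, so the
        factorization is trained on ranking rather than rating reconstruction.
        """
        rng = np.random.default_rng(seed)
        n_users, n_items = R.shape
        U = 0.1 * rng.standard_normal((n_users, rank))
        V = 0.1 * rng.standard_normal((n_items, rank))
        for _ in range(epochs):
            for u in range(n_users):
                obs = np.flatnonzero(~np.isnan(R[u]))
                for i in obs:
                    for j in obs:
                        if R[u, i] > R[u, j] and U[u] @ (V[i] - V[j]) < 1.0:
                            # Hinge loss active: nudge factors to fix the order.
                            diff = V[i] - V[j]
                            u_old = U[u].copy()
                            U[u] += lr * (diff - lam * U[u])
                            V[i] += lr * (u_old - lam * V[i])
                            V[j] -= lr * (u_old + lam * V[j])
        return U, V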

A method is proposed for semiparametric estimation in which parametric and nonparametric criteria are exploited in density estimation and unsupervised learning. This is accomplished by making sampling assumptions on a dataset that smoothly interpolate between the extreme of independently distributed (id) sample data (as in nonparametric kernel density estimators) and the extreme of independent identically distributed (iid) sample data. This article makes independent similarly distributed (isd) sampling assumptions and interpolates between the two extremes using a scalar parameter. The parameter controls a Bhattacharyya affinity penalty between pairs of distributions on samples. Surprisingly, the isd method maintains certain consistency and unimodality properties akin to maximum likelihood estimation. The proposed isd scheme is an alternative for handling nonstationarity in data without making drastic hidden-variable assumptions, which often make estimation difficult and laden with local optima. Experiments in density estimation on a variety of datasets confirm the superiority of isd over iid estimation, id estimation and mixture modeling.
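As a concrete anchor for the penalty term: the Bhattacharyya affinity between two densities is the integral of the square root of their product, and it has a simple closed form for Gaussians with a shared covariance. The sketch below is an illustration under assumed structure (per-sample isotropic Gaussians; the exact objective here is a stand-in, not the paper's estimator) of how a scalar lam interpolates between fully independent per-sample models (id, lam = 0) and one shared model (iid, lam large):

    import numpy as np

    def bhattacharyya_gaussian(mu1, mu2, sigma2):
        """Bhattacharyya affinity of two isotropic Gaussians sharing variance
        sigma2: exp(-||mu1 - mu2||^2 / (8 * sigma2))."""
        d2 = np.sum((mu1 - mu2) ** 2) / sigma2
        return np.exp(-d2 / 8.0)

    def isd_log_objective(X, mus, sigma2, lam):
        """Toy isd-style objective, up to additive constants (illustration
        only). Each datum X[n] gets its own Gaussian N(mus[n], sigma2 * I);
        lam = 0 leaves the per-sample models independent (id), while large
        lam pulls all the mus[n] together, approaching a single model (iid).
        """
        n = len(X)
        loglik = sum(
            -0.5 * np.sum((X[i] - mus[i]) ** 2) / sigma2 for i in range(n)
        )
        penalty = sum(
            np.log(bhattacharyya_gaussian(mus[i], mus[j], sigma2))
            for i in range(n) for j in range(i + 1, n)
        )
        return loglik + lam * penalty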


Heterogeneous Component Analysis

Shigeyuki Oba · Motoaki Kawanabe · Klaus-Robert Müller · Shin Ishii

In bioinformatics it is often desirable to combine data from various measurement sources, and thus structured feature vectors must be analyzed that possess different intrinsic blocking characteristics (e.g., different patterns of missing values, observation noise levels, and effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction, in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept, extracting sparse heterogeneous structure by obtaining common components for the blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept.
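To make the blocking structure concrete, the snippet below generates synthetic data of the kind HCA targets: feature blocks of different sizes with different noise levels, sharing some latent components while keeping others block-specific. This is a data-generation illustration under assumed structure, not the HCA algorithm itself:

    import numpy as np

    rng = np.random.default_rng(0)

    block_sizes  = [5, 8, 4]         # features per measurement block
    noise_levels = [0.1, 0.5, 1.0]   # block-wise residual noise (std dev)
    n_components, n_samples = 3, 200

    # Block-wise sparse loading pattern: active[b, k] == 1 means latent
    # component k drives block b. Component 0 is shared by all blocks,
    # component 1 by blocks 0-1 only, component 2 by block 2 only.
    active = np.array([[1, 1, 0],
                       [1, 1, 0],
                       [1, 0, 1]])

    Z = rng.standard_normal((n_components, n_samples))   # latent factors
    blocks = []
    for b, (p, s) in enumerate(zip(block_sizes, noise_levels)):
        W_b = rng.standard_normal((p, n_components)) * active[b]  # sparse loadings
        blocks.append(W_b @ Z + s * rng.standard_normal((p, n_samples)))
    X = np.vstack(blocks)   # the heterogeneous, structured data matrix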


Hidden Common Cause Relations in Relational Learning

Ricardo Silva · Wei Chu · Zoubin Ghahramani

When predicting class labels for objects within a relational database, it is often helpful to consider a model for relationships: this allows information to be shared between class labels and can improve prediction performance. However, there are different ways in which objects can be related within a relational database. One traditional way corresponds to a Markov network structure: each existing relation is represented by an undirected edge. This encodes that, conditioned on input features, each object label is independent of the other object labels given its neighbors in the graph. However, there is no reason why Markov networks should be the only representation of choice for symmetric dependence structures. Here we discuss the case where relationships are postulated to exist due to hidden common causes. We discuss how the resulting graphical model differs from Markov networks and how it describes different types of real-world relational processes. A Bayesian nonparametric classification model is built upon this graphical representation and evaluated in several empirical studies.
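A minimal way to see how a hidden common cause differs from a Markov-network edge is through the covariance it induces: a shared latent cause contributes correlated variability to both linked objects. The sketch below is an assumed simplification (the paper develops a proper nonparametric Bayesian treatment of mixed graphs); it adds one rank-one covariance term per postulated link on top of a standard feature kernel:

    import numpy as np

    def rbf_kernel(X, gamma=1.0):
        sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
        return np.exp(-gamma * sq)

    def common_cause_covariance(X, links, tau=0.5, gamma=1.0, jitter=1e-6):
        """Toy covariance mixing a feature kernel with hidden-common-cause terms.

        links: list of (i, j) index pairs postulated to share a hidden cause.
        Each shared cause adds the rank-one term tau * (e_i + e_j)(e_i + e_j)^T,
        inducing marginal dependence between the linked labels (illustration
        only, not the paper's model).
        """
        n = X.shape[0]
        K = rbf_kernel(X, gamma)
        for (i, j) in links:
            for a in (i, j):
                for b in (i, j):
                    K[a, b] += tau
        return K + jitter * np.eye(n)

Because each link contributes a positive semidefinite rank-one term, the combined matrix remains a valid covariance for, e.g., Gaussian process classification.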


Infinite State Bayes-Nets for Structured Domains

Max Welling · Ian Porteous · Evgeniy Bart

A general modeling framework is proposed that unifies nonparametric Bayesian models, topic models and Bayesian networks. This class of infinite state Bayes nets (ISBN) can be viewed as directed networks of 'hierarchical Dirichlet processes' (HDPs) where the domain of the variables can be structured (e.g., words in documents or features in images). To model the structure and to share groups between them we use 'cascades' of Dirichlet priors. We show that collapsed Gibbs sampling can be done efficiently in these models by leveraging the structure of the Bayes net and using the forward-filtering-backward-sampling algorithm for junction trees. Existing models, such as the nested DP, Pachinko allocation and mixed membership models, are described as ISBNs. Two experiments illustrate these ideas.
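The cascade construction can be illustrated in a finite-dimensional form: each distribution in a chain is drawn from a Dirichlet centered on its parent, so probability mass is shared down the cascade. The snippet below is an assumed finite stand-in for the HDP cascades in the paper, showing only this building block:

    import numpy as np

    def dirichlet_cascade(root, depth, concentration, rng):
        """Sample a chain of distributions, each drawn from a Dirichlet
        centered on its parent -- the 'cascade of Dirichlet priors' building
        block (finite-dimensional illustration, not the full ISBN model)."""
        chain = [np.asarray(root, dtype=float)]
        for _ in range(depth):
            chain.append(rng.dirichlet(concentration * chain[-1]))
        return chain

    rng = np.random.default_rng(1)
    base = np.ones(5) / 5    # uniform base measure over 5 states
    levels = dirichlet_cascade(base, depth=3, concentration=10.0, rng=rng)

Larger concentration values keep child distributions close to their parents; small values let them diverge, which is how the cascade trades sharing against specialization.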


Semi-Supervised Multitask Learning

Qiuhua Liu · Xuejun Liao · Lawrence Carin

A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under the constraint of a soft-sharing prior imposed over the parameters of the classifiers. The unlabeled data are utilized by basing classifier learning on neighborhoods induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over both semi-supervised single-task learning (STL) and supervised MTL.
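The Markov-random-walk construction used to exploit unlabeled data is a standard one and easy to sketch: build a kNN affinity graph over all points (labeled and unlabeled), row-normalize it into a transition matrix, and take its t-th power. The helper below is an illustrative implementation of that construction, not the authors' code:

    import numpy as np

    def random_walk_neighborhoods(X, k=5, t=3, sigma=1.0):
        """t-step Markov random walk over a kNN graph.

        Returns P_t, where P_t[i, j] is the probability of walking from point
        i to point j in t steps; basing classifier learning on these smoothed
        neighborhoods is how unlabeled points influence the decision boundary.
        """
        n = X.shape[0]
        d2 = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
        W = np.exp(-d2 / (2 * sigma**2))
        # Keep only each point's k nearest neighbors (plus itself), symmetrized.
        idx = np.argsort(d2, axis=1)[:, : k + 1]
        mask = np.zeros((n, n), dtype=bool)
        rows = np.repeat(np.arange(n), k + 1)
        mask[rows, idx.ravel()] = True
        W = W * (mask | mask.T)
        P = W / W.sum(axis=1, keepdims=True)   # one-step transition matrix
        return np.linalg.matrix_power(P, t)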


SpAM: Sparse Additive Models

Pradeep Ravikumar · Han Liu · John Lafferty · Larry Wasserman

We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given, together with empirical results on synthetic and real data showing that SpAM can be effective in fitting sparse nonparametric models to high-dimensional data.
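The fitting procedure lends itself to a compact sketch: a backfitting loop in which each component function is estimated by smoothing the partial residual and then shrunk by a soft threshold, so weak covariates are zeroed out entirely. The version below is a simplified illustration (Nadaraya-Watson smoothers with a fixed bandwidth; names and defaults are assumptions) of that idea:

    import numpy as np

    def spam_backfit(X, y, lam=0.1, n_iters=20, bandwidth=0.5):
        """Minimal SpAM-style backfitting sketch (simplified from the paper).

        Fits y ~ sum_j f_j(x_j). Each component is a smoothed partial residual
        scaled by a soft threshold, the mechanism that makes the additive
        model sparse: components whose norm falls below lam are set to zero.
        """
        n, p = X.shape
        F = np.zeros((n, p))                  # fitted component functions
        y_c = y - y.mean()
        # Precompute a Nadaraya-Watson smoother matrix per covariate.
        S = []
        for j in range(p):
            d = (X[:, j][:, None] - X[:, j][None, :]) / bandwidth
            K = np.exp(-0.5 * d**2)
            S.append(K / K.sum(axis=1, keepdims=True))
        for _ in range(n_iters):
            for j in range(p):
                R = y_c - F.sum(axis=1) + F[:, j]     # partial residual
                P = S[j] @ R                          # smoothed estimate
                norm = np.sqrt(np.mean(P**2))
                scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
                F[:, j] = scale * P                   # soft-threshold shrink
                F[:, j] -= F[:, j].mean()             # keep components centered
        return F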


TrueSkill Through Time: Revisiting the History of Chess

Pierre Dangauthier · Ralf Herbrich · Tom Minka · Thore K Graepel

We extend the Bayesian skill rating system TrueSkill to infer entire time series of player skills by smoothing through time instead of filtering. The skill of each participating player in each period (say, a year) is represented by a latent skill variable which is affected by the relevant game outcomes that year and coupled with the skill variables of the previous and subsequent years. Inference in the resulting factor graph is carried out by approximate message passing (expectation propagation, EP) along the time series of skills. As before, the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Based on these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players' lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that (a) overall playing strength has increased over the past 150 years, and (b) modelling a player's ability to force a draw provides significantly better predictive power.
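While the full EP smoother is beyond a short sketch, its two essential ingredients can be written down: the moment-matched Gaussian update for a single win/loss outcome, and the random-walk coupling that chains a player's skill across years. The following is a simplified, filtering-only illustration (draws, teams and the backward smoothing pass are omitted; names and defaults are assumptions):

    import numpy as np
    from scipy.stats import norm

    def win_update(mu_w, var_w, mu_l, var_l, beta=1.0):
        """Moment-matched skill update for 'winner beats loser' under Gaussian
        performance noise beta (TrueSkill-style two-player update; draws and
        the full EP smoothing pass are omitted in this sketch)."""
        c = np.sqrt(var_w + var_l + 2 * beta**2)
        t = (mu_w - mu_l) / c
        v = norm.pdf(t) / norm.cdf(t)          # truncated-Gaussian mean shift
        w = v * (v + t)                        # truncated-Gaussian var shrink
        mu_w += (var_w / c) * v
        mu_l -= (var_l / c) * v
        var_w *= 1 - (var_w / c**2) * w
        var_l *= 1 - (var_l / c**2) * w
        return mu_w, var_w, mu_l, var_l

    def couple_years(mu_prev, var_prev, tau=0.1):
        """Random-walk coupling between successive yearly skill variables:
        next year's prior is last year's posterior plus drift variance."""
        return mu_prev, var_prev + tau**2

Smoothing, as opposed to the filtering shown here, additionally passes messages backward through the chain of yearly skill variables, so early-career estimates are informed by later results.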