Poster
Learning concept graphs from text with stick-breaking priors
America Chambers · Padhraic Smyth · Mark Steyvers

Tue Dec 7th 12:00 -- 12:00 AM

We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents, a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and can use partial prior knowledge about graph structure as well as labeled documents. We describe a generative model based on a stick-breaking process for graphs, and a Markov chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (such as hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs.
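For readers unfamiliar with stick-breaking priors, the following is a minimal sketch of the standard (truncated) stick-breaking construction for a sequence of weights. It is not the graph-structured process described in the abstract, whose details are not given here; the function name `stick_breaking_weights` and the parameters `alpha` and `num_sticks` are illustrative assumptions.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=None):
    """Draw truncated stick-breaking weights (generic sketch, not the paper's model).

    Each step breaks off a Beta(1, alpha) fraction of the stick that remains,
    so smaller alpha concentrates mass on the first few weights.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of stick not yet assigned
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # `remaining` is the mass left over by truncating at num_sticks pieces.
    return weights, remaining


weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10, seed=0)
```

The returned weights and the leftover mass always sum to one, which is what lets such a process serve as a prior over an open-ended set of components.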

Author Information

America Chambers (Univ. of California, Irvine)
Padhraic Smyth (University of California, Irvine)
Mark Steyvers (UC Irvine)
