Spotlight
Collapsed Variational Inference for HDP
Yee Whye Teh · Kenichi Kurihara · Max Welling
A wide variety of Dirichlet-multinomial `topic' models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages: convergence is easy to assess, optimization does not require maintaining detailed balance, they provide a bound on the marginal likelihood, and they side-step issues with topic identifiability. The most accurate variational technique to date, collapsed variational LDA (CV-LDA) \cite{TehNewmanWelling06}, handles neither model selection nor inference over the hyper-parameters. We generalize this technique to the HDP to address both issues. The result is a collapsed variational inference algorithm that can add topics indefinitely, for instance through split-and-merge heuristics, while automatically removing topics that are not supported by the data. Experiments show a very significant improvement in accuracy relative to CV-LDA.
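To give a flavor of how unsupported topics can be removed automatically, the sketch below shows one simplified mechanism in that spirit: under a truncated stick-breaking representation with a mean-field posterior q(v_k) = Beta(1 + n_k, alpha + n_{>k}), topics with (near-)zero expected counts receive vanishing expected weight and can be pruned. This is only an illustrative toy, not the paper's collapsed variational algorithm; the function names, the concentration parameter `alpha`, and the pruning tolerance `tol` are all assumptions made here for illustration.

```python
import numpy as np

def stick_breaking_expected_weights(counts, alpha):
    """Expected topic weights E[pi_k] under a truncated mean-field
    stick-breaking posterior q(v_k) = Beta(1 + n_k, alpha + n_{>k}).

    counts: expected number of tokens assigned to each topic (length T).
    alpha:  DP concentration parameter (an illustrative choice here).
    """
    counts = np.asarray(counts, dtype=float)
    # n_{>k}: expected count of tokens in topics after k.
    tail = np.concatenate([np.cumsum(counts[::-1])[::-1][1:], [0.0]])
    a = 1.0 + counts          # Beta posterior first parameter
    b = alpha + tail          # Beta posterior second parameter
    e_v = a / (a + b)         # E[v_k] under the Beta posterior
    # Under mean-field, E[pi_k] = E[v_k] * prod_{j<k} E[1 - v_j].
    remaining = np.cumprod(1.0 - e_v)
    return e_v * np.concatenate([[1.0], remaining[:-1]])

def prune_topics(counts, alpha, tol=1e-3):
    """Return indices of topics whose expected weight exceeds tol."""
    weights = stick_breaking_expected_weights(counts, alpha)
    return np.where(weights > tol)[0], weights

# Topics 2 and 3 have no assigned tokens; their expected weight
# collapses toward zero and they are pruned.
kept, w = prune_topics([500, 300, 0, 0], alpha=1.0)
```

The key point illustrated is that the variational posterior itself drives empty topics toward negligible weight, so the truncation level can be grown freely (e.g. via split-and-merge proposals) without accumulating spurious topics.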