The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods for Latent Dirichlet Allocation (LDA) models on GPUs: collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB) inference. To address the limited memory available on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. Furthermore, the partitioning scheme balances the computational load across multiprocessors and makes it easy to avoid memory access conflicts. We also use data streaming to handle extremely large datasets. Extensive experiments show that our parallel inference methods consistently produce LDA models with the same predictive power as sequential training methods, but with a 26x speedup for CGS and a 196x speedup for CVB on a GPU with 30 multiprocessors; moreover, the speedup scales almost linearly with the number of multiprocessors available. The proposed partitioning scheme and data streaming can be easily ported to many other machine learning models.
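For context on the sequential baseline being parallelized, standard collapsed Gibbs sampling for LDA resamples each token's topic from its full conditional, which depends only on word-topic, document-topic, and topic-total counts. The following is a minimal single-chain sketch; the function name, hyperparameter defaults, and data layout are illustrative assumptions, not details from the paper:

```python
import numpy as np

def lda_cgs(docs, K, V, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Sequential collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, V).
    K: number of topics, V: vocabulary size.
    """
    rng = np.random.default_rng(seed)
    n_wk = np.zeros((V, K))            # word-topic counts
    n_dk = np.zeros((len(docs), K))    # document-topic counts
    n_k = np.zeros(K)                  # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):     # initialize counts from random topics
        for i, w in enumerate(doc):
            k = z[d][i]
            n_wk[w, k] += 1; n_dk[d, k] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]            # remove the current assignment
                n_wk[w, k] -= 1; n_dk[d, k] -= 1; n_k[k] -= 1
                # full conditional: p(z=k) ∝ (n_wk + β)(n_dk + α)/(n_k + Vβ)
                p = (n_wk[w] + beta) * (n_dk[d] + alpha) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k            # record the new assignment
                n_wk[w, k] += 1; n_dk[d, k] += 1; n_k[k] += 1
    return n_wk, n_dk
```

The shared counts updated in the inner loop are what make naive parallelization conflict-prone: two workers sampling tokens of the same word (or the same document) would race on `n_wk` (or `n_dk`), which is the kind of memory access conflict the paper's partitioning scheme is designed to avoid.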
Author Information
Feng Yan (Facebook)
Ningyi Xu (Microsoft Research Asia)
Yuan Qi (Purdue University)
More from the Same Authors
- 2016 Poster: Distributed Flexible Nonlinear Tensor Factorization
  Shandian Zhe · Kai Zhang · Pengyuan Wang · Kuang-chih Lee · Zenglin Xu · Yuan Qi · Zoubin Ghahramani
- 2011 Poster: t-divergence Based Approximate Inference
  Nan Ding · S.V.N. Vishwanathan · Yuan Qi
- 2011 Poster: EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
  Yuan Qi · Feng Yan
- 2010 Workshop: Machine Learning for Social Computing
  Zenglin Xu · Irwin King · Shenghuo Zhu · Yuan Qi · Rong Yan · John Yen