When software developers modify one or more files in a large code base, they must also identify and update other related files. Many file dependencies can be detected by mining the development history of the code base: in essence, groups of related files are revealed by the logs of previous workflows. From data of this form, we show how to detect dependent files by solving a problem in binary matrix completion. We explore different latent variable models (LVMs) for this problem, including Bernoulli mixture models, exponential family PCA, restricted Boltzmann machines, and fully Bayesian approaches. We evaluate these models on the development histories of three large, open-source software systems: Mozilla Firefox, Eclipse Subversive, and Gimp. In all of these applications, we find that LVMs improve the performance of related file prediction over current leading methods.
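The abstract frames related-file prediction as binary matrix completion. To illustrate one of the models it lists, here is a Bernoulli mixture fit by expectation-maximization (EM) to a toy commit-by-file matrix. Each row is a past change, with a 1 wherever that change touched a file. The fitted model then scores the remaining files for a new, partially observed change. This is a minimal sketch under our own assumptions, not the authors' implementation. The function names `fit_bernoulli_mixture` and `predict_related`, the farthest-point initialization, the smoothing constant `eps`, and the toy history are all hypothetical.

```python
import math

# Hypothetical illustration: names, initialization, and eps are assumptions, not from the paper.

def _log_normalize(logs):
    # Turn unnormalized log-weights into probabilities (log-sum-exp trick)
    m = max(logs)
    w = [math.exp(l - m) for l in logs]
    s = sum(w)
    return [v / s for v in w]

def _init_components(X, k):
    # Deterministic farthest-point start: the first row, then repeatedly the row
    # with the largest Hamming distance to every row already chosen
    chosen = [0]
    while len(chosen) < k:
        def gap(i):
            return min(sum(a != b for a, b in zip(X[i], X[c])) for c in chosen)
        chosen.append(max(range(len(X)), key=gap))
    # Shrink 0/1 rows toward 0.5 so no probability starts at exactly 0 or 1
    return [[0.25 + 0.5 * v for v in X[i]] for i in chosen]

def fit_bernoulli_mixture(X, k, iters=50, eps=1e-3):
    """EM for a k-component Bernoulli mixture over binary rows X
    (rows = past commits, columns = files; X[i][j] = 1 if commit i touched file j)."""
    n, d = len(X), len(X[0])
    pi = [1.0 / k] * k
    mu = _init_components(X, k)
    for _ in range(iters):
        # E-step: responsibility of each component c for each commit
        R = [_log_normalize([
                math.log(pi[c]) + sum(math.log(mu[c][j] if x[j] else 1.0 - mu[c][j])
                                      for j in range(d))
                for c in range(k)])
             for x in X]
        # M-step: re-estimate mixing weights and per-file probabilities,
        # smoothed by eps so no probability reaches exactly 0 or 1
        for c in range(k):
            nc = sum(r[c] for r in R)
            pi[c] = (nc + eps) / (n + k * eps)
            for j in range(d):
                mu[c][j] = (sum(R[i][c] * X[i][j] for i in range(n)) + eps) / (nc + 2 * eps)
    return pi, mu

def predict_related(pi, mu, observed):
    """P(file j also changes | the files in `observed` are known to change).
    Unobserved files are marginalized out, which is exact here because files
    are independent given the mixture component."""
    post = _log_normalize([math.log(pi[c]) + sum(math.log(mu[c][j]) for j in observed)
                           for c in range(len(pi))])
    return [sum(p * mu_c[j] for p, mu_c in zip(post, mu)) for j in range(len(mu[0]))]

# Toy history: files 0-2 tend to change together, as do files 3-4
history = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [1, 0, 1, 0, 0],
           [0, 0, 0, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
pi, mu = fit_bernoulli_mixture(history, k=2)
scores = predict_related(pi, mu, observed=[0])  # a new change that touches file 0
```

Because files are conditionally independent given the component, the unobserved entries drop out of the posterior. Completing a row therefore needs only the files already known to be changing. In this toy history, files 1 and 2 should score well above files 3 and 4.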
Author Information
Diane Hu (UC San Diego)
Laurens van der Maaten (Facebook AI Research)
Youngmin Cho (University of California, San Diego)
Lawrence Saul (Flatiron Institute)
Sorin Lerner (UC San Diego)
More from the Same Authors
2022: Bias Amplification in Image Classification
Melissa Hall · Laurens van der Maaten · Laura Gustafson · Maxwell Jones · Aaron Adcock
2023 Poster: GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition
Vikram V. Ramaswamy · Sing Yu Lin · Dora Zhao · Aaron Adcock · Laurens van der Maaten · Deepti Ghadiyaram · Olga Russakovsky
2021 Poster: An online passive-aggressive algorithm for difference-of-squares classification
Lawrence Saul
2020 Session: Orals & Spotlights Track 01: Representation/Relational
Laurens van der Maaten · Fei Sha
2012 Poster: Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning
Matthew F Der · Lawrence Saul
2011 Poster: Maximum Covariance Unfolding: Manifold Learning for Bimodal Data
Vijay Mahadevan · Chi Wah Wong · Jose Costa Pereira · Tom Liu · Nuno Vasconcelos · Lawrence Saul
2010 Workshop: Challenges of Data Visualization
Barbara Hammer · Laurens van der Maaten · Fei Sha · Alexander Smola
2010 Talk: Manifold Learning
Lawrence Saul
2010 Poster: On Herding and the Perceptron Cycling Theorem
Andrew E Gelfand · Yutian Chen · Laurens van der Maaten · Max Welling
2009 Poster: Kernel Methods for Deep Learning
Youngmin Cho · Lawrence Saul
2008 Demonstration: Visualizing NIPS Cooperations using Multiple Maps t-SNE
Laurens van der Maaten · Geoffrey E Hinton
2006 Poster: Large Margin Gaussian Mixture Models for Automatic Speech Recognition
Fei Sha · Lawrence Saul
2006 Talk: Large Margin Gaussian Mixture Models for Automatic Speech Recognition
Fei Sha · Lawrence Saul
2006 Poster: Graph Regularization for Maximum Variance Unfolding with an Application to Sensor Localization
Kilian Q Weinberger · Fei Sha · Qihui Zhu · Lawrence Saul