Poster
Unsupervised Learning under Latent Label Shift
Manley Roberts · Pranav Mani · Saurabh Garg · Zachary Lipton

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #803
What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where the label marginals $p_d(y)$ shift but the class conditionals $p(x|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|x)$ suffices to identify $p_d(y)$ and $p_d(y|x)$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|x)$; (ii) discretize the data by clustering examples in $p(d|x)$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|x)$ to compute $p_d(y|x) \; \forall d$. With semisynthetic experiments, we show that our algorithm can leverage domain information to improve upon competitive unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when data-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work.
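
To make the finite-input correspondence with topic modeling concrete, the following is a minimal sketch of the LLS generative structure, not the authors' implementation. All numbers and names (`p_x_given_y`, `p_y_given_d`, `domain_input_marginals`) are hypothetical and chosen only for illustration. It builds each domain's input distribution as $p_d(x) = \sum_y p(x|y)\, p_d(y)$, which is the same mixture form as a document's word distribution under a topic model.

```python
# Toy illustration of Latent Label Shift on a finite input space.
# All values below are hypothetical, not taken from the paper.

# p_x_given_y[y][x]: class conditionals p(x|y), shared by every domain.
# Input 0 occurs only under label 0 and input 3 only under label 1,
# so each plays the role of an "anchor word" for its label.
p_x_given_y = [
    [0.7, 0.2, 0.1, 0.0],  # label 0
    [0.0, 0.1, 0.3, 0.6],  # label 1
]

# p_y_given_d[d][y]: label marginals p_d(y), which differ across domains.
p_y_given_d = [
    [0.9, 0.1],  # domain 0
    [0.5, 0.5],  # domain 1
    [0.2, 0.8],  # domain 2
]


def domain_input_marginals(p_x_given_y, p_y_given_d):
    """Return p_d(x) = sum_y p(x|y) * p_d(y) for every domain d."""
    n_inputs = len(p_x_given_y[0])
    return [
        [
            sum(p_y[y] * p_x_given_y[y][x] for y in range(len(p_y)))
            for x in range(n_inputs)
        ]
        for p_y in p_y_given_d
    ]


# Rows are domains ("documents"), columns are inputs ("words").
p_x_given_d = domain_input_marginals(p_x_given_y, p_y_given_d)

for d, row in enumerate(p_x_given_d):
    print(f"domain {d}:", [round(v, 3) for v in row])
```

In this toy setup, the probability of the anchor input 0 in each domain is proportional to that domain's share of label 0, which gives some intuition for why inputs that shift together across domains can reveal the latent labels. Recovering the two factors from `p_x_given_d` alone is the factorization step that the abstract assigns to non-negative matrix factorization; that step is not shown here.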

Author Information

Manley Roberts (Carnegie Mellon University)

Working on my Master's in Machine Learning at Carnegie Mellon, doing research in the [ACMI Lab](https://acmilab.org/) on distribution shift. I'll graduate in December 2022, and begin working in research at [Abacus.AI](abacus.ai) after that.

Pranav Mani (Carnegie Mellon University)

I am a Master's student in the Machine Learning Department at Carnegie Mellon University. I work with Professor Zack Lipton on problems in distribution shift and deep learning. I am also interested in problems related to NLP and causality. I have a conference paper at NeurIPS 2022 titled "Unsupervised Learning under Latent Label Shift". I will be graduating in December 2022.