Ordered Diversity Sampling for Text
Ashish Tiwari · Mukul Singh · Ananya Singha · Arjun Radhakrishna
Abstract
The goal of diversity sampling is to select a small, representative subset of data that maximizes the information the subset contains. We introduce the ordered diversity sampling problem and present a novel and simple approach for generating ordered diverse samples from textual data, based on the principal components of the embedding vectors. We compare our approach with existing approaches using a new metric that measures diversity in an ordered list of samples. We transform standard text classification benchmarks into benchmarks for ordered diversity sampling and show that prevailing approaches perform 6% to 61% worse than our method while also being less time-efficient.
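To give a concrete sense of how principal components of embedding vectors could produce an ordered sample, the following is a minimal sketch under stated assumptions, not the algorithm from the paper. It assumes the embeddings are already available as lists of floats, estimates the leading principal direction by power iteration, picks the not-yet-selected item with the largest absolute projection onto that direction, removes the direction from the data, and repeats. The function names (`center`, `top_component`, `ordered_diverse_sample`) and the selection rule are hypothetical choices made for illustration.

```python
import math
import random


def center(vectors):
    """Subtract the per-dimension mean from every vector."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[v[j] - mean[j] for j in range(d)] for v in vectors]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_component(rows, iters=200, seed=0):
    """Estimate the leading principal direction of centered rows
    by power iteration on X^T X (the covariance is never formed)."""
    rng = random.Random(seed)
    d = len(rows[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(w, w))
    w = [x / norm for x in w]
    for _ in range(iters):
        proj = [dot(r, w) for r in rows]
        w_new = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = math.sqrt(dot(w_new, w_new))
        if norm < 1e-12:
            break  # no variance left in the data
        w = [x / norm for x in w_new]
    return w


def ordered_diverse_sample(vectors, k):
    """Return up to k indices, ordered by the (assumed, illustrative) rule:
    one pick per successive principal direction."""
    rows = center(vectors)
    chosen = []
    for _ in range(min(k, len(vectors))):
        w = top_component(rows)
        scores = [abs(dot(r, w)) for r in rows]
        candidates = [i for i in range(len(rows)) if i not in chosen]
        best = max(candidates, key=lambda i: scores[i])
        chosen.append(best)
        # Deflate: remove the direction w so the next pass finds a new one.
        rows = [[x - dot(r, w) * wj for x, wj in zip(r, w)] for r in rows]
    return chosen


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real text embeddings would have far more dimensions.
    toy = [[10.0, 0.0], [-9.0, 0.0], [0.0, 3.0], [0.0, -2.0], [0.5, 0.5]]
    print(ordered_diverse_sample(toy, 2))
```

Because the output is an ordered list, any prefix of it can serve as a smaller sample. Once the directions carrying variance are exhausted, the sketch's remaining picks are essentially arbitrary, so a practical method would need a more careful rule at that point.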