Poster
A no-regret generalization of hierarchical softmax to extreme multi-label classification
Marek Wydmuch · Kalina Jasinska-Kobus · Mikhail Kuznetsov · Róbert Busa-Fekete · Krzysztof Dembczynski
Extreme multi-label classification (XMLC) is the problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be handled efficiently by organizing labels as a tree, as in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs), which have recently been devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@$k$ is used as the model evaluation metric. Critically, we prove that the pick-one-label heuristic, a reduction technique from multi-label to multi-class that is routinely used along with HSM, is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and than XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive with many state-of-the-art approaches in terms of statistical performance, model size and prediction time, which makes it suitable for deployment in an online system.
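The inconsistency claim in the abstract can be illustrated with a small numeric sketch. The toy distribution below is hypothetical and not taken from the paper, and the helper names (`marginals`, `pick_one_label`, `top_k`, `expected_precision_at_k`) are made up for illustration. It assumes the pick-one-label reduction draws one relevant label uniformly at random, and it uses the standard fact that expected precision@$k$ of a prediction is the average of the marginal probabilities of the predicted labels.

```python
from fractions import Fraction

# Hypothetical toy distribution over label sets for one fixed instance x
# (an illustrative assumption, not an example from the paper).
label_set_probs = {
    frozenset({1}): Fraction(2, 5),
    frozenset({2, 3}): Fraction(3, 5),
}


def marginals(dist):
    """Marginal probability that each label is relevant."""
    out = {}
    for labels, p in dist.items():
        for j in labels:
            out[j] = out.get(j, Fraction(0)) + p
    return out


def pick_one_label(dist):
    """Multi-class distribution induced by picking one relevant label
    uniformly at random from each label set."""
    out = {}
    for labels, p in dist.items():
        for j in labels:
            out[j] = out.get(j, Fraction(0)) + p / len(labels)
    return out


def top_k(scores, k):
    """Labels with the k highest scores; ties broken by smaller label id."""
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]


def expected_precision_at_k(predicted, marg, k):
    """Expected precision@k equals the mean marginal of predicted labels."""
    return sum(marg[j] for j in predicted) / k


marg = marginals(label_set_probs)
reduced = pick_one_label(label_set_probs)

best_by_marginals = top_k(marg, 1)
best_by_pick_one = top_k(reduced, 1)

print("top-1 by marginals:     ", best_by_marginals,
      expected_precision_at_k(best_by_marginals, marg, 1))
print("top-1 by pick-one-label:", best_by_pick_one,
      expected_precision_at_k(best_by_pick_one, marg, 1))
```

In this sketch, even a multi-class model that fits the reduced distribution perfectly would rank label 1 first, while labels 2 and 3 each have a higher marginal probability of being relevant, so the prediction is suboptimal for precision@1.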
Author Information
Marek Wydmuch (Poznan University of Technology)
Kalina Jasinska-Kobus (Poznan University of Technology, Allegro.pl)
Mikhail Kuznetsov (Yahoo! Research)
Róbert Busa-Fekete (Yahoo! Research)
Krzysztof Dembczynski (Poznan University of Technology)
More from the Same Authors
-
2021 : Population Level Privacy Leakage in Binary Classification with Label Noise »
Róbert Busa-Fekete · Andres Munoz · Umar Syed · Sergei Vassilvitskii -
2021 : On the Pitfalls of Label Differential Privacy »
Andres Munoz · Róbert Busa-Fekete · Umar Syed · Sergei Vassilvitskii -
2021 Poster: Identity testing for Mallows model »
Róbert Busa-Fekete · Dimitris Fotakis · Balazs Szorenyi · Emmanouil Zampetakis -
2021 Poster: Private and Non-private Uniformity Testing for Ranking Data »
Róbert Busa-Fekete · Dimitris Fotakis · Emmanouil Zampetakis -
2020 : Real World RL with Vowpal Wabbit: Beyond Contextual Bandits »
John Langford · Marek Wydmuch · Maryam Majzoubi · Adith Swaminathan · · Dylan Foster · Paul Mineiro -
2018 Poster: Distributed Stochastic Optimization via Adaptive SGD »
Ashok Cutkosky · Róbert Busa-Fekete -
2017 Workshop: Extreme Classification: Multi-class & Multi-label Learning in Extremely Large Label Spaces »
Manik Varma · Marius Kloft · Krzysztof Dembczynski -
2015 Poster: Online F-Measure Optimization »
Róbert Busa-Fekete · Balázs Szörényi · Krzysztof Dembczynski · Eyke Hüllermeier -
2015 Poster: Online Rank Elicitation for Plackett-Luce: A Dueling Bandits Approach »
Balázs Szörényi · Róbert Busa-Fekete · Adil Paul · Eyke Hüllermeier -
2011 Poster: An Exact Algorithm for F-Measure Maximization »
Krzysztof Dembczynski · Willem Waegeman · Weiwei Cheng · Eyke Hüllermeier