
Distribution-Calibrated Hierarchical Classification
Ofer Dekel

Mon Dec 07 07:00 PM -- 11:59 PM (PST)

While many advances have already been made in hierarchical classification learning, we take a step back and examine how a hierarchical classification problem should be formally defined. We pay particular attention to the fact that many arbitrary decisions go into the design of the label taxonomy provided with the training data, and that this taxonomy is often unbalanced. We correct this problem by using the data distribution to calibrate the hierarchical classification loss function. This distribution-based correction must be done with care to avoid introducing unmanageable statistical dependencies into the learning problem. This leads us off the beaten path of binomial-type estimation and into the uncharted waters of geometric-type estimation. We present a new calibrated definition of statistical risk for hierarchical classification, an unbiased geometric estimator for this risk, and a new algorithmic reduction from hierarchical classification to cost-sensitive classification.
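As a generic illustration of the difference between binomial-type and geometric-type estimation, consider the following textbook example. It is not the estimator from the paper, whose details are not given here. Suppose, purely for illustration, that we want an unbiased estimate of 1/p, where p is the probability that a random draw falls in some event (for instance, a region of the label taxonomy). A binomial-type plug-in m/k, computed from k hits in a fixed sample of m draws, is biased and undefined when k = 0. By contrast, the number N of draws needed to get the first hit is Geometric(p), so E[N] = 1/p exactly. The function and parameter names below are hypothetical.

```python
import random

def geometric_inverse_estimate(sample, in_event, rng):
    """Return the number of draws until the first draw lands in the event.

    Hypothetical helper. If each draw lands in the event independently with
    probability p > 0, the returned count N is Geometric(p) with E[N] = 1/p,
    so N is an unbiased estimate of 1/p. (It loops forever if p == 0.)
    """
    n = 0
    while True:
        n += 1
        if in_event(sample(rng)):
            return n

# Usage: estimate 1/p for an event with p = 0.2 by averaging many estimates.
rng = random.Random(0)
p = 0.2
estimates = [
    geometric_inverse_estimate(lambda r: r.random(), lambda x: x < p, rng)
    for _ in range(20000)
]
print(sum(estimates) / len(estimates))  # should be close to 1/p = 5
```

The design point is that the number of samples is random rather than fixed. This random sample size is what removes the bias that a fixed-sample ratio estimator would have.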

Author Information

Ofer Dekel (Microsoft Research)
