
Workshop: Table Representation Learning Workshop

Tree-Regularized Tabular Embeddings

Xuan Li · Yun Wang · Bo Li

Keywords: [ Regularization ] [ Deep Neural Networks ] [ tabular ] [ Supervised Pretraining ] [ Representation Learning ]


Tabular neural networks (NNs) have attracted remarkable attention, and recent advances have gradually narrowed the performance gap with tree-based models on many public datasets. While mainstream work focuses on calibrating NNs to fit tabular data, we emphasize the importance of homogeneous embeddings and instead concentrate on regularizing tabular inputs through supervised pretraining. Specifically, we extend a recent work named DeepTLF and use the structure of pretrained tree ensembles to transform raw variables into a single vector (T2V) or an array of tokens (T2T). Without loss of space efficiency, these binarized embeddings can be consumed directly by canonical tabular NNs with fully connected or attention-based building blocks. Through quantitative experiments on 88 OpenML datasets with binary classification tasks, we validate that the proposed tree-regularized representations not only narrow the gap with tree-based models, but also achieve on-par or better performance compared with advanced NN models. Most importantly, they are more robust and can be easily scaled and generalized as a standalone encoder for the tabular modality.
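
As a rough illustration of how split conditions from a pretrained tree ensemble can binarize a raw input, the sketch below evaluates every split on one row. It is not the authors' implementation: the `ENSEMBLE` structure, its thresholds, and the helper names `tree_bits`, `to_vector`, and `to_tokens` are hypothetical, and grouping bits per tree to form tokens is only one assumed reading of T2T.

```python
# Hypothetical stand-in for a pretrained tree ensemble: each tree is a list of
# (feature_index, threshold) pairs, one per internal split node. In practice
# these pairs would be read from a fitted gradient-boosting model.
ENSEMBLE = [
    [(0, 0.5), (1, 2.0), (2, -1.0)],  # tree 0
    [(1, 1.0), (0, 0.1)],             # tree 1
]


def tree_bits(row, tree):
    """Evaluate every split condition of one tree on a row, giving 0/1 bits."""
    return [1 if row[feature] <= threshold else 0 for feature, threshold in tree]


def to_vector(row, ensemble):
    """Single-vector form: concatenate the split bits of all trees."""
    return [bit for tree in ensemble for bit in tree_bits(row, tree)]


def to_tokens(row, ensemble):
    """Token-array form (assumed grouping): one list of bits per tree."""
    return [tree_bits(row, tree) for tree in ensemble]


row = [0.3, 1.5, 0.0]  # one raw sample with three numeric features
vec = to_vector(row, ENSEMBLE)
tokens = to_tokens(row, ENSEMBLE)
print(vec)     # [1, 1, 0, 0, 0]
print(tokens)  # [[1, 1, 0], [0, 0]]
```

Every feature, whatever its original scale or type, ends up as the same kind of binary indicator, which is the sense in which such an encoding is homogeneous. The vector form would suit a fully connected network, while the per-tree grouping gives a sequence that an attention-based model could consume.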
