Tabular deep learning when $d \gg n$ by using an auxiliary knowledge graph
Camilo Ruiz · Hongyu Ren · Kexin Huang · Jure Leskovec
Event URL: https://openreview.net/forum?id=xbe-b4EpphA
Machine learning models perform strongly on datasets with abundant labeled samples. However, on tabular datasets with extremely high-dimensional features ($d$) but few samples ($n$), i.e., $d \gg n$, they struggle to achieve strong performance. Our key insight is that even when labeled data are limited, the input features of a tabular dataset often represent real-world entities about which abundant prior information exists, and this information can be structured as an auxiliary knowledge graph (KG). For example, in a tabular medical dataset where each input feature is the amount of a gene in a patient's tumor and the label is the patient's survival, an auxiliary KG connects gene names with drug, disease, and human anatomy nodes. We therefore propose PLATO, a machine learning model for tabular data with $d \gg n$ and an auxiliary KG in which the input features are nodes. PLATO predicts the output labels from the tabular data and the auxiliary KG using a modified multilayer perceptron (MLP) with two components. First, PLATO predicts the parameters of the MLP's first layer from the auxiliary KG, which reduces the number of trainable parameters in the MLP and integrates auxiliary information about the input features. Second, PLATO predicts different first-layer parameters for every input sample, which increases the MLP's representational capacity by allowing it to use different prior information for each sample. Across 10 state-of-the-art baselines and 6 $d \gg n$ datasets, PLATO exceeds or matches the prior state of the art, achieving performance improvements of up to 10.19%. Overall, PLATO uses an auxiliary KG about the input features to enable tabular deep learning prediction when $d \gg n$.
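The two components described in the abstract can be illustrated with a minimal plain-Python sketch. This is not PLATO's implementation, and several pieces are assumptions of ours: each input feature is assumed to already have a $k$-dimensional KG node embedding (how these are obtained is not specified here), the $k \times h$ matrix `proj` is treated as the first layer's only trainable weights, and the per-sample gating in `sample_first_layer` is a hypothetical mechanism chosen only to show what "different first-layer parameters for every input sample" can look like. All function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an m x n matrix and an n x p matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kg_first_layer(node_emb: Matrix, proj: Matrix) -> Matrix:
    """Component 1 (sketch): predict d x h first-layer weights from KG embeddings.

    node_emb: d x k, one (assumed precomputed) KG embedding per input feature.
    proj:     k x h, the only trainable weights of this layer in the sketch.
    """
    return matmul(node_emb, proj)


def sample_first_layer(node_emb: Matrix, proj: Matrix, x: List[float]) -> Matrix:
    """Component 2 (HYPOTHETICAL mechanism): first-layer weights that depend on x.

    The sample is summarized as the value-weighted mean of its features' KG
    embeddings; a sigmoid turns that summary into a gate in (0, 1)^k that
    reweights every embedding dimension before projecting. Adds no parameters.
    """
    d, k = len(node_emb), len(node_emb[0])
    summary = [sum(x[j] * node_emb[j][c] for j in range(d)) / d for c in range(k)]
    gate = [1.0 / (1.0 + math.exp(-s)) for s in summary]
    gated = [[node_emb[j][c] * gate[c] for c in range(k)] for j in range(d)]
    return matmul(gated, proj)


def hidden(x: List[float], weights: Matrix) -> List[float]:
    """First hidden layer for one sample: ReLU(x W), where W is d x h (no bias)."""
    return [max(0.0, sum(x[j] * weights[j][c] for j in range(len(x))))
            for c in range(len(weights[0]))]


def first_layer_param_counts(d: int, k: int, h: int) -> Tuple[int, int]:
    """Trainable first-layer weights: (standard MLP, KG-predicted), biases ignored."""
    return d * h, k * h


# Toy example: d = 4 features, k = 2 embedding dimensions, h = 3 hidden units.
node_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
proj = [[0.2, -0.1, 0.3], [0.0, 0.4, -0.2]]
x = [1.0, 0.0, 2.0, -1.0]

W_shared = kg_first_layer(node_emb, proj)         # same for every sample
W_sample = sample_first_layer(node_emb, proj, x)  # changes with x
print(hidden(x, W_shared), hidden(x, W_sample))
```

Because the first-layer weight matrix is $d \times h$, predicting it from $k$-dimensional embeddings replaces $d \cdot h$ trainable weights with $k \cdot h$. With illustrative values $d = 20{,}000$, $k = 64$ and $h = 256$, `first_layer_param_counts` gives 5,120,000 versus 16,384, which is why this kind of weight prediction is attractive when $d \gg n$.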
Author Information
Camilo Ruiz (Stanford University)
Hongyu Ren (Stanford University)
Kexin Huang (Stanford University)
Jure Leskovec (Stanford University/Pinterest)
More from the Same Authors
2020 : Poster #1
Hongyu Ren
2021 : Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development
Kexin Huang · Tianfan Fu · Wenhao Gao · Yue Zhao · Yusuf Roohani · Jure Leskovec · Connor Coley · Cao Xiao · Jimeng Sun · Marinka Zitnik
2021 Spotlight: Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai
2021 : OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs
Weihua Hu · Matthias Fey · Hongyu Ren · Maho Nakata · Yuxiao Dong · Jure Leskovec
2021 : Extending the WILDS Benchmark for Unsupervised Adaptation
Shiori Sagawa · Pang Wei Koh · Tony Lee · Irena Gao · Sang Michael Xie · Kendrick Shen · Ananya Kumar · Weihua Hu · Michihiro Yasunaga · Henrik Marklund · Sara Beery · Ian Stavness · Jure Leskovec · Kate Saenko · Tatsunori Hashimoto · Sergey Levine · Chelsea Finn · Percy Liang
2021 : Adaptive Pseudo-labeling for Quantum Calculations
Kexin Huang · Vishnu Sresht · Brajesh Rai
2022 : Learning Controllable Adaptive Simulation for Multi-scale Physics
Tailin Wu · Takashi Maruyama · Qingqing Zhao · Gordon Wetzstein · Jure Leskovec
2022 : Learning Efficient Hybrid Particle-continuum Representations of Non-equilibrium N-body Systems
Tailin Wu · Michael Sun · Hsuan-Gu Chou · Pranay Reddy Samala · Sithipont Cholsaipant · Sophia Kivelson · Jacqueline Yau · Rex Ying · E. Paulo Alves · Jure Leskovec · Frederico Fiuza
2022 : AutoTransfer: AutoML with Knowledge Transfer - An Application to Graph Neural Networks
Kaidi Cao · Jiaxuan You · Jiaju Liu · Jure Leskovec
2022 : Efficient Automatic Machine Learning via Design Graphs
Shirley Wu · Jiaxuan You · Jure Leskovec · Rex Ying
2022 : Link-level Track: Intro
Hongyu Ren
2022 Competition: OGB-LSC 2022: A Large-Scale Challenge for ML on Graphs
Weihua Hu · Matthias Fey · Hongyu Ren · Maho Nakata · Yuxiao Dong · Jure Leskovec
2022 : Introduction to OGB-LSC
Jure Leskovec
2022 Poster: Inductive Logical Query Answering in Knowledge Graphs
Michael Galkin · Zhaocheng Zhu · Hongyu Ren · Jian Tang
2022 Poster: Deep Bidirectional Language-Knowledge Graph Pretraining
Michihiro Yasunaga · Antoine Bosselut · Hongyu Ren · Xikun Zhang · Christopher D Manning · Percy Liang · Jure Leskovec
2022 Poster: ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time
Tailin Wu · Megan Tjandrasuwita · Zhengxuan Wu · Xuelin Yang · Kevin Liu · Rok Sosic · Jure Leskovec
2022 Poster: Learning to Accelerate Partial Differential Equations via Latent Global Evolution
Tailin Wu · Takashi Maruyama · Jure Leskovec
2022 Poster: Few-shot Relational Reasoning via Connection Subgraph Pretraining
Qian Huang · Hongyu Ren · Jure Leskovec
2022 Poster: Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks
Arian Jamasb · Ramon Viñas Torné · Eric Ma · Yuanqi Du · Charles Harris · Kexin Huang · Dominic Hall · Pietro Lió · Tom Blundell
2021 Workshop: AI for Science: Mind the Gaps
Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik
2021 Poster: Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai
2021 Poster: Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones
Yushi Bai · Zhitao Ying · Hongyu Ren · Jure Leskovec
2021 Poster: Neural Distance Embeddings for Biological Sequences
Gabriele Corso · Zhitao Ying · Michal Pándy · Petar Veličković · Jure Leskovec · Pietro Liò
2020 : Contributed Talk #3
Hongyu Ren
2020 : Q&A #2
Heng Ji · Jure Leskovec · Jiajun Wu
2020 : Invited Talk #4
Jure Leskovec
2020 Poster: Open Graph Benchmark: Datasets for Machine Learning on Graphs
Weihua Hu · Matthias Fey · Marinka Zitnik · Yuxiao Dong · Hongyu Ren · Bowen Liu · Michele Catasta · Jure Leskovec
2020 Poster: Coresets for Robust Training of Deep Neural Networks against Noisy Labels
Baharan Mirzasoleiman · Kaidi Cao · Jure Leskovec
2020 Poster: Graph Information Bottleneck
Tailin Wu · Hongyu Ren · Pan Li · Jure Leskovec
2020 Spotlight: Open Graph Benchmark: Datasets for Machine Learning on Graphs
Weihua Hu · Matthias Fey · Marinka Zitnik · Yuxiao Dong · Hongyu Ren · Bowen Liu · Michele Catasta · Jure Leskovec
2020 Poster: Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
Pan Li · Yanbang Wang · Hongwei Wang · Jure Leskovec
2020 Poster: Handling Missing Data with Graph Representation Learning
Jiaxuan You · Xiaobai Ma · Yi Ding · Mykel J Kochenderfer · Jure Leskovec
2020 Poster: Design Space for Graph Neural Networks
Jiaxuan You · Zhitao Ying · Jure Leskovec
2020 Poster: Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs
Hongyu Ren · Jure Leskovec
2020 Spotlight: Design Space for Graph Neural Networks
Jiaxuan You · Zhitao Ying · Jure Leskovec
2019 : Poster Presentations
Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange
2019 : Presentation and Discussion: Open Graph Benchmark
Jure Leskovec
2019 Workshop: Graph Representation Learning
Will Hamilton · Rianne van den Berg · Michael Bronstein · Stefanie Jegelka · Thomas Kipf · Jure Leskovec · Renjie Liao · Yizhou Sun · Petar Veličković
2019 Poster: Hyperbolic Graph Convolutional Neural Networks
Ines Chami · Zhitao Ying · Christopher Ré · Jure Leskovec
2019 Poster: G2SAT: Learning to Generate SAT Formulas
Jiaxuan You · Haoze Wu · Clark Barrett · Raghuram Ramanujan · Jure Leskovec
2019 Poster: GNNExplainer: Generating Explanations for Graph Neural Networks
Zhitao Ying · Dylan Bourgeois · Jiaxuan You · Marinka Zitnik · Jure Leskovec
2018 : Coffee Break and Poster Session I
Pim de Haan · Bin Wang · Dequan Wang · Aadil Hayat · Ibrahim Sobh · Muhammad Asif Rana · Thibault Buhet · Nicholas Rhinehart · Arjun Sharma · Alex Bewley · Michael Kelly · Lionel Blondé · Ozgur S. Oguz · Vaibhav Viswanathan · Jeroen Vanbaar · Konrad Żołna · Negar Rostamzadeh · Rowan McAllister · Sanjay Thakur · Alexandros Kalousis · Chelsea Sidrane · Sujoy Paul · Daphne Chen · Michal Garmulewicz · Henryk Michalewski · Coline Devin · Hongyu Ren · Jiaming Song · Wen Sun · Hanzhang Hu · Wulong Liu · Emilie Wirbel
2018 Poster: Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
Jiaxuan You · Bowen Liu · Zhitao Ying · Vijay Pande · Jure Leskovec
2018 Poster: Multi-Agent Generative Adversarial Imitation Learning
Jiaming Song · Hongyu Ren · Dorsa Sadigh · Stefano Ermon
2018 Poster: Dynamic Network Model from Partial Observations
Elahe Ghalebi · Baharan Mirzasoleiman · Radu Grosu · Jure Leskovec
2018 Spotlight: Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation
Jiaxuan You · Bowen Liu · Zhitao Ying · Vijay Pande · Jure Leskovec
2018 Spotlight: Dynamic Network Model from Partial Observations
Elahe Ghalebi · Baharan Mirzasoleiman · Radu Grosu · Jure Leskovec
2018 Poster: Hierarchical Graph Representation Learning with Differentiable Pooling
Zhitao Ying · Jiaxuan You · Christopher Morris · Xiang Ren · Will Hamilton · Jure Leskovec
2018 Poster: Bias and Generalization in Deep Generative Models: An Empirical Study
Shengjia Zhao · Hongyu Ren · Arianna Yuan · Jiaming Song · Noah Goodman · Stefano Ermon
2018 Spotlight: Hierarchical Graph Representation Learning with Differentiable Pooling
Zhitao Ying · Jiaxuan You · Christopher Morris · Xiang Ren · Will Hamilton · Jure Leskovec
2018 Spotlight: Bias and Generalization in Deep Generative Models: An Empirical Study
Shengjia Zhao · Hongyu Ren · Arianna Yuan · Jiaming Song · Noah Goodman · Stefano Ermon
2018 Poster: Embedding Logical Queries on Knowledge Graphs
Will Hamilton · Payal Bajaj · Marinka Zitnik · Dan Jurafsky · Jure Leskovec
2017 : Jure Leskovec, Stanford
Jure Leskovec
2017 Poster: Inductive Representation Learning on Large Graphs
Will Hamilton · Zhitao Ying · Jure Leskovec
2016 Poster: Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making
Himabindu Lakkaraju · Jure Leskovec
2013 Workshop: Frontiers of Network Analysis: Methods, Models, and Applications
Edo M Airoldi · David S Choi · Aaron Clauset · Khalid El-Arini · Jure Leskovec
2013 Poster: Nonparametric Multi-group Membership Model for Dynamic Networks
Myunghwan Kim · Jure Leskovec
2012 Workshop: Social network and social media analysis: Methods, models and applications
Edo M Airoldi · David S Choi · Khalid El-Arini · Jure Leskovec
2012 Poster: Learning to Discover Social Circles in Ego Networks
Julian J McAuley · Jure Leskovec
2010 Workshop: Networks Across Disciplines: Theory and Applications
Edo M Airoldi · Anna Goldenberg · Jure Leskovec · Quaid Morris
2010 Oral: On the Convexity of Latent Social Network Inference
Seth A Myers · Jure Leskovec
2010 Poster: On the Convexity of Latent Social Network Inference
Seth A Myers · Jure Leskovec
2009 Workshop: Analyzing Networks and Learning With Graphs
Edo M Airoldi · Jure Leskovec · Jon Kleinberg · Josh Tenenbaum