Structure-Inducing Pre-training
TestMatt TestMcDermott · Brendan Yap · Peter Szolovits · Marinka Zitnik

Language model pre-training and derived methods are incredibly impactful in machine learning. However, there remains considerable uncertainty about exactly why pre-training helps improve performance on fine-tuning tasks. This is especially true when attempting to adapt language-model pre-training to domains outside of natural language. Here, we analyze this problem by exploring how existing pre-training methods impose relational structure in their induced per-sample latent spaces; that is, what constraints pre-training methods impose on the distance or geometry between the pre-trained embeddings of two samples $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$. Through a comprehensive review of existing pre-training methods, we find that this question remains open. This is true despite theoretical analyses demonstrating the importance of understanding this form of induced structure. Based on this review, we introduce a descriptive framework for pre-training that allows for a granular, comprehensive understanding of how relational structure can be induced. We present a theoretical analysis of this framework from first principles and establish a connection between the relational inductive bias of pre-training and fine-tuning performance. We also show how to use the framework to define new pre-training methods. We build upon these findings with empirical studies on benchmarks spanning three data modalities and ten fine-tuning tasks. These experiments validate our theoretical analyses, inform the design of novel pre-training methods, and establish consistent improvements over a compelling suite of baseline methods.