An empirical study of task and feature correlations in the reuse of pre-trained models
Abstract
Pre-trained neural networks are commonly used and reused in the machine learning community. Alice trains a model for a particular task, and a part of her neural network is reused by Bob for a different task, often to great effect. To what can we ascribe Bob's success? This paper introduces an experimental setup through which factors contributing to Bob's empirical success could be studied in silico. As a result, we demonstrate that Bob might just be lucky: his task accuracy increases monotonically with the correlation between his task and Alice's. Even when Bob has provably uncorrelated tasks and input features from Alice's pre-trained network, he can achieve significantly better than random performance due to Alice's choice of network and optimizer. When there is little correlation between tasks, only reusing lower pre-trained layers is preferable, and we hypothesize the converse: that the optimal number of retrained layers is indicative of task and feature correlation. Finally, we show in controlled real-world scenarios that Bob can effectively reuse Alice's pre-trained network if there are semantic correlations between his and Alice's task.