We empirically investigate how pre-training on data of different modalities, such as language and vision, affects fine-tuning of Transformer-based models on Mujoco offline reinforcement learning tasks. Analysis of the internal representations reveals that pre-training changes the Transformers' representations substantially, yet during fine-tuning the pre-trained models acquire less information about the data than a randomly initialized one. A closer look at the parameter changes of the pre-trained Transformers shows that their parameters change relatively little and that the poor performance of the model pre-trained on image data may partly stem from large gradients and gradient clipping. To study what information the Transformer pre-trained on language data utilizes, we fine-tune this model with no context provided and find that it learns efficiently even without context information. A follow-up analysis supports the hypothesis that pre-training on language data leads the Transformer to acquire context-like information and to use it to solve the downstream task.
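A minimal sketch of the kind of parameter-change analysis mentioned above, assuming the pre-trained and fine-tuned weights are available as ordinary PyTorch state dicts (the checkpoint file names are hypothetical, not the paper's artifacts). It reports the relative L2 change of each parameter tensor, one simple way to check whether the pre-trained Transformer's parameters move much during fine-tuning.

```python
# Sketch only: measure how far each parameter moves during fine-tuning.
# "pretrained.pt" and "finetuned.pt" are hypothetical checkpoint files holding
# state dicts of the same Transformer before and after offline-RL fine-tuning.
import torch

pre = torch.load("pretrained.pt", map_location="cpu")
ft = torch.load("finetuned.pt", map_location="cpu")

for name, p0 in pre.items():
    p1 = ft[name]
    # Relative L2 change: ||theta_ft - theta_pre|| / ||theta_pre||
    rel_change = (p1 - p0).float().norm() / (p0.float().norm() + 1e-12)
    print(f"{name}: relative L2 change = {rel_change.item():.4f}")
```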
Author Information
Shiro Takagi (Independent Researcher)
I am an independent researcher studying intelligence. My long-term research goal is to create an artificial researcher. I am interested in symbolic fluency, memory, and autonomy.
More from the Same Authors
- 2022 : Thoughts on the Applicability of Machine Learning to Scientific Discovery and Possible Future Research Directions (Perspective) » Shiro Takagi
- 2022 : Empirical Study on Optimizer Selection for Out-of-Distribution Generalization » Hiroki Naganuma · Kartik Ahuja · Ioannis Mitliagkas · Shiro Takagi · Tetsuya Motokawa · Rio Yokota · Kohta Ishikawa · Ikuro Sato
- 2022 : Managing the Whole Research Process on GitHub » Shiro Takagi
- 2022 : Separation of Research Data from Its Presentation » Shiro Takagi