Continual Learning and Out of Distribution Generalization in a Systematic Reasoning Task
Mustafa Abdool · Andrew Nam · James McClelland
Keywords:
transformers
out of distribution generalization
games
deep neural networks
continual learning
abstract reasoning
systematic reasoning
Abstract
Humans have the remarkable ability to rapidly learn new problem-solving strategies from a narrow range of examples and extend them to examples outside the distribution (OOD) used in learning, but such generalization remains a challenge for neural networks. This seems especially important for learning new mathematical techniques, which apply to huge problem spaces (e.g., all real numbers). We explore this limitation by training neural networks on strategies for solving specified cells in $6\times6$ Sudoku puzzles, using a novel curriculum of tasks that build upon each other. We train transformers sequentially on two preliminary tasks, then assess OOD generalization of a more complex solution strategy from a range of restricted training distributions. Baseline models master the training distribution but fail to generalize to OOD data. However, we find that a combination of extensions is sufficient to support highly accurate and reliable OOD generalization. These results suggest directions for improving the robustness of larger transformer models under the highly imbalanced data distributions provided by natural datasets.
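To make the task domain concrete, the following is a minimal sketch of the constraint structure of a $6\times6$ Sudoku grid, assuming the standard $2\times3$ box layout (the abstract does not specify the box shape, and the function name `is_valid` is illustrative, not from the paper):

```python
def is_valid(grid):
    """Check a 6x6 Sudoku grid: no digit 1-6 repeats in any row,
    column, or 2x3 box. 0 denotes an empty cell."""
    def no_repeats(cells):
        filled = [c for c in cells if c != 0]
        return len(filled) == len(set(filled))

    rows = grid
    cols = [[grid[r][c] for r in range(6)] for c in range(6)]
    # 2-row by 3-column boxes tile the 6x6 grid in six blocks.
    boxes = [
        [grid[r][c] for r in range(br, br + 2) for c in range(bc, bc + 3)]
        for br in range(0, 6, 2)
        for bc in range(0, 6, 3)
    ]
    return all(no_repeats(unit) for unit in rows + cols + boxes)
```

A solution strategy for a "specified cell" in this setting amounts to inferring the unique digit consistent with these row, column, and box constraints.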