Efficient Scaling of Transformer Architectures in Value-Based Deep RL
Abstract
Large-scale models such as ChatGPT and DALL-E are increasingly integrated into daily life and into critical domains such as healthcare and industry. Their success stems from advances in deep neural networks, but this performance often comes at the cost of enormous model sizes and massive training datasets. Scaling these architectures introduces significant challenges, particularly in deep reinforcement learning (RL), where training instabilities and pathological behaviors are common. To address these issues, prior work has explored approaches such as incorporating Mixture-of-Experts (MoE) layers or reframing regression tasks as classification problems. In this paper, we first examine existing techniques that mitigate these challenges, including the use of a categorical cross-entropy loss on a suite of language tasks. We then hypothesize that ReLoRA, a low-rank update training method, can further improve the scalability of Transformer architectures in value-based deep RL.
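To make the hypothesized training scheme concrete, the following is a minimal sketch of a ReLoRA-style layer: a full-rank base weight that is not directly trained between merges, plus a trainable low-rank update that is periodically folded back into the base weight and reinitialized. The class name `ReLoRALinear`, the default rank, and the initialization constants are illustrative assumptions, not the method's reference implementation.

```python
import torch
import torch.nn as nn


class ReLoRALinear(nn.Module):
    """Sketch of a ReLoRA-style linear layer (hypothetical implementation).

    The base weight is frozen between merges; only the low-rank factors
    A and B are trained. Calling merge_and_reinit() folds the accumulated
    low-rank update B @ A into the base weight and restarts the factors.
    """

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # only the low-rank factors receive gradients
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero-init so the update starts at 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the current low-rank correction.
        return self.base(x) + x @ self.A.t() @ self.B.t()

    @torch.no_grad()
    def merge_and_reinit(self) -> None:
        # Fold the learned low-rank delta into the base weight, then
        # reinitialize the factors so the next phase learns a fresh delta.
        self.base.weight += self.B @ self.A
        nn.init.normal_(self.A, std=0.01)
        nn.init.zeros_(self.B)
```

In the full ReLoRA procedure, each merge is also accompanied by a partial reset of the optimizer state for the low-rank factors and a learning-rate re-warmup; those details are omitted from this sketch.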