Language Transformers (LaTs) have achieved state-of-the-art performance in a range of challenging protein modeling tasks including structure prediction, design, mutation effect prediction, and others. The lion's share of these improvements derive from exponential increases in the size and depth of these neural networks, which now routinely exceed billions of trainable parameters, rather than fundamental architectural innovations. This explosive growth in model size poses an obstacle to integration into design-build-test cycles, wherein models are iteratively evaluated, retrained, and improved throughout data collection. As a result, large LaTs do not meet the need for lightweight, rapid-to-train models that excel at problems with tight data-model feedback loops. Here, we present a small, 10 million-parameter BERT model with linearly scaling attention that can be trained from scratch on four Nvidia V100 GPUs in under a week and fine-tuned with full back-propagation in hours to days. We demonstrate that this model excels at two challenging active-learning problems, recombinant protein expression prediction and codon optimization, that require interfacing with experiments. Our approach highlights the size-cost tradeoff inherent to LaTs and demonstrates the utility of small, custom-designed models in practical settings.