The Human Leukocyte Antigen (HLA) complex plays a crucial role in the adaptive immune responses underlying cancer immunology. Due to the complex interactions governing peptide binding on the HLA surface and the intrinsically polymorphic nature of the HLA complex, one of the main bottlenecks for cancer vaccine design is the accurate prediction of allele-specific binding epitopes. Data-driven approaches trained on binding-experiment data have proven effective for high-throughput screening of candidates, avoiding expensive docking methods. However, there is still no consensus on how to most effectively represent amino acid sequences and model the long-range interaction patterns present in these complexes. Recently, attention-based models have been explored to improve this task: they allow higher flexibility by introducing weaker inductive biases, but carry a critical trade-off between expressivity and data efficiency. We propose an allele-conditional attention mechanism for binding prediction and show that constraining attention between the HLA context and peptide sequences improves performance while requiring fewer parameters than standard transformer-like models. We thoroughly study the impact of different attention schemes and pooling methods on the task of binding affinity prediction and benchmark widely used deep learning architectures. In addition, we show that patterns in string-representation space can provide insights and encode information that correlates with the underlying spatial interactions between HLA class I and peptide amino acids, without any extra docking simulations.
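To make the constrained-attention idea concrete, the sketch below shows one way to restrict attention so that peptide positions attend only to the HLA context (cross-attention) rather than to the full concatenated sequence. This is an illustrative sketch under assumed shapes and names; it is not the authors' implementation, and all function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def allele_conditional_attention(peptide, hla, Wq, Wk, Wv):
    """Peptide queries attend only to HLA keys/values.

    peptide: (Lp, d_in) peptide token embeddings (hypothetical shapes)
    hla:     (Lh, d_in) embeddings of the allele's HLA pseudo-sequence
    Wq, Wk, Wv: (d_in, d) learned projection matrices
    Returns: (Lp, d) allele-conditioned peptide representations.
    """
    Q = peptide @ Wq                            # (Lp, d)
    K = hla @ Wk                                # (Lh, d)
    V = hla @ Wv                                # (Lh, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (Lp, Lh), no peptide-peptide terms
    return softmax(scores, axis=-1) @ V         # (Lp, d)

# Example with toy dimensions: a 9-mer peptide against a 34-residue HLA context.
rng = np.random.default_rng(0)
d_in, d, Lp, Lh = 16, 16, 9, 34
out = allele_conditional_attention(
    rng.normal(size=(Lp, d_in)), rng.normal(size=(Lh, d_in)),
    rng.normal(size=(d_in, d)), rng.normal(size=(d_in, d)),
    rng.normal(size=(d_in, d)),
)
```

Compared with full self-attention over the concatenated HLA + peptide sequence, this restriction removes the quadratic peptide-peptide and HLA-HLA score blocks, which is one way such a scheme can reduce parameters and computation while keeping the allele-dependent interaction terms.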