Distributed Element-Local Transformer for Scalable and Consistent Mesh-Based Modeling
Abstract
The transformer architecture has become the state of the art across numerous domains, yet its application to large-scale, unstructured mesh-based scientific simulations remains a significant challenge. In this work, we propose a novel transformer-based methodology that operates directly on the massive, high-fidelity meshes typical of leadership-class computing. Our core contribution is a consistent, element-wise local attention mechanism that incorporates a halo-exchange step within the attention layer. This formulation guarantees that the model's output is arithmetically invariant to the parallel decomposition of the mesh, a critical property for scientific applications. We demonstrate our approach on a large-scale 3D CFD problem with over 20 million nodes distributed across 96 GPUs.
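To make the decomposition-invariance claim concrete, the following is a minimal NumPy sketch (not the paper's implementation) of element-local attention on a toy 1D chain mesh with radius-1 neighborhoods. The two "ranks" and the halo exchange are emulated by array slicing rather than MPI; each rank computes attention only for its owned nodes, using a one-node-deep halo copied from its neighbor. All function names here are illustrative assumptions.

```python
import numpy as np

def attend(q, keys, d):
    """Softmax attention of one query over its local neighborhood (values = keys)."""
    s = keys @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ keys

def local_attention(x_local, owned, nbrs):
    """Attention output for each owned node; nbrs maps a local node index
    to the local indices of its neighborhood (owned nodes plus halo)."""
    return np.stack([attend(x_local[i], x_local[nbrs[i]], x_local.shape[1])
                     for i in owned])

rng = np.random.default_rng(0)
n, d = 8, 4
x = rng.normal(size=(n, d))            # per-node features on the full mesh
nbr = lambda i, lo, hi: [j for j in (i - 1, i, i + 1) if lo <= j < hi]

# Single-rank reference: every node attends over its 1-ring neighborhood.
ref = local_attention(x, range(n), {i: nbr(i, 0, n) for i in range(n)})

# Two emulated ranks, each holding its owned nodes plus a one-deep halo.
mid = n // 2
x0 = x[:mid + 1]                       # rank 0: nodes 0..3 owned, node 4 is halo
x1 = x[mid - 1:]                       # rank 1: nodes 4..7 owned, node 3 is halo
out0 = local_attention(x0, range(mid),
                       {i: nbr(i, 0, mid + 1) for i in range(mid)})
out1 = local_attention(x1, range(1, n - mid + 1),
                       {i: nbr(i, 0, n - mid + 1) for i in range(1, n - mid + 1)})

# Gathered distributed result matches the single-rank result: the same
# neighborhood is attended over either way, so the arithmetic is identical.
dist = np.vstack([out0, out1])
assert np.allclose(ref, dist)
```

Because each node's attention stencil is fully contained in its owned-plus-halo set, every rank performs exactly the same per-node arithmetic as the serial run, which is the essence of the consistency property described above.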