Carbon capture and storage (CCS) is one of the most promising technologies for reducing greenhouse gas emissions and relies on numerical reservoir simulations for identifying and monitoring CO2 storage sites. In many commercial settings, however, numerical reservoir simulations are too computationally expensive for important downstream applications such as optimization or uncertainty quantification. Deep learning-based surrogate models offer the possibility of solving PDEs many orders of magnitude faster than conventional simulators, but they are difficult to scale to industrial-scale problems. Using model-parallel deep learning, we train the largest CO2 surrogate model to date on a 3D simulation grid with two million grid points. To train the 3D simulator, we generate a new training dataset based on a real-world CCS simulation benchmark. Once trained, each simulation with the network is five orders of magnitude faster than a numerical reservoir simulator and 4,500 times cheaper. This paves the way for applications that require thousands of (sequential) simulations, such as optimizing the location of CO2 injection wells to maximize storage capacity and minimize the risk of leakage.