Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices. Two computational challenges have limited the applicability of TO to a variety of industrial applications. First, a TO problem often involves a large number of design variables to guarantee sufficient expressive power. Second, many TO problems require a large number of expensive physical model simulations, and those simulations cannot be parallelized. To address these issues, we propose a general scalable deep-learning (DL) based TO framework, referred to as SDL-TO, which utilizes parallel CPU+GPU schemes to accelerate the TO process for designing additively manufactured (AM) materials. Unlike existing studies of DL for TO, our framework accelerates TO by learning from the iterative history data while simultaneously training on the mapping between a given design and its gradient. The surrogate gradient is learned by combining parallel computing on multiple CPUs with distributed DL training on multiple GPUs. The learned surrogate gradient enables a fast online update scheme in place of an expensive update. Using a local sampling strategy, we reduce the intrinsically high dimensionality of the design space and improve both the training accuracy and the scalability of the SDL-TO framework. The method is demonstrated on benchmark examples and on AM materials design for heat conduction. It shows competitive performance compared to the baseline methods while significantly reducing the computational cost, with a speedup of 8.6x over a standard TO implementation.
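The surrogate-gradient idea can be illustrated with a toy sketch. This is not the SDL-TO implementation: the quadratic objective stands in for an expensive physical simulation, an affine least-squares fit stands in for the deep network, and every name (`expensive_gradient`, `local_samples`, `fit_surrogate`, `surrogate_loop`) is hypothetical. It only shows the pattern of sampling designs locally, learning the design-to-gradient mapping, and then taking cheap updates with the learned gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                   # toy design dimension (assumption)
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)             # symmetric positive definite matrix
b = rng.normal(size=n)
step = 1.0 / np.linalg.norm(A, 2)       # safe step size for this quadratic

def objective(x):
    return 0.5 * x @ A @ x - b @ x

def expensive_gradient(x):
    # Stand-in for a physics simulation plus sensitivity analysis.
    return A @ x - b

def local_samples(x, radius, count):
    # Draw designs in a small box around the current design.
    return x + radius * rng.uniform(-1.0, 1.0, size=(count, x.size))

def fit_surrogate(X, G):
    # Affine least-squares fit G ~ [X, 1] @ W (replaces the DL model here).
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(Xa, G, rcond=None)
    return lambda x: np.append(x, 1.0) @ W

def surrogate_loop(x0, outer=5, inner=10, radius=0.1, count=20):
    x = x0.copy()
    for _ in range(outer):
        X = local_samples(x, radius, count)
        # These evaluations are independent, so they could run in parallel.
        G = np.array([expensive_gradient(xi) for xi in X])
        surrogate = fit_surrogate(X, G)
        for _ in range(inner):
            # Cheap update: no call to expensive_gradient here.
            x = x - step * surrogate(x)
    return x

x0 = np.ones(n)
x_final = surrogate_loop(x0)
print(objective(x0), objective(x_final))
```

Because the toy gradient is exactly affine, the fitted surrogate reproduces it everywhere. In a real TO problem the learned gradient would only be trustworthy near the sampled designs, which is why the surrogate is refit around the current design in each outer iteration.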