Characterizing and Improving MPC-based Private Inference for Transformer-based Models
Yongqin Wang · Brian Knott · Murali Annavaram · Hsien-Hsin Lee
Event URL: https://openreview.net/forum?id=81IVQXVoi-Y

Secure multi-party computation (MPC) is gaining popularity with the growing demand for privacy-preserving cloud services. While MPC for convolutional neural networks (CNNs) has received plenty of attention, MPC-based private inference for Transformer models has not been studied in detail. This paper provides a characterization study of the performance overhead of running Transformer models under MPC, and proposes an optimization for embedding tables. Our study shows that Transformers introduce two new challenges for MPC-based private inference: softmax and embedding tables. To address the overhead of embedding table accesses under MPC, we propose to use tensor-train (TT) decomposition, a mechanism that splits a large embedding table into multiple smaller embedding tables. For NLP workloads, the experiments show that TT decomposition can speed up embedding table accesses by 2x with only a 1.19 drop in the masked-language model perplexity score.
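To make the idea of splitting one large embedding table into several smaller ones concrete, the following is a minimal plaintext NumPy sketch of a tensor-train embedding lookup. It is an illustration under stated assumptions, not the authors' implementation: it does not involve MPC or secret sharing, and the function names (`make_tt_cores`, `tt_lookup`, `tt_to_dense`), the factor sizes, and the TT rank are all hypothetical choices made for the example.

```python
import numpy as np


def make_tt_cores(row_factors, col_factors, rank, seed=0):
    """Create random TT cores (hypothetical helper, not from the paper).

    Core k has shape (r_prev, n_k, d_k, r_next); the boundary ranks are 1.
    """
    rng = np.random.default_rng(seed)
    ranks = [1] + [rank] * (len(row_factors) - 1) + [1]
    return [
        rng.standard_normal((ranks[k], n, d, ranks[k + 1]))
        for k, (n, d) in enumerate(zip(row_factors, col_factors))
    ]


def tt_lookup(cores, index):
    """Rebuild one embedding row from the small cores only."""
    row_factors = tuple(core.shape[1] for core in cores)
    # Split the flat row index into one sub-index per core (row-major order).
    sub_indices = np.unravel_index(index, row_factors)
    out = np.ones((1, 1))  # shape: (columns built so far, current rank)
    for core, i in zip(cores, sub_indices):
        piece = core[:, i, :, :]  # shape: (r_prev, d_k, r_next)
        out = np.einsum("ar,rdb->adb", out, piece)
        out = out.reshape(-1, piece.shape[-1])
    return out.reshape(-1)


def tt_to_dense(cores):
    """Materialize the full table; used here only to check the lookup."""
    full = np.ones((1, 1, 1))  # shape: (rows, cols, rank)
    for core in cores:
        full = np.einsum("NDr,rnds->NnDds", full, core)
        n_rows, n, n_cols, d, s = full.shape
        full = full.reshape(n_rows * n, n_cols * d, s)
    return full[:, :, 0]


# Assumed toy sizes: a 100 x 32 table factored as (4*5*5) x (2*4*4).
cores = make_tt_cores(row_factors=(4, 5, 5), col_factors=(2, 4, 4), rank=3)
dense = tt_to_dense(cores)
tt_params = sum(core.size for core in cores)
print(dense.shape, dense.size, tt_params)
assert np.allclose(tt_lookup(cores, 57), dense[57])
```

In this toy configuration the cores hold far fewer values than the dense table they represent, which is the property the abstract relies on: each lookup touches only the small cores rather than the full table. The actual savings under MPC depend on the protocol and on the factor sizes and ranks chosen, which are not specified here.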

Author Information

Yongqin Wang (University of Southern California)
Brian Knott (Facebook)
Murali Annavaram (University of Southern California)
Hsien-Hsin Lee (Facebook)
