Secure multi-party computation (MPC) is gaining popularity with the growing demand for privacy-preserving cloud services. While MPC-based private inference for convolutional neural networks (CNNs) has received considerable attention, private inference for Transformer models has not been studied in detail. This paper provides a characterization study of the performance overhead of running Transformer models with secure MPC and proposes an optimization for embedding tables. Our study shows that Transformers introduce two new challenges for MPC-based private inference: softmax and embedding tables. To address the overhead of embedding table accesses under MPC, we propose to use tensor-train (TT) decomposition, a mechanism that splits a large embedding table into multiple smaller embedding tables. For NLP workloads, the experiments show that TT decomposition can speed up embedding table accesses by 2x with only a 1.19 drop in the masked-language-model perplexity score.
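The abstract does not include code, but a minimal sketch of how a TT-decomposed embedding lookup could work is shown below, assuming a vocabulary factored as V = v1*v2*v3 and an embedding dimension factored as D = d1*d2*d3. All shapes, ranks, and function names here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of a tensor-train (TT) embedding table: the dense
# V x D table is replaced by small per-factor cores, and a single row is
# reconstructed on demand by contracting core slices.
import numpy as np

v_dims = (8, 10, 12)     # factors of the vocabulary size V = 960 (assumed)
d_dims = (4, 4, 8)       # factors of the embedding dimension D = 128 (assumed)
ranks  = (1, 16, 16, 1)  # TT ranks; boundary ranks are 1 (assumed)

# One small core per factor: core k has shape (r_{k-1}, v_k, d_k, r_k).
cores = [
    np.random.randn(ranks[k], v_dims[k], d_dims[k], ranks[k + 1]) * 0.02
    for k in range(3)
]

def tt_embedding_lookup(token_id):
    """Rebuild one embedding row from the TT cores (hypothetical helper)."""
    # Map the flat token id to a multi-index (i1, i2, i3) over v_dims.
    idx = np.unravel_index(token_id, v_dims)
    result = np.ones((1, 1))             # running factor, shape (1, r0)
    for k in range(3):
        slice_k = cores[k][:, idx[k]]    # shape (r_{k-1}, d_k, r_k)
        # Contract the running factor with this slice over the shared rank.
        result = np.einsum('ab,bdc->adc', result, slice_k)
        result = result.reshape(-1, slice_k.shape[-1])
    return result.reshape(-1)            # full embedding row, shape (D,)

vec = tt_embedding_lookup(123)
print(vec.shape)                         # (128,)
```

With these illustrative sizes the three cores hold roughly 12K parameters instead of the 960 x 128 ≈ 123K parameters of the dense table; replacing one large table lookup with several small ones is the kind of restructuring the paper exploits to make embedding accesses cheaper under MPC.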
Author Information
Yongqin Wang (University of Southern California)
Brian Knott (Facebook)
Murali Annavaram (University of Southern California)
Hsien-Hsin Lee (Facebook)
More from the Same Authors
- 2021 Poster: CrypTen: Secure Multi-Party Computation Meets Machine Learning
  Brian Knott · Shobha Venkataraman · Awni Hannun · Shubho Sengupta · Mark Ibrahim · Laurens van der Maaten
- 2020 Poster: Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge
  Chaoyang He · Murali Annavaram · Salman Avestimehr
- 2018 Poster: GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
  Mingchao Yu · Zhifeng Lin · Krishna Narra · Songze Li · Youjie Li · Nam Sung Kim · Alex Schwing · Murali Annavaram · Salman Avestimehr