Timezone: »
We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains to be a challenging problem because of 1) the operation fusion of embedding tables, and 2) the generalizability requirement on unseen placement tasks with different numbers of tables and/or devices. To this end, we present DreamShard, a reinforcement learning (RL) approach for embedding table placement. DreamShard achieves the reasoning of operation fusion and generalizability with 1) a cost network to directly predict the costs of the fused operation, and 2) a policy network that is efficiently trained on an estimated Markov decision process (MDP) without real GPU execution, where the states and the rewards are estimated with the cost network. Equipped with sum and max representation reductions, the two networks can directly generalize to any unseen tasks with different numbers of tables and/or devices without fine-tuning. Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available.
Author Information
Daochen Zha (Rice University)
Louis Feng (Meta Platforms, Inc.)
Qiaoyu Tan (Texas A&M University)
Zirui Liu (Rice University)
Kwei-Herng Lai (Rice University)
Bhargav Bhushanam (Facebook)
Yuandong Tian (Facebook AI Research)
Arun Kejariwal (Meta Platforms Inc.)
Xia Hu (Rice University)
More from the Same Authors
-
2022 : Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization »
Runlong Zhou · Yuandong Tian · YI WU · Simon Du -
2022 : Efficient Planning in a Compact Latent Action Space »
zhengyao Jiang · Tianjun Zhang · Michael Janner · Yueying (Lisa) Li · Tim Rocktäschel · Edward Grefenstette · Yuandong Tian -
2022 Panel: Panel 4B-3: Efficient Methods for… & Understanding Deep Contrastive… »
Zhi-Hua Zhou · Yuandong Tian -
2022 Poster: Understanding Deep Contrastive Learning via Coordinate-wise Optimization »
Yuandong Tian -
2022 Poster: A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking »
Keyu Duan · Zirui Liu · Peihao Wang · Wenqing Zheng · Kaixiong Zhou · Tianlong Chen · Xia Hu · Zhangyang Wang -
2022 Mentorship: New In ML »
Zhen Xu · Mélisande Teng · Jie Fu · Romain Egele · Daochen Zha · Minhao Fan · Eulalie Boucher · Alexandra Volokhova · Isabelle Guyon -
2018 : Poster Session »
Sujay Sanghavi · Vatsal Shah · Yanyao Shen · Tianchen Zhao · Yuandong Tian · Tomer Galanti · Mufan Li · Gilad Cohen · Daniel Rothchild · Aristide Baratin · Devansh Arpit · Vagelis Papalexakis · Michael Perlmutter · Ashok Vardhan Makkuva · Pim de Haan · Yingyan Lin · Wanmo Kang · Cheolhyoung Lee · Hao Shen · Sho Yaida · Dan Roberts · Nadav Cohen · Philippe Casgrain · Dejiao Zhang · Tengyu Ma · Avinash Ravichandran · Julian Emilio Salazar · Bo Li · Davis Liang · Christopher Wong · Glen Bigan Mbeng · Animesh Garg