Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang · Linyi Yang · Zhiyang Teng · Ming Zhou · Yue Zhang

Transformer-based models have advanced considerably in recent years, becoming one of the most important backbones in natural language processing. Recent work shows that the attention mechanism in the Transformer may not be necessary: both convolutional neural networks and multi-layer perceptron based models have been investigated as Transformer alternatives. In this paper, we consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications, together with a sentence-level representation decoupled from other tokens. We find that such an architecture gives results comparable to Transformer-based ones on both English and Chinese language benchmarks. Moreover, whereas Transformer self-attention has quadratic complexity in sequence length, our model has linear complexity and is more efficient during inference. Our models and code will be released for further research.
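
The linear-complexity claim can be illustrated with a toy message-passing step. The sketch below is not the authors' model: it omits gating, learned weights, and pre-training entirely, and the names `graph_recurrent_step`, `count_messages`, and `window` are hypothetical. It only assumes what the abstract states, namely that each token exchanges information with nearby tokens and with a single sentence-level node.

```python
import math


def graph_recurrent_step(tokens, sentence, window=1):
    """One simplified, synchronous update (illustrative only).

    tokens:   list of n vectors (lists of floats), one per token
    sentence: a single vector standing in for the sentence-level node
    window:   how many neighbours on each side a token reads from
    """
    n = len(tokens)
    dim = len(sentence)
    new_tokens = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = tokens[lo:hi]  # the token itself plus its neighbours
        # Average the local context, then mix in the sentence node.
        mean = [sum(v[d] for v in local) / len(local) for d in range(dim)]
        new_tokens.append([math.tanh(mean[d] + sentence[d]) for d in range(dim)])
    # The sentence node reads every token state from the previous step.
    new_sentence = [math.tanh(sum(v[d] for v in tokens) / n) for d in range(dim)]
    return new_tokens, new_sentence


def count_messages(n, window=1):
    """Number of vector reads in one step of the sketch above."""
    local_reads = sum(
        min(n, i + window + 1) - max(0, i - window) for i in range(n)
    )
    sentence_reads = n  # the sentence node reads each token once
    token_reads_of_sentence = n  # each token reads the sentence node once
    return local_reads + sentence_reads + token_reads_of_sentence
```

For a fixed `window`, `count_messages` grows in proportion to `n`, whereas full pairwise attention over the same sequence would involve `n * n` token-to-token interactions.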

Author Information

Yile Wang (Tsinghua University)
Linyi Yang (Westlake University)
Zhiyang Teng (Westlake University)
Ming Zhou (Langboat Ltd.)

Founder and CEO of Langboat and Chief Scientist of Sino Venture. Previously, I was Assistant Managing Director of Microsoft Research Asia (MSRA), which I joined in 1999 and where I worked until 2020.

Yue Zhang (Westlake University)

More from the Same Authors

  • 2022 Poster: USB: A Unified Semi-supervised Learning Benchmark for Classification
    Yidong Wang · Hao Chen · Yue Fan · Wang SUN · Ran Tao · Wenxin Hou · Renjie Wang · Linyi Yang · Zhi Zhou · Lan-Zhe Guo · Heli Qi · Zhen Wu · Yu-Feng Li · Satoshi Nakamura · Wei Ye · Marios Savvides · Bhiksha Raj · Takahiro Shinozaki · Bernt Schiele · Jindong Wang · Xing Xie · Yue Zhang