Poster

How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers

Xin Lu · Yanyan Zhao · Bing Qin · Liangyu Huo · Qing Yang · Dongliang Xu
2024 Poster