Poster | Thu 16:30
Approximation Rate of the Transformer Architecture for Sequence Modeling
Haotian Jiang · Qianxiao Li

Affinity Event
Reducing Reasoning Costs - The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
Libo Wang

Poster | Wed 16:30
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
Satwik Bhattamishra · Michael Hahn · Phil Blunsom · Varun Kanade

Poster
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
Jingfeng Yao · Cheng Wang · Wenyu Liu · Xinggang Wang

Poster | Wed 16:30
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
Sy-Tuyen Ho · Tuan Van Vo · Somayeh Ebrahimkhani · Ngai-Man (Man) Cheung

Poster
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
Wei Dong · Yuan Sun · Yiting Yang · Xing Zhang · Zhijun Lin · Qingsen Yan · Haokui Zhang · Peng Wang · Yang Yang · Hengtao Shen

Poster | Wed 16:30
How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers
Xin Lu · Yanyan Zhao · Bing Qin · Liangyu Huo · Qing Yang · Dongliang Xu

Workshop
Towards Object-Centric Learning with General Purpose Architectures
Jack Brady · Julius von Kügelgen · Sébastien Lachapelle · Simon Buchholz · Thomas Kipf · Wieland Brendel