
Workshop: I Can’t Believe It’s Not Better (ICBINB): Failure Modes in the Age of Foundation Models

Transformer-Based Large Language Models Are Not General Learners: A Universal Circuit Perspective

Yang Chen · Yitao Liang · Zhouchen Lin

Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, evoking perceptions of "sparks of Artificial General Intelligence (AGI)". A key question naturally arises: *Can foundation models lead to AGI?* In this work, we partially answer this question by formally considering the capabilities of Transformer-based LLMs (T-LLMs) from the perspective of universal circuits. By investigating the expressive power of realistic T-LLMs as universal circuits, we show that a T-LLM of size $\operatorname{poly}(n)$ cannot perform all the basic operators of input length $O\left(\operatorname{poly}(\log n)\right)$. We also demonstrate that a constant-depth, $\operatorname{poly}(n)$-size, log-precision T-LLM cannot faithfully execute prompts of complexity $n$. Our analysis provides a concrete theoretical basis for the claim that T-LLMs can only be universal circuits for limited function classes. In other words, T-LLMs are not general learners. Furthermore, we show that a constant-depth, $\operatorname{poly}(n)$-size, log-precision T-LLM can memorize $O\left(\operatorname{poly}(n)\right)$ instances, which could partially explain the apparent inconsistency between LLMs' empirical successes and our negative results. To the best of our knowledge, our work takes the first step towards analyzing the limitations of T-LLMs as general learners within a rigorous theoretical framework. Our results advance the understanding of LLMs' capabilities and highlight the need for innovative architecture designs beyond Transformers to overcome current limitations.
