François Chollet: Benchmarking Agentic Intelligence
2025 Invited Talk
in
Workshop: LAW 2025: Bridging Language, Agent, and World Models for Reasoning and Planning
Abstract
Large-scale pretraining has plateaued on true generalization. We need a new framework, and new benchmarks, for evaluating how well AI systems can infer rules, explore unfamiliar environments, and acquire new skills efficiently, as humans do. The ARC-AGI family of benchmarks is designed to measure precisely these emerging capabilities.