Invited Talk
in
Workshop: LAW 2025: Bridging Language, Agent, and World Models for Reasoning and Planning

François Chollet: Benchmarking Agentic Intelligence

Abstract

Large-scale pretraining has plateaued on true generalization. We need a new framework, and new benchmarks, for evaluating how well AI systems can infer rules, explore unfamiliar environments, and acquire new skills efficiently, as humans do. The ARC-AGI family of benchmarks is designed to measure precisely these emerging capabilities.
