François Chollet: Benchmarking Agentic Intelligence
Abstract
Large-scale pretraining has plateaued on true generalization. We need a new framework, and new benchmarks, for evaluating how well AI systems can infer rules, explore unfamiliar environments, and acquire new skills efficiently, as humans do. The ARC-AGI family of benchmarks is designed to measure precisely these emerging capabilities.