FLAM: Scaling Latent Action Models with Factorization
Abstract
Learning from unlabeled video has emerged as a powerful paradigm for training world models without action supervision. However, existing approaches often rely on monolithic inverse and forward dynamics models, which struggle to scale in settings where multiple entities act simultaneously. In this work, we propose FLAM, a factored dynamics framework that decomposes the latent state into independent factors, each with its own inverse and forward dynamics model. This structure enables more accurate modeling of complex, multi-entity dynamics and improves prediction quality in action-free video settings. Evaluated on the Multigrid, Procgen, nuPlan, and Sports datasets, FLAM consistently outperforms a monolithic dynamics model, demonstrating the benefits of factorization.