Virtual Cells as Causal World Models: A Perspective on Evaluation
Abstract
Evaluating virtual cells requires moving beyond predictive accuracy to assessing whether they can serve as causal world models of biology. Current benchmarks emphasize fit to observed data, which rewards pattern matching but rarely tests responses to interventions. We argue that building causal virtual cells demands a new evaluation paradigm, with metrics and benchmarks that assess intervention validity, counterfactual consistency, trajectory faithfulness, and mechanistic alignment. Our contribution is twofold: (1) a survey of recent approaches to virtual cell modeling, and (2) a taxonomy of causal evaluation metrics mapped to available perturbation datasets. By identifying gaps and proposing unified causal benchmarks, we position causal evaluation as the critical step toward making virtual cells reliable world models of biology.