RAISE: Reliable Agent Improvement via Simulated Experience
Abstract
AI agents promise substantial value for enterprises but are notoriously difficult to productionize robustly and reliably. Enterprise agents must internalize task context, constraints, and success metrics to be reliable, yet learning directly in production is risky and often infeasible. We present RAISE, a simulation-first experiential learning framework for training and evaluating domain-specific AI agents. RAISE constructs high-fidelity interactive environments that mirror target deployments, including tool APIs, data schemas, user behaviors, and organizational policies. The system generates executable tool-calling and user-simulation traces and logs replayable trajectories. A hybrid evaluation layer provides dense, verifiable signals by combining task-specific checkers with rubric-driven LLM-as-a-judge assessments. The framework is optimizer-agnostic and supports multiple post-training paths, including supervised fine-tuning on simulated transcripts, reinforcement learning with online rollouts and trajectory replay, and iterative prompt or policy optimization.
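To make the hybrid evaluation layer concrete, the sketch below shows one plausible way a checker-plus-judge reward could be assembled over a replayable trajectory. It is a minimal illustration under stated assumptions, not the paper's implementation: the `Trajectory` fields, the refund checker, the rubric, and the 0.7/0.3 weighting are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """A replayable record of one simulated episode: tool calls plus final state."""
    tool_calls: list = field(default_factory=list)   # e.g. [{"name": "issue_refund", "args": {...}}]
    final_state: dict = field(default_factory=dict)  # environment state after the episode
    transcript: str = ""                             # full agent/user dialogue


def checker_refund_issued(traj: Trajectory) -> float:
    """Task-specific checker: a verifiable, binary signal read from the simulated environment."""
    return 1.0 if traj.final_state.get("refund_issued") is True else 0.0


def judge_policy_compliance(traj: Trajectory, rubric: str, llm=None) -> float:
    """Rubric-driven LLM-as-a-judge score in [0, 1]; `llm` is any callable mapping a
    prompt to a numeric score (stubbed here so the sketch runs offline)."""
    prompt = f"Rubric:\n{rubric}\n\nTranscript:\n{traj.transcript}\n\nScore 0-1:"
    return llm(prompt) if llm else 0.5  # placeholder score when no judge model is supplied


def hybrid_reward(traj: Trajectory, weights=(0.7, 0.3)) -> float:
    """Blend the dense verifiable check with the rubric judgment into one training signal."""
    w_check, w_judge = weights
    return (w_check * checker_refund_issued(traj)
            + w_judge * judge_policy_compliance(traj, rubric="Follows refund policy."))


if __name__ == "__main__":
    episode = Trajectory(
        tool_calls=[{"name": "issue_refund", "args": {"order_id": "A-17"}}],
        final_state={"refund_issued": True},
        transcript="user: I want a refund.\nagent: Done, refund issued for order A-17.",
    )
    print(f"hybrid reward: {hybrid_reward(episode):.2f}")  # 0.85 with the stub judge
```

Because the signal is computed per trajectory, the same function could serve as a reward for RL rollouts, a filter for selecting supervised fine-tuning transcripts, or a metric for prompt optimization, which is one reading of the framework's optimizer-agnostic design.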