This paper develops an approach for assessing whether reinforcement learners transfer collusive pricing policies from their training environment to a testing environment. We find that the algorithms are unable to extrapolate collusive policies: collusion consistently breaks down, and the algorithms instead tend to converge to Nash prices. Policy updating, with or without exploration, re-establishes collusion, but only in the current environment; this finding is robust to repeated learning across environments. Our results indicate that frequent market interaction, coordination of algorithm design, and stable environments are essential for algorithmic collusion.
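The train-then-test protocol described above can be sketched in a toy model. The code below is a minimal illustration, not the paper's actual implementation: two Q-learning agents set prices on a discrete grid in a hypothetical logit duopoly, learn in one demand environment, and are then evaluated with frozen (greedy, non-updating) policies in a shifted environment. All functional forms and parameter values (`PRICES`, `demand_shift`, learning rates) are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch only: the demand system, state space, and all
# hyperparameters below are assumptions, not the paper's specification.

PRICES = np.linspace(1.0, 2.0, 5)  # discrete price grid
N = len(PRICES)

def profits(i, j, demand_shift=0.0):
    """Hypothetical logit-duopoly profits for price indices (i, j)."""
    p1, p2 = PRICES[i], PRICES[j]
    d1 = np.exp(2.0 + demand_shift - p1)
    d2 = np.exp(2.0 + demand_shift - p2)
    tot = 1.0 + d1 + d2  # outside option normalized to 1
    return p1 * d1 / tot, p2 * d2 / tot

def train(episodes=20000, alpha=0.1, gamma=0.95, eps=0.1,
          demand_shift=0.0, seed=0):
    """Epsilon-greedy Q-learning; state = last period's price pair."""
    rng = np.random.default_rng(seed)
    Q = [np.zeros((N, N, N)) for _ in range(2)]  # one table per firm
    s = (0, 0)
    for _ in range(episodes):
        a = [int(rng.integers(N)) if rng.random() < eps
             else int(np.argmax(Q[k][s])) for k in range(2)]
        r = profits(a[0], a[1], demand_shift)
        s2 = (a[0], a[1])
        for k in range(2):
            td = r[k] + gamma * Q[k][s2].max() - Q[k][s][a[k]]
            Q[k][s][a[k]] += alpha * td
        s = s2
    return Q

def evaluate(Q, periods=100, demand_shift=0.0):
    """Testing environment: policies are frozen (greedy, no updates)."""
    s = (0, 0)
    prices, profs = [], []
    for _ in range(periods):
        a = tuple(int(np.argmax(Q[k][s])) for k in range(2))
        prices.append(tuple(PRICES[i] for i in a))
        profs.append(profits(a[0], a[1], demand_shift))
        s = a
    return prices, profs

Q = train(demand_shift=0.0)                    # training environment
prices, profs = evaluate(Q, demand_shift=0.3)  # shifted testing environment
```

In this sketch, "policy updating" in the testing environment would correspond to resuming the `train` loop there (with `eps=0` for updating without exploration), whereas `evaluate` keeps the learned policies fixed.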