X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
Abstract
Multi-turn-to-single-turn (M2S) compresses iterative red teaming into one structured prompt, but prior work relied on a few hand-crafted formats. We present X‑Teaming Evolutionary M2S, an automated framework that discovers and optimizes M2S templates via LLM‑guided evolution, with smart sampling (12 sources), a StrongREJECT‑style LLM‑as‑judge, and auditable logs. To restore selection pressure, we calibrate the success threshold to θ=0.70. On GPT‑4.1 this yields five generations, two new template families, and 44.8% overall success (103/230). A balanced cross‑model panel (2,500 trials; judge fixed) shows that structural gains transfer but vary by target; two models score zero at θ=0.70. We also observe a positive length–score coupling, motivating length‑aware judging. Our results establish structure‑level search as a reproducible path to stronger single‑turn probes and highlight threshold calibration and cross‑model evaluation as key to progress. Code, configs, and artifacts: https://github.com/hyunjun1121/M2S-x-teaming.
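
As a reading aid, the thresholded success metric quoted above can be sketched as follows. This is a minimal illustration, not the released implementation: the function name `success_rate`, the assumption that judge scores are normalized to [0, 1], and the use of an inclusive comparison (score ≥ θ) are all hypothetical choices made for the example.

```python
from typing import Sequence, Tuple


def success_rate(scores: Sequence[float], theta: float = 0.70) -> Tuple[int, int, float]:
    """Count trials whose judge score meets the threshold theta.

    Assumes scores lie in [0, 1] and that a trial counts as a success
    when score >= theta (the inclusive comparison is an assumption).
    Returns (successes, total, rate).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    successes = sum(1 for s in scores if s >= theta)
    return successes, len(scores), successes / len(scores)


# Arithmetic check of the reported figure: 103 successes out of 230 trials.
reported_rate = 103 / 230
print(f"{reported_rate:.1%}")  # 44.8%
```

Raising θ shrinks the set of trials that count as successes, which is one way to read the claim that calibrating the threshold restores selection pressure: templates that all pass a lenient threshold become distinguishable under a stricter one.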