DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments

Chiyu Zhang ⋅ Marc-Alexandre Côté ⋅ Michael Albada ⋅ Anush Sankaran ⋅ Jack Stokes ⋅ Tong Wang ⋅ Amir Abdi ⋅ William Blum ⋅ Muhammad Abdul-Mageed

Abstract
