

Competition

CLAS 2024: The Competition for LLM and Agent Safety

Zhen Xiang · Yi Zeng · Mintong Kang · Chejian Xu · Jiawei Zhang · Zhuowen Yuan · Zhaorun Chen · Chulin Xie · Fengqing Jiang · Minzhou Pan · Junyuan Hong · Ruoxi Jia · Radha Poovendran · Bo Li

Sun 15 Dec 8:15 a.m. PST — 5:30 p.m. PST

Abstract:

Ensuring safety emerges as a pivotal objective in developing large language models (LLMs) and LLM-powered agents. The Competition for LLM and Agent Safety (CLAS) aims to advance the understanding of the vulnerabilities in LLMs and LLM-powered agents and to encourage methods for improving their safety. The competition features three main tracks linked through the methodology of prompt injection, with tasks designed to amplify societal impact by involving practical adversarial objectives for different domains. In the Jailbreaking Attack track, participants are challenged to elicit harmful outputs in guardrail LLMs via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with hundreds of domain-specific backdoors. They are asked to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, trigger reverse engineering will be focused on eliciting specific backdoor targets based on malicious agent actions. As the first competition addressing the safety of both LLMs and LLM agents, CLAS 2024 aims to foster collaboration between various communities promoting research and tools for enhancing the safety of LLMs and real-world AI systems.
