Search All 2023 Events
 

4 Results

Workshop
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan
Workshop
Sat 8:25 Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan
Workshop
Skill-Mix: A Flexible and Expandable Family of Evaluations for AI Models
Dingli Yu · Simran Kaur · Arushi Gupta · Jonah Brown-Cohen · Anirudh Goyal · Sanjeev Arora
Workshop
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye · Doyoung Kim · Sungdong Kim · Hyeonbin Hwang · Seungone Kim · Yongrae Jo · James Thorne · Juho Kim · Minjoon Seo