We present VAIDA, a novel benchmark creation paradigm (BCP) for NLP. VAIDA gives crowdworkers real-time feedback on the quality of samples as they create them, educating them about potential artifacts and allowing them to revise samples to remove those artifacts. Concurrently, VAIDA lets backend analysts review and approve submitted samples for benchmark inclusion, analyze the overall quality of the dataset, and resample splits to obtain and freeze the optimum state. VAIDA is domain, model, task, and metric agnostic, and constitutes a paradigm shift for robust, validated, and dynamic benchmark creation via human-and-metric-in-the-loop workflows. We demonstrate VAIDA's effectiveness by leveraging DQI (a data quality metric) over four datasets. We further evaluate it via expert review and a user study with NASA TLX. We find that VAIDA decreases the mental demand, temporal demand, effort, and frustration of crowdworkers (29.7%) and analysts (12.1%), and increases their performance by 30.8% and 26%, respectively.
Author Information
Anjana Arunkumar (Arizona State University)
Swaroop Mishra (Arizona State University)
Chitta Baral (Arizona State University)
More from the Same Authors
- 2022: Benchmarking Counterfactual Reasoning Abilities about Implicit Physical Properties
  Maitreya Patel · Tejas Gokhale · Chitta Baral · 'YZ' Yezhou Yang
- 2022: LILA: A Unified Benchmark for Mathematical Reasoning
  Swaroop Mishra · Matthew Finlayson · Pan Lu · Leonard Tang · Sean Welleck · Chitta Baral · Tanmay Rajpurohit · Oyvind Tafjord · Ashish Sabharwal · Peter Clark · Ashwin Kalyan
- 2022: ContextNER: Contextual Phrase Generation at Scale
  Himanshu Gupta · Shreyas Verma · Tarun Kumar · Swaroop Mishra · Tamanna Agrawal · Amogh Badugu · Himanshu Bhatt
- 2022 Workshop: MATH-AI: Toward Human-Level Mathematical Reasoning
  Pan Lu · Swaroop Mishra · Sean Welleck · Yuhuai Wu · Hannaneh Hajishirzi · Percy Liang
- 2022 Poster: Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
  Pan Lu · Swaroop Mishra · Tanglin Xia · Liang Qiu · Kai-Wei Chang · Song-Chun Zhu · Oyvind Tafjord · Peter Clark · Ashwin Kalyan
- 2020 Poster: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
  Simon Stepputtis · Joseph Campbell · Mariano Phielipp · Stefan Lee · Chitta Baral · Heni Ben Amor
- 2020 Spotlight: Language-Conditioned Imitation Learning for Robot Manipulation Tasks
  Simon Stepputtis · Joseph Campbell · Mariano Phielipp · Stefan Lee · Chitta Baral · Heni Ben Amor