ADCA: Artifact-Based Dataset Creativity Assessment
Harrison Sims · Gabriel Ganberg · Robert McCormack · Svitlana Volkova
Abstract
We present a three-dimensional framework for automated dataset creativity assessment that decomposes creativity into measurable components: attribute novelty, recombination novelty, and feature addition. Our method treats data points as collections of categorized artifacts and evaluates universal and unique attributes through semantic embedding comparison. Attribute novelty measures semantic diversity using pairwise cosine similarity of CLIP embeddings, recombination novelty quantifies unique attribute co-occurrences via hierarchical clustering, and feature addition assesses the distribution of unique embellishments. Validation on three 100-image datasets shows statistically significant differences across metrics ($p < .001$), with forced creativity datasets achieving the highest attribute novelty and prompted creativity datasets demonstrating superior recombination patterns. Strong positive correlations between metrics ($r = 0.43$ to $0.55$) support construct validity. This modality-agnostic, embedding-based evaluation framework enables systematic assessment of generated image data quality beyond traditional performance benchmarks, with direct implications for foundation model training in high-stakes applications. The three 100-image datasets are publicly available.
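The abstract names the ingredients of the first two metrics (pairwise cosine similarity of embeddings; hierarchical clustering of attributes) but not their exact formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes attribute novelty is 1 minus the mean pairwise cosine similarity of attribute embeddings, and that recombination novelty is the fraction of data points whose set of attribute clusters occurs in no other data point. Clusters come from single-linkage agglomerative clustering cut at a hypothetical cosine-distance `threshold` of 0.3. Embeddings are assumed to be precomputed (for example, by a CLIP encoder); toy 2-D vectors stand in for them here. The function names are illustrative, and the feature addition metric is not sketched because the abstract does not specify it.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def attribute_novelty(embeddings):
    """Assumed form: 1 minus the mean pairwise cosine similarity.

    Higher values mean the attribute embeddings are more semantically spread out.
    """
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    mean_sim = sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - mean_sim


def single_linkage_labels(embeddings, threshold):
    """Single-linkage clustering cut at a cosine-distance threshold.

    Cutting a single-linkage dendrogram at height t gives the connected
    components of the graph that links points within distance t, so a
    union-find over those links yields the same flat clusters.
    """
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(embeddings)), 2):
        if 1.0 - cosine_similarity(embeddings[i], embeddings[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(embeddings))]


def recombination_novelty(data_points, threshold=0.3):
    """Assumed form: fraction of data points whose combination of attribute
    clusters appears in no other data point.

    data_points: one list of attribute embeddings per image.
    threshold: hypothetical cosine-distance cut for the clustering.
    """
    flat = [emb for point in data_points for emb in point]
    labels = single_linkage_labels(flat, threshold)
    combos, k = [], 0
    for point in data_points:
        combos.append(frozenset(labels[k:k + len(point)]))
        k += len(point)
    counts = {}
    for c in combos:
        counts[c] = counts.get(c, 0) + 1
    return sum(1 for c in combos if counts[c] == 1) / len(combos)


if __name__ == "__main__":
    points = [
        [[1.0, 0.0], [0.0, 1.0]],    # image A: attributes near "x" and "y"
        [[0.99, 0.05], [0.0, 1.0]],  # image B: same clusters as A
        [[1.0, 0.0], [-1.0, 0.0]],   # image C: "x" combined with "-x"
    ]
    # Only image C has a cluster combination no other image shares.
    print(recombination_novelty(points))
    print(attribute_novelty([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]))
```

Single linkage is used here only because its threshold cut reduces to a simple union-find; other linkage criteria (average, complete, Ward) would produce different clusters and are equally plausible readings of "hierarchical clustering".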