

Poster

ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty

Xindi Wu · Dingli Yu · Yangsibo Huang · Olga Russakovsky · Sanjeev Arora


Abstract: Compositionality is a critical capability in Text-to-Image (T2I) models, as it reflects their ability to understand and combine multiple concepts from text descriptions. Existing evaluations of compositional capability rely heavily on human-designed text prompts or fixed templates, limiting their diversity and complexity and giving these evaluations low discriminative power. We propose ConceptMix, a scalable, controllable, and customizable benchmark consisting of two stages: (a) Given categories of visual concepts (e.g., objects, colors, shapes, spatial relationships), it randomly samples an object and a $k$-tuple of visual concepts, then uses GPT-4o to generate a text prompt for image generation. (b) To evaluate generation quality, ConceptMix uses an LLM to generate one question per visual concept, enabling automatic grading of whether each specified concept appears correctly in the generated image. By testing a diverse set of T2I models with increasing values of $k$, we show that ConceptMix has higher discriminative power than earlier benchmarks. Unlike previous benchmarks, ConceptMix reveals that the performance of several models drops dramatically as $k$ increases. ConceptMix is easily extendable to more visual concept categories and offers insight into the lack of prompt diversity in datasets such as LAION-5B, guiding future T2I model development.
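The two-stage pipeline described above can be illustrated with a minimal sketch. The category lists, prompt wording, and `llm` callable below are illustrative assumptions, not the released ConceptMix implementation; in the paper, the prompt- and question-writing steps are performed by GPT-4o.

```python
import random

# Hypothetical concept categories; the actual ConceptMix category lists
# come from the benchmark release, not from this sketch.
CATEGORIES = {
    "color": ["red", "blue", "green", "yellow"],
    "shape": ["round", "square", "triangular"],
    "spatial": ["to the left of a tree", "on top of a table"],
}
OBJECTS = ["dog", "car", "vase", "backpack"]


def sample_prompt_spec(k, rng=random):
    """Stage (a): sample one object plus k visual concepts from distinct categories."""
    obj = rng.choice(OBJECTS)
    cats = rng.sample(list(CATEGORIES), k=min(k, len(CATEGORIES)))
    concepts = {c: rng.choice(CATEGORIES[c]) for c in cats}
    return obj, concepts


def build_text_prompt(obj, concepts, llm):
    """Stage (a), continued: ask an LLM to turn the sampled spec into a fluent T2I prompt."""
    spec = ", ".join(f"{cat}: {val}" for cat, val in concepts.items())
    return llm(f"Write one natural image-generation prompt describing a {obj} with {spec}.")


def build_grading_questions(obj, concepts, llm):
    """Stage (b): one yes/no question per concept, used to grade the generated image."""
    return [
        llm(f"Write a yes/no question checking that the {obj} in an image satisfies {cat} = {val}.")
        for cat, val in concepts.items()
    ]


if __name__ == "__main__":
    # Stand-in for a GPT-4o call so the sketch runs without an API key.
    echo_llm = lambda instruction: f"[LLM output for: {instruction}]"
    obj, concepts = sample_prompt_spec(k=3)
    print(build_text_prompt(obj, concepts, echo_llm))
    for question in build_grading_questions(obj, concepts, echo_llm):
        print(question)
```

Because the object and the $k$ concepts are sampled independently, increasing $k$ raises difficulty in a controlled way, and one question per concept lets the per-concept checks be aggregated into a score for each generated image.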
