Affinity Workshop: Women in Machine Learning

The Role of Expert-driven Prompt Engineering for Fine-grained Zero-shot Classification in Fashion

Dhanashree Balaram · Matthew Nokleby · Thiyagarajan Ramanathan · Ajitesh Gupta Gupta · Ravi Kannan


E-commerce fashion retailers face the increasingly challenging task of tracking fast-changing trends in delivering content-specific recommendations. A key differentiator among competitors in this space is the ability to identify fine-grained fashion attributes that distinguish these trends. Such fine-grained attributes go deeper than, e.g., the colors, fabrics, and sizes of clothing that are easily captured in retailers' catalogs, and involve nuanced attributes, such as the specific sleeve style of a garment, its intended occasion, or whether it belongs to a trending style. In order to classify these attributes via standard methods, we require large amounts of fine-grained, accurately labeled data which is expensive and labor intensive. We use CLIP to carry out zero-shot classification for fine-grained fashion attributes, where text prompts represent the classes. We demonstrate that for fine-grained fashion attributes, expert-driven prompts deliver higher accuracy than coarse, naive prompts. Because CLIP depends on joint text-image similarity, we hypothesize that adding detailed fashion descriptions like the captions included in the CLIP training set, lead to more well-defined embedding spaces and higher classification accuracy.

Chat is not available.