
Workshop: Workshop on robustness of zero/few-shot learning in foundation models (R0-FoMo)

Neural Sandbox Framework for Classification: A Concept Based Method of Leveraging LLMs for Text Classification

Mostafa Mushsharat · Nabeel Mohammed · Mohammad Ruhul Amin


We introduce a neural sandbox framework for text classification via self-referencing defined label concepts from a Large Language Model (LLM). The framework draws inspiration from the define-optimize alignment problem, in which the motivations of a model are described initially and the model is then optimized to align with these predefined objectives. In our case, we focus on text classification: we use a pre-trained LLM to convert text into vectors and provide it with specific concept words (cop-words) based on the labels and the input text. We then optimize an operator to classify text according to how relevant it is to these cop-words. Experiments with multiple text classification datasets and LLMs show that incorporating our sandbox network generally improves accuracy and macro F1 compared to a baseline. The framework not only improves classification but also provides insight into the model's decision making based on the provided cop-words. We also demonstrate the framework's ability to understand learned concepts and identify potential biases. However, we find that the model's incentives may not always align with human decisions.
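
The idea of classifying text by its relevance to cop-words can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the operator, so a fixed mean cosine similarity stands in for the learned one, and the 3-dimensional vectors are made-up placeholders for LLM embeddings. The names `cosine`, `relevance_scores`, `classify`, and `COP_VECS` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relevance_scores(text_vec, cop_vecs_by_label):
    """Score each label by the mean cosine similarity between the text vector
    and that label's cop-word vectors. Assumption: a fixed similarity replaces
    the optimized operator described in the abstract."""
    return {
        label: sum(cosine(text_vec, v) for v in vecs) / len(vecs)
        for label, vecs in cop_vecs_by_label.items()
    }


def classify(text_vec, cop_vecs_by_label):
    """Return the highest-scoring label together with all per-label scores."""
    scores = relevance_scores(text_vec, cop_vecs_by_label)
    return max(scores, key=scores.get), scores


# Toy vectors standing in for LLM embeddings of cop-words (illustrative only).
COP_VECS = {
    "sports": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "politics": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
}

# Toy vector standing in for the LLM embedding of an input text.
text_vec = [0.7, 0.2, 0.1]

label, scores = classify(text_vec, COP_VECS)
print(label, scores)
```

Because the per-label scores are computed against named cop-words, they can be inspected alongside the prediction, which mirrors the kind of insight into decision making that the abstract describes.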
