HateXplain Space Model: Fusing Robustness with Explainability in Hate Speech Analysis
Md Fahim · Md Shihab Shahriar · Mohammad Ruhul Amin
Abstract
In Natural Language Processing, Language Models (LMs) excel at a wide range of tasks but struggle to identify hateful contexts, particularly in zero-shot and transfer learning settings. To address this, we introduce Space Modeling (SM), a novel approach that improves hate context detection by generating word-level attribution and bias scores. These scores offer intuitive insight into model predictions and help identify hateful terms. Experiments on six hate speech datasets show that SM outperforms existing methods, marking a significant advance in LM-based hate context detection.