Automatic Data Quality Evaluation for Text Classification
2021 Lightning Talk in Workshop: Data Centric AI
Abstract
Data quality is critical for machine learning, but it is usually evaluated through the performance of the models trained on the data. A model-independent data quality metric is therefore needed. This paper proposes DQTC, a convenient metric grounded in information theory that quantifies data quality for text classification. An experiment is conducted to verify the relationship between DQTC and model performance. Finally, we describe linguistic improvements that should be considered. The code is available online.