Bridging Reading Accessibility Gaps: Responsible Multimodal Simplification with Generative AI
Abstract
Complex, multimodal content remains a barrier to accessibility in education, healthcare, and technical domains. We present a responsible multimodal simplification system that jointly simplifies text and images while preserving context. The pipeline integrates Age-of-Acquisition (AoA) guidance and word-sense disambiguation with a graph-based retrieval-augmented generation (RAG) module that fetches domain-specific definitions from curated knowledge bases; an image captioner produces level-aware captions for diagrams and schematics. A real-time feedback loop allows users to refine outputs, adapt terminology, and steer retrieval. Across 14,000 items spanning educational, medical, and technical sources, the system improves readability over a strong LLM baseline (GPT-4): +22.21% SARI and +14.11% Flesch Reading Ease. These gains prioritize accessibility over exact form preservation, a trade-off reflected in BLEU and cosine-similarity scores. Graph-based RAG increases domain-term retrieval precision by 11%. In teacher-facilitated classroom use with 200 K–12 students (grouped at ages 5, 7, 9, and 11), and in additional evaluations with medical and technical professionals, users reported that the system's outputs were easier to understand and more useful for non-experts. Incorporating user feedback yielded a further 8% improvement in SARI and a 15% increase in user satisfaction. By coupling multimodal processing with knowledge-grounded retrieval and human-in-the-loop adaptation, this work advances practical accessibility for high-impact domains while aligning with responsible deployment principles.
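As a rough illustration of the AoA-guided lexical substitution and graph-grounded definition lookup sketched in the abstract, the following minimal Python example uses a toy AoA lexicon, synonym map, and knowledge graph. All names and values (AOA, SIMPLER, GRAPH, the threshold logic) are hypothetical stand-ins for the paper's curated resources, not the actual implementation.

```python
# Minimal sketch, assuming an AoA lexicon keyed by word and a small curated
# knowledge graph of domain terms. Values below are illustrative only.

# Hypothetical AoA lexicon: word -> estimated age of acquisition (years).
AOA = {"myocardial": 16.0, "heart": 4.5, "infarction": 15.0, "attack": 6.0}

# Hypothetical synonym map used for lexical substitution.
SIMPLER = {"myocardial": "heart", "infarction": "attack"}

# Toy knowledge graph: term -> (definition, related terms), standing in for
# the graph-based RAG store of domain-specific definitions.
GRAPH = {
    "myocardial infarction": ("a heart attack", ["heart", "blood flow"]),
}


def simplify(text: str, target_age: float) -> str:
    """Replace words acquired later than the target age with simpler synonyms."""
    out = []
    for tok in text.split():
        word = tok.lower().strip(".,")
        if AOA.get(word, 0.0) > target_age and word in SIMPLER:
            out.append(SIMPLER[word])
        else:
            out.append(tok)
    return " ".join(out)


def retrieve_definition(term: str) -> str | None:
    """Fetch a domain definition from the toy graph, mimicking graph-based RAG."""
    entry = GRAPH.get(term.lower())
    return entry[0] if entry else None


if __name__ == "__main__":
    print(simplify("Myocardial infarction requires urgent care.", target_age=9))
    print(retrieve_definition("myocardial infarction"))
```

In the full system these lookups would be driven by word-sense disambiguation and feed an LLM prompt, with retrieved definitions and level-aware captions conditioning the generated simplification.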