Bridging the Gap Between AI Quantization and Edge Deployment: INT4 and INT8 on the Edge
Mohammad Köse · Qazi Arbab Ahmed · Thorsten Jungeblut
Abstract
Quantization is the key to deploying neural networks on microcontroller-class edge devices. While INT4 and mixed-precision schemes promise strong compression–accuracy trade-offs in simulation, current toolchains support only INT8 in practice. We benchmark FP32, INT8, INT4, and mixed-precision quantization on Tiny YOLOv2 and deploy INT8 models on the STM32N6, exposing this research–deployment gap. To address it, we propose a heterogeneous sub-INT8 strategy that combines INT8 acceleration with selective INT4 fallback execution, enabling practical hybrid deployment on today’s edge hardware.
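To make the compression–accuracy trade-off concrete, the sketch below shows per-tensor symmetric quantization at both bit widths discussed in the abstract. This is a generic illustration of INT8 vs. INT4 quantization error, not the paper's own pipeline; the function names and the random weight tensor are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int):
    """Per-tensor symmetric quantization to a signed integer grid.

    Illustrative sketch: qmax is 127 for INT8 and 7 for INT4.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the integer grid back to float for error measurement."""
    return q.astype(np.float32) * scale

# Compare reconstruction error at both precisions on a dummy tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

for bits in (8, 4):
    q, s = quantize_symmetric(w, bits)
    mse = float(np.mean((w - dequantize(q, s)) ** 2))
    print(f"INT{bits}: MSE = {mse:.6f}")
```

Running this shows the INT4 grid incurring visibly larger reconstruction error than INT8, which is why sub-INT8 schemes typically apply the lower precision only selectively.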