Bridging the Gap Between AI Quantization and Edge Deployment: INT4 and INT8 on the Edge
Mohammad Köse · Qazi Arbab Ahmed · Thorsten Jungeblut
Abstract
Quantization is the key to deploying neural networks on microcontroller-class edge devices. While INT4 and mixed-precision schemes promise strong compression–accuracy trade-offs in simulation, current toolchains support only INT8 in practice. We benchmark FP32, INT8, INT4, and mixed-precision quantization on Tiny YOLOv2 and deploy INT8 models on the STM32N6, exposing this research–deployment gap. To address it, we propose a heterogeneous sub-INT8 strategy that combines INT8 acceleration with selective INT4 fallback execution, enabling practical hybrid deployment on today’s edge hardware.
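To make the compression–accuracy trade-off concrete, the sketch below shows per-tensor symmetric quantization at both bit widths discussed in the abstract. This is a generic illustration of INT8 vs. INT4 quantization error, not the paper's own pipeline; the function names and the random weight tensor are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, bits: int):
    """Per-tensor symmetric quantization to a signed integer grid.

    Illustrative sketch: qmax is 127 for INT8 and 7 for INT4.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the integer grid back to float for error measurement."""
    return q.astype(np.float32) * scale

# Compare reconstruction error at both precisions on a dummy tensor.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)

for bits in (8, 4):
    q, s = quantize_symmetric(w, bits)
    mse = float(np.mean((w - dequantize(q, s)) ** 2))
    print(f"INT{bits}: MSE = {mse:.6f}")
```

Running this shows the INT4 grid incurring visibly larger reconstruction error than INT8, which is why sub-INT8 schemes typically apply the lower precision only selectively.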