Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers
Alexander Wong · Mohammad Javad Shafiee · Saad Abbasi · Saeejith Nair · Mahmoud Famouri
With the growing adoption of deep learning for on-device TinyML applications, there is ever-increasing demand for efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a faster attention condenser design, the double-condensing attention condenser, which enables more condensed feature embedding. We further employ a machine-driven design exploration strategy that imposes best-practices design constraints for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor than several other state-of-the-art efficient backbones (>10× faster than FB-Net C at higher accuracy and speed, and >10× faster than MobileOne-S1 at smaller size), while having a small model size (>1.37× smaller than MobileNetv3-L at higher accuracy and speed) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.
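The core idea behind an attention condenser is to avoid full-resolution self-attention: features are first condensed (downsampled) into a compact embedding, self-attention is computed in that cheaper space, and the result is expanded back to gate the input features. The sketch below illustrates this flow in NumPy; the average-pooling condensation, sigmoid gating, and the particular "double condensing" arrangement (two successive condensation stages before attention) are illustrative assumptions for exposition, not the paper's exact design.

```python
import numpy as np

def avg_pool2d(x, k):
    # Condensation step: non-overlapping k x k average pooling on (H, W, C).
    H, W, C = x.shape
    return x[:H // k * k, :W // k * k].reshape(H // k, k, W // k, k, C).mean(axis=(1, 3))

def upsample2d(x, k):
    # Expansion step: nearest-neighbour upsampling back toward input resolution.
    return x.repeat(k, axis=0).repeat(k, axis=1)

def attention_condenser(x, k=2):
    # Condense -> self-attention in the condensed space -> expand -> gate input.
    q = avg_pool2d(x, k)                              # condensed embedding
    h, w, c = q.shape
    flat = q.reshape(h * w, c)
    scores = flat @ flat.T / np.sqrt(c)               # attention over condensed positions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax
    attended = (weights @ flat).reshape(h, w, c)
    gate = 1.0 / (1.0 + np.exp(-upsample2d(attended, k)))  # sigmoid selective attention
    return x * gate

def double_condensing_attention_condenser(x, k=2):
    # Hypothetical reading of "double condensing": a second condensation stage
    # wraps the inner attention condenser, so attention runs on an even more
    # condensed embedding before the result gates the original features.
    inner = attention_condenser(avg_pool2d(x, k), k)
    return x * (1.0 / (1.0 + np.exp(-upsample2d(inner, k))))
```

Because attention cost is quadratic in the number of spatial positions, each factor-`k` condensation cuts the attention work by roughly `k^4`, which is the source of the throughput gains this family of designs targets.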
Author Information
Alexander Wong (University of Waterloo)
Mohammad Javad Shafiee (University of Waterloo)
Saad Abbasi (University of Waterloo)
Saeejith Nair (University of Waterloo)
Mahmoud Famouri (DarwinAI)
More from the Same Authors
- 2020: COVIDNet-S: SARS-CoV-2 lung disease severity grading of chest X-rays using deep convolutional neural networks » Alexander Wong
- 2021: COVID-Net Clinical ICU: Enhanced Prediction of ICU Admission for COVID-19 Patients via Explainability and Trust Quantification » Audrey Chung · Mahmoud Famouri · Andrew Hryniowski · Alexander Wong
- 2021: Graph Convolutional Networks for Multi-modality Movie Scene Segmentation » Yaoxin Li · Alexander Wong · Mohammad Javad Shafiee
- 2021: MAPLE: Microprocessor A Priori for Latency Estimation » Saad Abbasi · Alexander Wong · Mohammad Javad Shafiee
- 2022: A Fair Loss Function for Network Pruning » Robbie Meyer · Alexander Wong
- 2022: COVID-Net Biochem: An Explainability-driven Framework to Building Machine Learning Models for Predicting Survival and Kidney Injury of COVID-19 Patients from Clinical and Biochemistry Data » Hossein Aboutalebi · Maya Pavlova · Mohammad Javad Shafiee · Adrian Florea · Andrew Hryniowski · Alexander Wong
- 2022: COVIDx CT-3: A Large-scale, Multinational, Open-Source Benchmark Dataset for Computer-aided COVID-19 Screening from Chest CT Images » Hayden Gunraj · Tia Tuinstra · Alexander Wong
- 2022: COVIDx CXR-3: A Large-Scale, Open-Source Benchmark Dataset of Chest X-ray Images for Computer-Aided COVID-19 Diagnostics » Maya Pavlova · Tia Tuinstra · Hossein Aboutalebi · Andy Zhao · Hayden Gunraj · Alexander Wong
- 2021: Live Q&A session: MAPLE: Microprocessor A Priori for Latency Estimation » Saad Abbasi · Alexander Wong · Mohammad Javad Shafiee
- 2021: Contributed Talk (Oral): MAPLE: Microprocessor A Priori for Latency Estimation » Saad Abbasi · Alexander Wong · Mohammad Javad Shafiee
- 2020: Lightning Talk 1: Insights into Fairness through Trust: Multi-scale Trust Quantification for Financial Deep Learning » Alexander Wong · Andrew Hryniowski · Xiao Yu Wang
- 2018: Poster presentations » Simon Wiedemann · Huan Wang · Ivan Zhang · Chong Wang · Mohammad Javad Shafiee · Rachel Manzelli · Wenbing Huang · Tassilo Klein · Lifu Zhang · Ashutosh Adhikari · Faisal Qureshi · Giuseppe Castiglione