

Oral in Workshop: Gaze Meets ML

An Attention-based Predictive Agent for Handwritten Numeral/Alphabet Recognition via Generation

Bonny Banerjee · Murchana Baruah

Sat 16 Dec 12:30 p.m. PST — 12:45 p.m. PST
 
presentation: Gaze Meets ML
Sat 16 Dec 6:15 a.m. PST — 3 p.m. PST

Abstract:

A number of attention-based models for either classification or generation of handwritten numerals/alphabets have been reported in the literature. However, very few end-to-end models perform generation and classification jointly. We propose a predictive agent model that actively samples its visual environment via a sequence of glimpses. Attention is driven by the agent's sensory prediction (or generation) error. At each sampling instant, the model predicts the observation class and completes the partial sequence observed up to that instant. It learns where and what to sample by jointly minimizing the classification and generation errors. Three variants of this model are evaluated for handwriting generation and recognition on images of handwritten numerals and alphabets from benchmark datasets. We show that the proposed model is more efficient in handwritten numeral/alphabet recognition than both human participants in a recently published study and a highly cited attention-based reinforcement model. This is the first known attention-based agent to interact with and learn end-to-end from images for recognition via generation, with a high degree of accuracy and efficiency.
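
The two ideas in the abstract, attention driven by prediction error and a joint classification-plus-generation objective, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `next_glimpse` and `joint_loss`, the greedy patch selection, the squared-error measure, and the `weight` parameter are all assumptions made for illustration. The sketch also assumes the full image is available when computing the error, which a real agent observing only glimpses would not have.

```python
import math

def next_glimpse(observed, predicted, size):
    """Return the top-left (row, col) of the size x size patch whose
    squared prediction error is largest (hypothetical greedy rule)."""
    rows, cols = len(observed), len(observed[0])
    best_loc, best_err = (0, 0), -1.0
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            err = sum(
                (observed[r + i][c + j] - predicted[r + i][c + j]) ** 2
                for i in range(size)
                for j in range(size)
            )
            if err > best_err:  # ties keep the first patch found
                best_loc, best_err = (r, c), err
    return best_loc

def joint_loss(class_probs, true_class, observed, predicted, weight=1.0):
    """Cross-entropy classification error plus a weighted mean squared
    generation error (the weighting is an assumption of this sketch)."""
    classification = -math.log(class_probs[true_class])
    n_pixels = len(observed) * len(observed[0])
    generation = sum(
        (o - p) ** 2
        for row_o, row_p in zip(observed, predicted)
        for o, p in zip(row_o, row_p)
    ) / n_pixels
    return classification + weight * generation

# Toy 4x4 image with a single bright pixel the model failed to predict.
observed = [[0.0] * 4 for _ in range(4)]
observed[2][3] = 1.0
predicted = [[0.0] * 4 for _ in range(4)]

glimpse = next_glimpse(observed, predicted, size=2)
loss = joint_loss([0.5, 0.5], 0, observed, predicted)
```

In this toy case the selected glimpse covers the mispredicted pixel, and the loss combines the uncertainty of the class prediction with the pixel-level generation error, so reducing either term lowers the joint objective.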
