Recurrent Models of Visual Attention
Volodymyr Mnih · Nicolas Heess · Alex Graves · Koray Kavukcuoglu

Thu Dec 11, 11:00 AM to 3:00 PM (PST), Level 2, Room 210D

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
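The abstract's efficiency claim rests on processing only small regions around selected locations at high resolution. The following is a minimal pure-Python sketch of such a multi-resolution "glimpse" in the spirit of the paper. It makes several assumptions: centres are given in pixel coordinates, out-of-image pixels are zero-padded, and coarser scales are produced by average pooling. The names `glimpse`, `extract_patch` and `avg_pool` and the defaults `base=8`, `scales=3` are hypothetical illustrations, not the authors' code or settings. It covers only the fixed-size input extraction. The recurrent core, the location policy, and the reinforcement-learning training are not shown.

```python
from typing import List

Image = List[List[float]]  # row-major grayscale image (illustrative type)


def extract_patch(image: Image, cy: int, cx: int, size: int) -> Image:
    """Crop a size x size patch centred at (cy, cx); pixels outside the image are 0.0."""
    h, w = len(image), len(image[0])
    top, left = cy - size // 2, cx - size // 2
    return [
        [image[r][c] if 0 <= r < h and 0 <= c < w else 0.0
         for c in range(left, left + size)]
        for r in range(top, top + size)
    ]


def avg_pool(patch: Image, factor: int) -> Image:
    """Downsample a square patch by averaging non-overlapping factor x factor blocks."""
    n = len(patch) // factor
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            block = [patch[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def glimpse(image: Image, cy: int, cx: int, base: int = 8, scales: int = 3) -> List[Image]:
    """Return `scales` patches, each base x base.

    Scale k covers a (base * 2**k)-pixel square around the centre and is
    average-pooled by 2**k, so wider context is seen at lower resolution.
    The work done depends only on `base` and `scales`, not on the image size.
    """
    return [avg_pool(extract_patch(image, cy, cx, base * 2 ** k), 2 ** k)
            for k in range(scales)]
```

For example, a glimpse at the centre of a 100 x 100 image and one at the centre of a 1000 x 1000 image both yield 3 * 8 * 8 = 192 values. This fixed-size output is what lets the amount of computation be set independently of the input image size.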

Author Information

Volodymyr Mnih (DeepMind)
Nicolas Heess (Google DeepMind)
Alex Graves (Google DeepMind)

His main contributions to neural networks include the Connectionist Temporal Classification training algorithm (widely used for speech, handwriting and gesture recognition, e.g. by Google voice search), a type of differentiable attention for RNNs (originally developed for handwriting generation and now a standard tool in computer vision, machine translation and elsewhere), stochastic gradient variational inference, and Neural Turing Machines. He works at Google DeepMind.

Koray Kavukcuoglu (DeepMind)