Capsule networks have been shown to be powerful models for image classification, thanks to their ability to represent and capture viewpoint variations of an object. However, the high computational complexity of capsule networks, which stems from recurrent dynamic routing, is a major drawback that makes their use for large-scale image classification challenging. In this work, we propose STAR-Caps, a capsule-based network that uses straight-through attentive routing to address the drawbacks of capsule networks. By utilizing attention modules augmented by differentiable binary routers, the proposed mechanism estimates the routing coefficients between capsules without recurrence, as opposed to prior related work. The routers then use straight-through estimators to make binary decisions to either connect or disconnect the route between capsules, allowing stable and faster performance. Experiments conducted on several image classification datasets, including MNIST, SmallNorb, CIFAR-10, CIFAR-100, and ImageNet, show that STAR-Caps outperforms the baseline capsule networks.
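The idea of a binary router trained with a straight-through estimator can be illustrated with a minimal sketch. This is not the STAR-Caps implementation: the function names, the 0.5 threshold, the sigmoid parameterization, and the toy numbers are all assumptions made for illustration. The forward pass makes a hard connect/disconnect decision per route; the backward pass treats the thresholding step as the identity so that gradients still reach the router logits.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_router_forward(logits, threshold=0.5):
    """Forward pass: one hard 0/1 gate per route (hypothetical helper)."""
    probs = [sigmoid(z) for z in logits]
    gates = [1.0 if p > threshold else 0.0 for p in probs]
    return probs, gates

def binary_router_backward(probs, grad_gates):
    """Straight-through backward pass (hypothetical helper).

    The threshold has zero gradient almost everywhere, so it is treated
    as the identity: d(gate)/d(prob) := 1. The gradient is then chained
    through the sigmoid, whose derivative is p * (1 - p).
    """
    return [g * p * (1.0 - p) for g, p in zip(grad_gates, probs)]

def route(attention, gates):
    """Disconnected routes (gate 0) contribute nothing to the output."""
    return [a * g for a, g in zip(attention, gates)]

# Toy values, assumed for illustration only.
logits = [2.0, -1.0, 0.3]      # router scores for three candidate routes
attention = [0.5, 0.3, 0.2]    # attention scores for the same routes

probs, gates = binary_router_forward(logits)
coeffs = route(attention, gates)

# If the loss gradient w.r.t. each coefficient were 1, the gradient
# w.r.t. each gate would be the matching attention score.
grad_logits = binary_router_backward(probs, grad_gates=attention)
```

Note that the second route is disconnected in the forward pass, yet its logit still receives a non-zero gradient; that is the property the straight-through estimator provides.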
Author Information
Karim Ahmed (Cornell University / Dartmouth College)
Lorenzo Torresani (Facebook AI)
Lorenzo Torresani is an Associate Professor with tenure in the Computer Science Department at Dartmouth College and a Research Scientist at Facebook AI. He received a Laurea Degree in Computer Science with summa cum laude honors from the University of Milan (Italy) in 1996, and an M.S. and a Ph.D. in Computer Science from Stanford University in 2001 and 2005, respectively. In the past, he has worked at several industrial research labs including Microsoft Research Cambridge, Like.com and Digital Persona. His research interests are in computer vision and deep learning. He is the recipient of several awards, including a CVPR best student paper prize, a National Science Foundation CAREER Award, a Google Faculty Research Award, three Facebook Faculty Awards, and a Fulbright U.S. Scholar Award.
More from the Same Authors
- 2020 Poster: Self-Supervised Learning by Cross-Modal Audio-Video Clustering
  Humam Alwassel · Dhruv Mahajan · Bruno Korbar · Lorenzo Torresani · Bernard Ghanem · Du Tran
- 2020 Poster: COBE: Contextualized Object Embeddings from Narrated Instructional Video
  Gedas Bertasius · Lorenzo Torresani
- 2020 Spotlight: Self-Supervised Learning by Cross-Modal Audio-Video Clustering
  Humam Alwassel · Dhruv Mahajan · Bruno Korbar · Lorenzo Torresani · Bernard Ghanem · Du Tran
- 2019 Poster: Learning Temporal Pose Estimation from Sparsely-Labeled Videos
  Gedas Bertasius · Christoph Feichtenhofer · Du Tran · Jianbo Shi · Lorenzo Torresani
- 2018 Poster: Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
  Bruno Korbar · Du Tran · Lorenzo Torresani
- 2017: Poster session (and Coffee Break)
  Jacob Andreas · Kun Li · Conner Vercellino · Thomas Miconi · Wenpeng Zhang · Luca Franceschi · Zheng Xiong · Karim Ahmed · Laurent Itti · Tim Klinger · Mostafa Rohaninejad
- 2017 Poster: Learning to Inpaint for Image Compression
  Mohammad Haris Baig · Vladlen Koltun · Lorenzo Torresani
- 2016: ViCom: Benchmark and Methods for Video Comprehension
  Du Tran · Maksim Bolonkin · Manohar Paluri · Lorenzo Torresani
- 2016: Introduction
  Lorenzo Torresani
- 2016 Workshop: Large Scale Computer Vision Systems
  Manohar Paluri · Lorenzo Torresani · Gal Chechik · Dario Garcia · Du Tran