Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. Thus, we further design two naive solutions, i.e. query-gallery concatenation in ViT, and query-gallery cross-attention in the vanilla Transformer. The latter improves the performance, but it is still limited. This implies that the attention mechanism in Transformers is primarily designed for global feature aggregation, which is not naturally suitable for image matching. Accordingly, we propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity computation. Additionally, global max pooling and a multilayer perceptron (MLP) head are applied to decode the matching result. This way, the simplified decoder is computationally more efficient, while at the same time more effective for image matching. The proposed method, called TransMatcher, achieves state-of-the-art performance in generalizable person re-identification, with up to 6.1% and 5.7% performance gains in Rank-1 and mAP, respectively, on several popular datasets. Code is available at https://github.com/ShengcaiLiao/QAConv.
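The simplified decoder described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation (see the linked repository for that): the feature-map shapes, the ReLU hidden layer, and the weight names `w1`, `b1`, `w2`, `b2` are assumptions for illustration. The key idea is that full attention (softmax-weighted aggregation of values) is dropped, and only the query-key similarity map is kept, followed by global max pooling over positions and an MLP head that decodes the matching score.

```python
import numpy as np

def simplified_decoder_score(q_feats, g_feats, w1, b1, w2, b2):
    """Sketch of a simplified matching decoder (illustrative, not the official code).

    q_feats: (hw, d) query image feature map, flattened over spatial positions
    g_feats: (hw, d) gallery image feature map
    w1, b1, w2, b2: assumed MLP-head parameters
    """
    # Query-key similarity only: no softmax weighting, no value aggregation
    sim = q_feats @ g_feats.T                 # (hw, hw) pairwise similarities
    # Global max pooling: best-matching gallery position per query position
    pooled = sim.max(axis=1)                  # (hw,)
    # MLP head decodes the pooled similarities into a scalar matching score
    hidden = np.maximum(pooled @ w1 + b1, 0)  # ReLU hidden layer (assumed)
    return float(hidden @ w2 + b2)
```

Because the softmax normalization and the value projection are skipped, each pair comparison reduces to one matrix product plus a small MLP, which is why the abstract describes the simplified decoder as both more efficient and more effective for matching.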
Author Information
Shengcai Liao (Inception Institute of Artificial Intelligence (IIAI))
Shengcai Liao is a Lead Scientist at the Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE, and a Senior Member of the IEEE. Previously, he was an Associate Professor at the Institute of Automation, Chinese Academy of Sciences (CASIA). He received the B.S. degree in mathematics from Sun Yat-sen University in 2005 and the Ph.D. degree from CASIA in 2010, and was a Postdoc at Michigan State University during 2010-2012. His research interests include object detection, recognition, and tracking, especially face- and person-related tasks. He has published over 100 papers, with over 14,900 citations and an h-index of 43 according to Google Scholar. He ranks 905 among 215,114 scientists (top 0.42%) for the 2019 single year in the field of AI, according to Stanford University's study of the world's top 2% of scientists. His representative work, LOMO+XQDA, known for effective feature design and metric learning for person re-identification, has been cited over 1,900 times and ranks in the top 10 among the 602 papers of CVPR 2015. He was awarded the Best Student Paper at ICB 2006, ICB 2015, and CCBR 2016, and the Best Paper at ICB 2007, as well as the IJCB 2014 Best Reviewer and CVPR 2019/2021 Outstanding Reviewer awards. He was an Assistant Editor for the book "Encyclopedia of Biometrics (2nd Ed.)". He will serve as Program Chair for IJCB 2022 and as Area Chair for CVPR 2022 and ECCV 2022; he has served as Area Chair for ICPR 2016, ICB 2016 and 2018, as an SPC member for IJCAI 2021, and as a reviewer for ICCV, CVPR, ECCV, NeurIPS, ICLR, AAAI, TPAMI, IJCV, TNNLS, etc. He was the winner of the CVPR 2017 Detection in Crowded Scenes Challenge and the ICCV 2019 NightOwls Pedestrian Detection Challenge.
Ling Shao (Inception Institute of Artificial Intelligence)
More from the Same Authors
- 2023 Poster: ProtoDiff: Learning to Learn Prototypical Networks by Task-Guided Diffusion »
  Yingjun Du · Zehao Xiao · Shengcai Liao · Cees Snoek
- 2023 Poster: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation »
  Yun Xing · Jian Kang · Aoran Xiao · Jiahao Nie · Ling Shao · Shijian Lu
- 2022 Spotlight: Masked Generative Adversarial Networks are Data-Efficient Generation Learners »
  Jiaxing Huang · Kaiwen Cui · Dayan Guan · Aoran Xiao · Fangneng Zhan · Shijian Lu · Shengcai Liao · Eric Xing
- 2022 Poster: PolarMix: A General Data Augmentation Technique for LiDAR Point Clouds »
  Aoran Xiao · Jiaxing Huang · Dayan Guan · Kaiwen Cui · Shijian Lu · Ling Shao
- 2022 Poster: Masked Generative Adversarial Networks are Data-Efficient Generation Learners »
  Jiaxing Huang · Kaiwen Cui · Dayan Guan · Aoran Xiao · Fangneng Zhan · Shijian Lu · Shengcai Liao · Eric Xing
- 2021 Poster: You Never Cluster Alone »
  Yuming Shen · Ziyi Shen · Menghan Wang · Jie Qin · Philip Torr · Ling Shao
- 2021 Poster: Variational Multi-Task Learning with Gumbel-Softmax Priors »
  Jiayi Shen · Xiantong Zhen · Marcel Worring · Ling Shao
- 2021 Poster: HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning »
  Shiming Chen · Guosen Xie · Yang Liu · Qinmu Peng · Baigui Sun · Hao Li · Xinge You · Ling Shao
- 2020 Poster: Learning to Learn Variational Semantic Memory »
  Xiantong Zhen · Yingjun Du · Huan Xiong · Qiang Qiu · Cees Snoek · Ling Shao
- 2020 Poster: Human Parsing Based Texture Transfer from Single Image to 3D Human via Cross-View Consistency »
  Fang Zhao · Shengcai Liao · Kaihao Zhang · Ling Shao
- 2019 Poster: Two Generator Game: Learning to Sample via Linear Goodness-of-Fit Test »
  Lizhong Ding · Mengyang Yu · Li Liu · Fan Zhu · Yong Liu · Yu Li · Ling Shao