47 Results
| Type | Time | Title | Authors |
|---|---|---|---|
| Poster | | PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining | Yuting Gao · Jinfeng Liu · Zihan Xu · Jun Zhang · Ke Li · Rongrong Ji · Chunhua Shen |
| Poster | Tue 14:00 | GLIPv2: Unifying Localization and Vision-Language Understanding | Haotian Zhang · Pengchuan Zhang · Xiaowei Hu · Yen-Chun Chen · Liunian Li · Xiyang Dai · Lijuan Wang · Lu Yuan · Jenq-Neng Hwang · Jianfeng Gao |
| Poster | | TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training | Yulong Liu · Guibo Zhu · Bin Zhu · Qi Song · Guojing Ge · Haoran Chen · GuanHui Qiao · Ru Peng · Lingxiang Wu · Jinqiao Wang |
| Poster | Thu 14:00 | Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Zi-Yi Dou · Aishwarya Kamath · Zhe Gan · Pengchuan Zhang · Jianfeng Wang · Linjie Li · Zicheng Liu · Ce Liu · Yann LeCun · Nanyun Peng · Jianfeng Gao · Lijuan Wang |
| Poster | Tue 14:00 | Fine-Grained Semantically Aligned Vision-Language Pre-Training | Juncheng Li · XIN HE · Longhui Wei · Long Qian · Linchao Zhu · Lingxi Xie · Yueting Zhuang · Qi Tian · Siliang Tang |
| Poster | Tue 14:00 | AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments | Sudipta Paul · Amit Roy-Chowdhury · Anoop Cherian |
| Workshop | | Towards Disentangling the Roles of Vision & Language in Aesthetic Experience with Multimodal DNNs | Colin Conwell · Christopher Hamblin |
| Workshop | Sat 7:45 | Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action | |
| Workshop | | Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action | Dhruv Shah |
| Workshop | | Probing Representations of Numbers in Vision and Language Models | Ivana Kajic · Aida Nematzadeh |
| Poster | Tue 9:00 | Towards Versatile Embodied Navigation | Hanqing Wang · Wei Liang · Luc V Gool · Wenguan Wang |
| Poster | Tue 14:00 | Revisiting Neural Scaling Laws in Language and Vision | Ibrahim Alabdulmohsin · Behnam Neyshabur · Xiaohua Zhai |