Publications

2024

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen
ACM Multimedia (ACM MM), 2024 (Open Source Software Competition)
citations Star
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Pan Zhang*, Xiaoyi Dong*, Yuhang Zang*, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
arXiv, 2024
citations Star
WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

Zihao Huang, ShouKang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
arXiv, 2024
citations Star
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun
Neural Information Processing Systems (NeurIPS), 2024 (Datasets and Benchmarks Track)
citations
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024 (Datasets and Benchmarks Track)
citations Star
MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Pengyang Ling*, Jiazi Bu*, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin
arXiv, 2024
citations Star
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024 (Datasets and Benchmarks Track)
citations Star
Bootstrap3D: Improving 3D Content Creation with Synthetic Data

Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
arXiv, 2024
citations Star
Streaming Long Video Understanding with Large Language Models

Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Shuangrui Ding, Dahua Lin, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024
citations
Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
European Conference on Computer Vision (ECCV), 2024
citations
Unified Scene Representation and Reconstruction for 3D Large Language Models

Tao Chu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Qiong Liu, Jiaqi Wang
arXiv, 2024
citations
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Xiaoyi Dong*, Pan Zhang*, Yuhang Zang*, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024
citations Star
Are We on the Right Way for Evaluating Large Vision-Language Models?

Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
Neural Information Processing Systems (NeurIPS), 2024
citations Star
InternLM2 Technical Report

Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, Fukai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, Jia Yu, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin
arXiv, 2024
citations Star
Long-CLIP: Unlocking the Long-Text Capability of CLIP

Beichen Zhang, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Jiaqi Wang
European Conference on Computer Vision (ECCV), 2024
citations Star
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

Ziyu Liu*, Zeyi Sun*, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
arXiv, 2024
citations Star
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Xiaoyi Dong*, Pan Zhang*, Yuhang Zang*, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
arXiv, 2024
citations Star
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization

Yuhang Zang, Hanlin Goh, Josh Susskind, Chen Huang
International Conference on Learning Representations (ICLR), 2024
citations
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
citations Star
Contextual Object Detection with Multimodal Large Language Models

Yuhang Zang, Wei Li, Jun Han, Kaiyang Zhou, Chen Change Loy
International Journal of Computer Vision (IJCV), 2024
citations Star

2023

Real-World Object Detection

Yuhang Zang
PhD Thesis, Nanyang Technological University, 2023
citations
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch

Yuhang Zang, Kaiyang Zhou, Chen Huang, Chen Change Loy
International Journal of Computer Vision (IJCV), 2023
citations

2022

Unified Vision and Language Prompt Learning

Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
arXiv 2022
citations Star
On-Device Domain Generalization

Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu
arXiv 2022
citations Star
Open-Vocabulary DETR with Conditional Matching

Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
European Conference on Computer Vision (ECCV), 2022 (Oral)
citations Star

2021

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
Yuhang Zang, Chen Huang, Chen Change Loy
IEEE International Conference on Computer Vision (ICCV), 2021
citations Star
Seesaw Loss for Long-Tailed Instance Segmentation
Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
citations Star

2020

1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation
Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
arXiv, 2020
citations Star
KPNet: Towards Minimal Face Detector
Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
AAAI Conference on Artificial Intelligence (AAAI), 2020
citations

2019

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
IEEE International Conference on Computer Vision (ICCV), 2019
citations Star
Scene Text Detection with Supervised Pyramid Context Network
Enze Xie*, Yuhang Zang*, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI Conference on Artificial Intelligence (AAAI), 2019
citations