VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen
ACM Multimedia (ACM MM), 2024 (Open Source Software Competition)
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang*, Xiaoyi Dong*, Yuhang Zang*, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
arXiv, 2024
WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation
Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
arXiv, 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun
Neural Information Processing Systems (NeurIPS), 2024 (Datasets and Benchmarks Track)
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024 (Datasets and Benchmarks Track)
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Pengyang Ling*, Jiazi Bu*, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin
arXiv, 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024 (Datasets and Benchmarks Track)
Bootstrap3D: Improving 3D Content Creation with Synthetic Data
Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
arXiv, 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Shuangrui Ding, Dahua Lin, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024
Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
European Conference on Computer Vision (ECCV), 2024
Unified Scene Representation and Reconstruction for 3D Large Language Models
Tao Chu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Qiong Liu, Jiaqi Wang
arXiv, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiaoyi Dong*, Pan Zhang*, Yuhang Zang*, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
Neural Information Processing Systems (NeurIPS), 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
Neural Information Processing Systems (NeurIPS), 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Jiaqi Wang
European Conference on Computer Vision (ECCV), 2024
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
Ziyu Liu*, Zeyi Sun*, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
arXiv, 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiaoyi Dong*, Pan Zhang*, Yuhang Zang*, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
arXiv, 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang, Hanlin Goh, Josh Susskind, Chen Huang
International Conference on Learning Representations (ICLR), 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang, Wei Li, Jun Han, Kaiyang Zhou, Chen Change Loy
International Journal of Computer Vision (IJCV), 2024
Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu
arXiv, 2022
Open-Vocabulary DETR with Conditional Matching
Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy
European Conference on Computer Vision (ECCV), 2022 (Oral)
2021
FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
Yuhang Zang, Chen Huang, Chen Change Loy
IEEE International Conference on Computer Vision (ICCV), 2021
Seesaw Loss for Long-Tailed Instance Segmentation
Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
2020
1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation
Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
arXiv, 2020
KPNet: Towards Minimal Face Detector
Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
AAAI Conference on Artificial Intelligence (AAAI), 2020
2019
Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network
Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
IEEE International Conference on Computer Vision (ICCV), 2019
Scene Text Detection with Supervised Pyramid Context Network
Enze Xie*, Yuhang Zang*, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
AAAI Conference on Artificial Intelligence (AAAI), 2019