As Large Language Models (LLMs) are increasingly deployed in real-world scenarios, the ability to understand long-context multimodal content, such as lengthy videos, extensive documents, and complex visual narratives, has become essential. MMLongBench-Doc (NeurIPS 2024 Datasets and Benchmarks Track Spotlight) is a challenging long-context, multimodal benchmark that evaluates the document understanding ability of Large Vision-Language Models (LVLMs). With documents averaging 47.5 pages and 21,214 textual tokens, MMLongBench-Doc poses a demanding test of long-context document understanding.
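If the benchmark data is distributed through the Hugging Face Hub (an assumption here, not stated above), a minimal loading sketch might look like the following; the dataset identifier and split names are illustrative and should be verified against the official MMLongBench-Doc release.

```python
from datasets import load_dataset

# Assumed hub ID for MMLongBench-Doc; verify against the official release before use.
DATASET_ID = "yubo2333/MMLongBench-Doc"

# load_dataset with only an ID returns a DatasetDict keyed by split name.
ds = load_dataset(DATASET_ID)
print(ds)  # shows the available splits and their fields

# Inspect one record from the first available split; each example pairs a
# question with the long document it refers to.
first_split = next(iter(ds.values()))
print(first_split[0])
```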