How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Zhe Chen; Weiyun Wang; Hao Tian; Shenglong Ye; Zhangwei Gao; Erfei Cui; Wenwen Tong; Kongzhi Hu; Jiapeng Luo; Zheng Ma; Ji Ma; Jiaqi Wang; Xiaoyi Dong; Hang Yan; Hewei Guo; Conghui He; Botian Shi; Zhenjiang Jin; Chao Xu; Bin Wang; Xingjian Wei; Wei Li; Wenjian Zhang; Bo Zhang; Pinlong Cai; Licheng Wen; Xiangchao Yan; Min Dou; Lewei Lu; Xizhou Zhu; Tong Lu; Dahua Lin; Yu Qiao; Jifeng Dai; Wenhai Wang

In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) that aims to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model InternViT-6B, boosting its visual understanding capabilities and allowing it to be transferred and reused across different LLMs. (2) Dynamic High-Resolution: we divide each image into 1 to 40 tiles of 448×448 pixels, according to the aspect ratio and resolution of the input image, which supports inputs up to 4K resolution. (3) High-Quality Bilingual Dataset: we carefully collected a high-quality bilingual dataset that covers common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance on OCR- and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code has been released at https://github.com/OpenGVLab/InternVL.
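The dynamic high-resolution idea in point (2) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `choose_grid` and `tile_boxes`, the exhaustive search over grids, and the tie-breaking rule (prefer the grid whose pixel area is closest to the input's) are all assumptions made for illustration. Only the 448×448 tile size and the 1 to 40 tile range come from the abstract.

```python
TILE = 448       # tile side length in pixels (from the abstract)
MAX_TILES = 40   # upper bound on the number of tiles (from the abstract)


def choose_grid(width, height, max_tiles=MAX_TILES, tile=TILE):
    """Pick a (cols, rows) tile grid for an image of the given size.

    Hypothetical selection rule: among all grids with cols * rows <= max_tiles,
    take the one whose aspect ratio is closest to the image's; break ties by
    preferring the grid whose total pixel area is closest to the image's area.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    target_ratio = width / height
    best_grid, best_key = None, None
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            ratio_error = abs(cols / rows - target_ratio)
            area_error = abs(cols * rows * tile * tile - width * height)
            key = (ratio_error, area_error)
            if best_key is None or key < best_key:
                best_grid, best_key = (cols, rows), key
    return best_grid


def tile_boxes(cols, rows, tile=TILE):
    """Return (left, top, right, bottom) crop boxes in row-major order.

    The boxes assume the image has already been resized to
    (cols * tile, rows * tile) pixels.
    """
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    cols, rows = choose_grid(3840, 2160)  # a 4K-sized input
    print(cols, rows, len(tile_boxes(cols, rows)))
```

Under these assumptions, a small square image maps to a single tile, while a wide high-resolution image maps to a larger grid that stays within the 40-tile budget. The resize step itself is left out so the sketch needs no image library.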

updated: Mon Apr 29 2024 20:24:30 GMT+0000 (UTC)

published: Thu Apr 25 2024 17:59:19 GMT+0000 (UTC)

arXiv
