CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Recent vision-language models have achieved tremendous progress, far beyond what was expected. However, their computational costs are growing just as dramatically, especially for large models, which makes model acceleration critical when resources are limited. Although acceleration has been extensively studied for unimodal models, it remains relatively under-explored for multimodal models, especially vision-language Transformers. To pursue more efficient and accessible vision-language Transformers, this paper introduces Cross-Guided Ensemble of Tokens (CrossGET), a universal acceleration framework for vision-language Transformers. The framework adaptively combines tokens through real-time, cross-modal guidance, thereby achieving substantial acceleration while maintaining high performance. CrossGET has two key innovations: 1) Cross-Guided Matching and Ensemble. CrossGET incorporates cross-modal guided token matching and ensemble to exploit cross-modal information effectively, introducing only cross-modal tokens with negligible extra parameters. 2) Complete-Graph Soft Matching. In contrast to the existing bipartite soft matching approach, CrossGET introduces a complete-graph soft matching policy to achieve more reliable token-matching results while maintaining parallelizability and high efficiency. Extensive experiments are conducted on various vision-language tasks, including image-text retrieval, visual reasoning, image captioning, and visual question answering. Performance on both classic multimodal architectures and emerging multimodal LLMs demonstrates the effectiveness and versatility of the proposed CrossGET framework. The code will be available at https://github.com/sdc17/CrossGET.
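To make the token-matching idea concrete, here is a minimal sketch of the bipartite soft matching baseline that the abstract contrasts against. It is not CrossGET's complete-graph policy and not the authors' code; the abstract does not give those details. The function names (`cosine`, `bipartite_soft_merge`), the alternating split, the cosine-similarity score, and the unweighted averaging are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bipartite_soft_merge(tokens, r):
    """Remove up to r tokens by bipartite soft matching (illustrative only).

    tokens: list of feature vectors; r: number of tokens to merge away.
    Returns the reduced token list, keeping the original order.
    """
    if len(tokens) < 2 or r <= 0:
        return [list(t) for t in tokens]

    # Assumption: split tokens alternately into set A (even) and set B (odd).
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    r = min(r, len(a_idx))

    # Each token in A proposes one edge to its most similar token in B.
    edges = []
    for i in a_idx:
        score, j = max((cosine(tokens[i], tokens[j]), j) for j in b_idx)
        edges.append((score, i, j))

    # Keep only the r strongest edges; those A tokens are merged away.
    edges.sort(key=lambda e: e[0], reverse=True)
    merged_from = {i: j for _, i, j in edges[:r]}

    # Group each B token with the A tokens assigned to it.
    groups = {j: [tokens[j]] for j in b_idx}
    for i, j in merged_from.items():
        groups[j].append(tokens[i])

    out = []
    for idx, tok in enumerate(tokens):
        if idx in merged_from:
            continue  # absorbed into its partner in B
        if idx in groups:
            members = groups[idx]
            # Simplification: plain average rather than a size-weighted one.
            tok = [sum(col) / len(members) for col in zip(*members)]
        out.append(list(tok))
    return out


if __name__ == "__main__":
    toy = [[1, 0], [1, 0], [0, 1], [1, 1]]
    print(bipartite_soft_merge(toy, r=1))
```

Because tokens in set A can only be matched to tokens in set B, two very similar tokens that land in the same set can never be merged. That restriction is the kind of limitation a complete-graph matching policy, as named in the abstract, would be expected to relax.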
updated: Wed Oct 04 2023 22:11:50 GMT+0000 (UTC)
published: Sat May 27 2023 12:07:21 GMT+0000 (UTC)