Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN

Siyuan Li; Di Wu; Fang Wu; Zelin Zang; Kai Wang; Lei Shang; Baigui Sun; Hao Li; Stan Z. Li

Masked image modeling (MIM), an emerging self-supervised pre-training method, has shown impressive success across numerous downstream vision tasks with Vision Transformers (ViTs). Its underlying idea is simple: a portion of the input image is randomly masked out and then reconstructed via the pretext task. However, why MIM works well has not been well explained, and previous studies claim that MIM primarily works for the Transformer family but is incompatible with CNNs. In this paper, we first study interactions among patches to understand what knowledge is learned and how it is acquired via the MIM task. We observe that MIM essentially teaches the model to learn better middle-level interactions among patches and to extract more generalized features. Based on this observation, we propose an Architecture-Agnostic Masked Image Modeling framework (A^2MIM), which is compatible with both Transformers and CNNs in a unified way. Extensive experiments on popular benchmarks show that A^2MIM learns better representations and endows the backbone model with a stronger capability to transfer to various downstream tasks, for both Transformers and CNNs.
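The generic MIM idea stated in the abstract (randomly mask part of the image, then reconstruct it) can be illustrated with a minimal NumPy sketch. This is not the A^2MIM method itself: the abstract does not specify the masking or loss design, so the function names `random_patch_mask` and `masked_l1_loss`, the 16-pixel patch size, the 0.6 mask ratio, the zero fill value, and the L1 reconstruction loss are all assumptions chosen for illustration.

```python
import numpy as np


def random_patch_mask(image, patch_size=16, mask_ratio=0.6, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical helper for illustration only; patch_size, mask_ratio and
    the zero fill value are assumptions, not values taken from the paper.
    Returns the masked image and a boolean patch grid (True = masked).
    """
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    num_patches = gh * gw
    num_masked = int(round(mask_ratio * num_patches))

    # Pick which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    grid = np.zeros(num_patches, dtype=bool)
    grid[masked_idx] = True
    grid = grid.reshape(gh, gw)

    # Expand the patch-level grid to a pixel-level mask.
    pixel_mask = np.repeat(np.repeat(grid, patch_size, axis=0), patch_size, axis=1)
    masked = image.copy()
    masked[pixel_mask] = 0  # applies to every channel of the masked pixels
    return masked, grid


def masked_l1_loss(prediction, target, grid, patch_size=16):
    """Mean absolute error computed over masked pixels only (assumed loss)."""
    pixel_mask = np.repeat(np.repeat(grid, patch_size, axis=0), patch_size, axis=1)
    return float(np.abs(prediction - target)[pixel_mask].mean())


if __name__ == "__main__":
    image = np.random.default_rng(1).random((224, 224, 3))
    masked, grid = random_patch_mask(image)
    # A real model would predict the hidden content; here the masked input
    # itself stands in as a trivial "prediction".
    print(grid.shape, int(grid.sum()), masked_l1_loss(masked, image, grid))
```

In an actual pre-training loop, the masked image would be fed to the backbone (a ViT or a CNN) and the loss would compare the network output with the original pixels at the masked locations.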

updated: Wed Jun 01 2022 13:19:53 GMT+0000 (UTC)

published: Fri May 27 2022 12:42:02 GMT+0000 (UTC)

arXiv

