PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling

Yuan Liu; Songyang Zhang; Jiacheng Chen; Kai Chen; Dahua Lin

Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT. However, subsequent works have complicated the framework with new auxiliary tasks or extra pre-trained models, inevitably increasing computational overhead. This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction, examining the input image patches and the reconstruction target, and highlights two critical but previously overlooked bottlenecks. Based on this analysis, we propose a remarkably simple and effective method, PixMIM, that entails two strategies: 1) filtering the high-frequency components from the reconstruction target to de-emphasize the network's focus on texture-rich details and 2) adopting a conservative data transform strategy to alleviate the problem of missing foreground in MIM training. PixMIM can be easily integrated into most existing pixel-based MIM approaches (i.e., those using raw images as the reconstruction target) with negligible additional computation. Without bells and whistles, our method consistently improves three MIM approaches, MAE, ConvMAE, and LSMAE, across various downstream tasks. We believe this effective plug-and-play method will serve as a strong baseline for self-supervised learning and provide insights for future improvements of the MIM framework. Code will be available at https://github.com/open-mmlab/mmselfsup.
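To make the first strategy concrete, the following is a minimal sketch of removing high-frequency components from a reconstruction target. It is not taken from the PixMIM code. It assumes an ideal (hard-cutoff) low-pass filter applied in the Fourier domain with NumPy; the function name `low_pass_target` and the `cutoff` parameter are hypothetical, and the abstract does not specify the filter type or cutoff the authors use.

```python
import numpy as np


def low_pass_target(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Return `image` with spatial frequencies beyond `cutoff` removed.

    Illustrative assumption: an ideal low-pass filter in the Fourier domain.
    image:  float array of shape (H, W) or (H, W, C).
    cutoff: radius in frequency bins (hypothetical parameter), measured
            from the zero-frequency component.
    """
    height, width = image.shape[:2]
    # Integer frequency index of each FFT bin: 0, 1, ..., -2, -1.
    fy = np.fft.fftfreq(height) * height
    fx = np.fft.fftfreq(width) * width
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    mask = radius <= cutoff  # keep only the low-frequency bins
    if image.ndim == 3:
        mask = mask[:, :, None]  # broadcast the mask over channels
    spectrum = np.fft.fft2(image, axes=(0, 1))
    filtered = np.fft.ifft2(spectrum * mask, axes=(0, 1))
    return filtered.real  # imaginary part is numerical noise for real input


# Usage: a checkerboard is pure high-frequency texture around a mean of 0.5,
# so the filtered target collapses to its mean value.
board = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
target = low_pass_target(board, cutoff=2.0)
```

In a pixel-based MIM setup, a filtered image of this kind would replace the raw image as the regression target for the masked patches, while the encoder input stays unchanged.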

updated: Sat Mar 04 2023 13:38:51 GMT+0000 (UTC)

published: Sat Mar 04 2023 13:38:51 GMT+0000 (UTC)

arXiv
