Spatial Entropy Regularization for Vision Transformers

Elia Peruzzo; Enver Sangineto; Yahui Liu; Marco De Nadai; Wei Bi; Bruno Lepri; Nicu Sebe

Recent work has shown that the attention maps of Vision Transformers (VTs), when trained with self-supervision, can contain a semantic segmentation structure that does not spontaneously emerge when training is supervised. In this paper, we explicitly encourage the emergence of this spatial clustering as a form of training regularization, thereby adding a self-supervised pretext task to standard supervised learning. In more detail, we propose a VT regularization method based on a spatial formulation of the information entropy. By minimizing the proposed spatial entropy, we explicitly ask the VT to produce spatially ordered attention maps, which introduces an object-based prior during training. Using extensive experiments, we show that the proposed regularization approach is beneficial across different training scenarios, datasets, downstream tasks and VT architectures. The code will be available upon acceptance.
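
The abstract does not spell out how the spatial entropy is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that an attention map is binarized at its mean value, that the surviving cells are grouped into 8-connected components, and that the entropy is taken over the share of attention mass held by each component. The function name `spatial_entropy` and the `threshold` parameter are hypothetical, and this version is not differentiable, so it could serve as a diagnostic but not directly as a training loss.

```python
import math
from collections import deque


def spatial_entropy(attn, threshold=None):
    """Entropy over connected components of a thresholded 2D attention map.

    Assumptions (not taken from the abstract): cells strictly above the
    threshold (default: the map's mean) are kept, 8-connectivity defines
    a component, and each component's probability is its share of the
    total kept attention mass.
    """
    rows, cols = len(attn), len(attn[0])
    if threshold is None:
        threshold = sum(map(sum, attn)) / (rows * cols)

    seen = [[False] * cols for _ in range(rows)]
    masses = []  # summed attention of each connected component
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or attn[r][c] <= threshold:
                continue
            # Breadth-first flood fill over the 8-neighbourhood.
            seen[r][c] = True
            queue = deque([(r, c)])
            mass = 0.0
            while queue:
                y, x = queue.popleft()
                mass += attn[y][x]
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and attn[ny][nx] > threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            masses.append(mass)

    total = sum(masses)
    if total == 0:
        # No cell exceeds the threshold (e.g. a perfectly uniform map).
        return 0.0
    return sum(-(m / total) * math.log(m / total) for m in masses)


# One compact blob: a single component.
compact = [[0, 0, 0, 0],
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]

# Same total attention spread over four isolated corners.
scattered = [[1, 0, 0, 1],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [1, 0, 0, 1]]

print(spatial_entropy(compact), spatial_entropy(scattered))
```

Under these assumptions the compact map scores zero and the scattered map scores log 4, so minimizing the quantity favours attention concentrated in a few contiguous regions, which is the kind of spatially ordered map the abstract describes.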

updated: Thu Jun 09 2022 17:34:39 GMT+0000 (UTC)

published: Thu Jun 09 2022 17:34:39 GMT+0000 (UTC)

arXiv
