DILEMMA: Self-Supervised Shape and Texture Learning with Transformers

Sepehr Sameni; Simon Jenni; Paolo Favaro

There is a growing belief that deep neural networks with a shape bias may exhibit better generalization capabilities than models with a texture bias, because shape is a more reliable indicator of the object category. However, we show experimentally that existing measures of shape bias are not stable predictors of generalization and argue that shape discrimination should not come at the expense of texture discrimination. Thus, we propose a pseudo-task to explicitly boost both shape and texture discriminability in models trained via self-supervised learning. For this purpose, we train a ViT to detect which input token has been combined with an incorrect positional embedding. To retain texture discrimination, the ViT is also trained as in MoCo with a student-teacher architecture and a contrastive loss over an extra learnable class token. We call our method DILEMMA, which stands for Detection of Incorrect Location EMbeddings with MAsked inputs. We evaluate our method through fine-tuning on several datasets and show that it outperforms MoCoV3 and DINO. Moreover, we show that when downstream tasks are strongly reliant on shape (such as in the YOGA-82 pose dataset), our pre-trained features yield a significant gain over prior work. Code will be released upon publication.
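
The position-detection pseudo-task described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `corrupt_positions`, the parameters `corrupt_frac` and `keep_frac`, and their default values are all hypothetical, chosen only to show how masked inputs and per-token binary targets might be built. The sketch keeps a random subset of patch tokens, assigns wrong position indices to some of them, and returns labels marking which ones were altered.

```python
import random


def corrupt_positions(num_tokens, corrupt_frac=0.2, keep_frac=0.5, seed=None):
    """Build inputs for a 'detect the wrong positional embedding' task.

    Hypothetical helper: the fractions and defaults are assumptions,
    not values taken from the paper. Requires num_tokens >= 2.
    """
    rng = random.Random(seed)

    # Masked input: keep only a random subset of the patch tokens.
    num_keep = max(1, round(keep_frac * num_tokens))
    kept = sorted(rng.sample(range(num_tokens), num_keep))

    # Choose which of the kept tokens receive an incorrect position.
    num_corrupt = round(corrupt_frac * num_keep)
    corrupt_slots = set(rng.sample(range(num_keep), num_corrupt))

    positions, labels = [], []
    for slot, true_pos in enumerate(kept):
        if slot in corrupt_slots:
            # Draw uniformly from all positions except the true one.
            wrong = rng.randrange(num_tokens - 1)
            if wrong >= true_pos:
                wrong += 1
            positions.append(wrong)
            labels.append(1)  # 1 = position embedding is incorrect
        else:
            positions.append(true_pos)
            labels.append(0)  # 0 = position embedding is correct
    return kept, positions, labels


# Example: a 14 x 14 grid of patches (196 tokens).
kept, positions, labels = corrupt_positions(196, seed=0)
```

In such a setup, `positions` would index the positional-embedding table for the tokens listed in `kept`, and a per-token binary classification loss on `labels` would serve as the detection objective. The contrastive loss on the class token mentioned in the abstract is a separate term and is not shown here.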

updated: Sun Apr 10 2022 22:58:02 GMT+0000 (UTC)

published: Sun Apr 10 2022 22:58:02 GMT+0000 (UTC)

arXiv
