Constrained Mean Shift for Representation Learning

Ajinkya Tejankar; Soroush Abbasi Koohpayegani; Hamed Pirsiavash

We are interested in representation learning from labeled or unlabeled data. Inspired by the recent success of self-supervised learning (SSL), we develop a non-contrastive representation learning method that can exploit additional knowledge. This additional knowledge may come from annotated labels in the supervised setting or from an SSL model trained on another modality in the SSL setting. Our main idea is to generalize the mean-shift algorithm by constraining the search space of nearest neighbors, which results in semantically purer representations. Our method simply pulls the embedding of an instance closer to its nearest neighbors within a search space that is constrained using the additional knowledge. By leveraging this non-contrastive loss, we show that supervised ImageNet-1k pretraining with our method yields better transfer performance than the baselines. Further, we demonstrate that our method is relatively robust to label noise. Finally, we show that it is possible to use noisy constraints across modalities to train self-supervised video models.
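The core step described in the abstract, pulling an embedding toward its nearest neighbors inside a constrained search space, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `constrained_mean_shift_loss`, the boolean `allowed` mask, the memory `bank`, the choice of `k`, and the squared-distance loss on unit-normalized vectors are all assumptions made for illustration. The sketch also uses a single embedding both to search for neighbors and to be pulled toward them, which is a simplification.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length (eps guards against division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def constrained_mean_shift_loss(query, bank, allowed, k=2):
    """Hypothetical sketch of a constrained nearest-neighbor pull.

    query   : (d,) embedding of the current instance
    bank    : (n, d) stored embeddings of other instances (assumed memory bank)
    allowed : (n,) boolean mask; True marks bank entries inside the
              constrained search space (e.g. same label as the query)
    k       : number of nearest neighbors to pull toward (assumed value)

    Returns the mean squared distance to the selected neighbors and
    the indices of those neighbors in the bank.
    """
    q = l2_normalize(np.asarray(query, dtype=float))
    m = l2_normalize(np.asarray(bank, dtype=float))

    # Restrict the search to entries permitted by the constraint.
    candidates = np.flatnonzero(allowed)
    if candidates.size == 0:
        raise ValueError("the constraint leaves no candidate neighbors")

    # Cosine similarity between the query and each permitted entry.
    sims = m[candidates] @ q
    k = min(k, candidates.size)
    top = candidates[np.argsort(-sims)[:k]]

    # For unit vectors, squared Euclidean distance equals 2 - 2 * cosine.
    loss = float(np.mean(2.0 - 2.0 * (m[top] @ q)))
    return loss, top


# Toy usage: four bank entries with made-up labels; the query has label 0.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 1, 0, 1])
query = np.array([1.0, 0.2])

# Supervised-style constraint: only entries sharing the query's label.
loss_c, nn_c = constrained_mean_shift_loss(query, bank, labels == 0, k=2)

# Unconstrained search for comparison: every entry is allowed.
loss_u, nn_u = constrained_mean_shift_loss(
    query, bank, np.ones(len(bank), dtype=bool), k=2
)
print("constrained neighbors:", nn_c, "loss:", loss_c)
print("unconstrained neighbors:", nn_u, "loss:", loss_u)
```

In this toy setup the unconstrained search selects a close entry carrying a different label, while the constrained search only returns entries with the query's label, even when they are farther away. Only the loss is non-contrastive in the sense of the abstract: no term pushes the query away from any other entry.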

updated: Tue Oct 19 2021 23:14:23 GMT+0000 (UTC)

published: Tue Oct 19 2021 23:14:23 GMT+0000 (UTC)

arXiv
