Digging Into Self-Supervised Learning of Feature Descriptors

Iaroslav Melekhov; Zakaria Laskar; Xiaotian Li; Shuzhe Wang; Juho Kannala

Fully-supervised CNN-based approaches for learning local image descriptors have shown remarkable results in a wide range of geometric tasks. However, most of them require per-pixel ground-truth keypoint correspondence data, which is difficult to acquire at scale. To address this challenge, recent weakly- and self-supervised methods can learn feature descriptors from relative camera poses or using only synthetic rigid transformations such as homographies. In this work, we focus on understanding the limitations of existing self-supervised approaches and propose a set of improvements that, combined, lead to powerful feature descriptors. We show that increasing the search space from in-pair to in-batch for hard negative mining brings consistent improvement. To enhance the discriminativeness of feature descriptors, we propose a coarse-to-fine method for mining local hard negatives from a wider search space by using global visual image descriptors. We demonstrate that a combination of synthetic homography transformation, color augmentation, and photorealistic image stylization produces useful representations that are viewpoint and illumination invariant. The feature descriptors learned by the proposed approach perform competitively and surpass their fully- and weakly-supervised counterparts on various geometric benchmarks such as image-based localization, sparse feature matching, and image retrieval.
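
To make the in-pair versus in-batch distinction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes L2-normalised descriptors, Euclidean distance, and a hypothetical helper `hardest_negative_distances`; the abstract does not specify the loss or the mining details, so those choices are illustrative only.

```python
import numpy as np


def hardest_negative_distances(anchors, positives, pair_ids, in_batch=True):
    """Distance from each anchor to its hardest (closest) negative.

    anchors, positives: (N, D) arrays; row i of each forms a matching pair.
    pair_ids: (N,) array giving the image pair each correspondence came from.
    in_batch: if False, negatives are restricted to the anchor's own image pair.
    (Hypothetical helper written for illustration only.)
    """
    # Pairwise Euclidean distances between every anchor and every positive.
    dists = np.linalg.norm(anchors[:, None, :] - positives[None, :, :], axis=-1)
    # The true match is never a valid negative.
    invalid = np.eye(len(anchors), dtype=bool)
    if not in_batch:
        # In-pair mining: also exclude candidates from other image pairs.
        invalid |= pair_ids[:, None] != pair_ids[None, :]
    masked = np.where(invalid, np.inf, dists)
    return masked.min(axis=1)


# Toy batch: 4 image pairs with 2 correspondences each (random data).
rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.05 * rng.normal(size=a.shape)
p /= np.linalg.norm(p, axis=1, keepdims=True)
pair_ids = np.repeat(np.arange(4), 2)

d_pair = hardest_negative_distances(a, p, pair_ids, in_batch=False)
d_batch = hardest_negative_distances(a, p, pair_ids, in_batch=True)
```

Because the in-batch candidate set contains the in-pair one, the hardest in-batch negative is never farther from the anchor than the hardest in-pair negative. This is one plausible reading of why a wider search space yields harder negatives.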

updated: Sun Oct 10 2021 12:22:44 GMT+0000 (UTC)

published: Sun Oct 10 2021 12:22:44 GMT+0000 (UTC)

arXiv
