Efficient large-scale image retrieval with deep feature orthogonality and Hybrid-Swin-Transformers

Christof Henkel

We present an efficient end-to-end pipeline for large-scale landmark recognition and retrieval. We show how to combine and enhance concepts from recent research in image retrieval and introduce two architectures especially suited for large-scale landmark identification. We discuss a model with deep orthogonal fusion of local and global features (DOLG) using an EfficientNet backbone, as well as a novel Hybrid-Swin-Transformer, and detail how to train both architectures efficiently using a step-wise approach and a sub-center ArcFace loss with dynamic margins. Furthermore, we elaborate on a novel discriminative re-ranking methodology for image retrieval. The superiority of our approach was demonstrated by winning the recognition and retrieval tracks of the Google Landmark Competition 2021.
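The two core ideas named in the abstract, orthogonal fusion and a sub-center ArcFace loss with dynamic margins, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the margin schedule `a * n**(-lam) + b` and its default constants, and the scale of 30 are all assumptions chosen for illustration; the abstract only states that margins are dynamic, not how they are computed.

```python
import math


def orthogonal_component(local_feat, global_feat):
    """Return the part of local_feat that is orthogonal to global_feat.

    This subtracts the projection of the local feature onto the global
    feature, which is the basic operation behind orthogonal fusion.
    """
    dot = sum(l * g for l, g in zip(local_feat, global_feat))
    norm_sq = sum(g * g for g in global_feat)
    if norm_sq == 0.0:
        # Nothing to project onto; the local feature is returned unchanged.
        return list(local_feat)
    scale = dot / norm_sq
    return [l - scale * g for l, g in zip(local_feat, global_feat)]


def dynamic_margin(class_count, a=0.5, b=0.05, lam=0.25):
    """Hypothetical margin schedule: rarer classes get a larger margin.

    The functional form and the constants are assumptions for this sketch.
    """
    return a * class_count ** (-lam) + b


def subcenter_arcface_target_logit(cosines_per_subcenter, margin, scale=30.0):
    """Target-class logit for sub-center ArcFace.

    cosines_per_subcenter holds the cosine similarity between the embedding
    and each sub-center of the target class; the closest sub-center wins.
    The additive angular margin is then applied to that angle.
    """
    cos_theta = max(cosines_per_subcenter)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
    theta = math.acos(cos_theta)
    return scale * math.cos(theta + margin)


if __name__ == "__main__":
    fused = orthogonal_component([1.0, 1.0], [1.0, 0.0])
    print(fused)
    print(dynamic_margin(1), dynamic_margin(10000))
    print(subcenter_arcface_target_logit([0.2, 0.5], dynamic_margin(100)))
```

In this sketch a class seen once receives a larger margin than a class seen ten thousand times, which makes the loss stricter for rare landmarks; a real training setup would compute these quantities on batched tensors in a deep learning framework.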

updated: Thu Oct 07 2021 20:41:13 GMT+0000 (UTC)

published: Thu Oct 07 2021 20:41:13 GMT+0000 (UTC)

arXiv
