DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

Min Yang; Dongliang He; Miao Fan; Baorong Shi; Xuetong Xue; Fu Li; Errui Ding; Jizhou Huang

Image retrieval is the fundamental task of retrieving images similar to a query image from a database. A common practice is to first retrieve candidate images via similarity search using global image features and then re-rank the candidates using their local features. Previous learning-based studies mainly focus on learning either global or local image representations for the retrieval task. In this paper, we abandon the two-stage paradigm and seek to design an effective single-stage solution by integrating the local and global information in an image into a compact image representation. Specifically, we propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval. It first attentively extracts representative local information with multi-atrous convolutions and self-attention. Components orthogonal to the global image representation are then extracted from the local information. Finally, the orthogonal components are concatenated with the global representation as a complement, and aggregation is performed to generate the final representation. The whole framework is end-to-end differentiable and can be trained with image-level labels. Extensive experimental results validate the effectiveness of our solution and show that our model achieves state-of-the-art image retrieval performance on the Revisited Oxford and Paris datasets.
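The orthogonal fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`dot`, `orthogonal_component`, `fuse`) are hypothetical, local features are plain lists rather than convolutional feature maps, the concatenation order is a guess, and average pooling is assumed for the aggregation step, which the abstract does not specify. The multi-atrous convolutions and self-attention that produce the local features are omitted.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(local: Vector, global_feat: Vector) -> Vector:
    """Remove from `local` its projection onto `global_feat`.

    The result is orthogonal to `global_feat`, so it carries only the
    information that the global representation does not already contain.
    """
    scale = dot(local, global_feat) / dot(global_feat, global_feat)
    return [l - scale * g for l, g in zip(local, global_feat)]


def fuse(local_feats: List[Vector], global_feat: Vector) -> Vector:
    """Concatenate the global vector with each orthogonal local component,
    then aggregate over locations.

    Assumption: aggregation is a plain average over locations.
    """
    if dot(global_feat, global_feat) == 0.0:
        raise ValueError("global feature must be non-zero")
    fused = [
        global_feat + orthogonal_component(loc, global_feat)  # list concatenation
        for loc in local_feats
    ]
    n = len(fused)
    return [sum(column) / n for column in zip(*fused)]
```

For example, with two local descriptors `[[1.0, 1.0], [3.0, -2.0]]` and the global descriptor `[2.0, 0.0]`, the orthogonal components are `[0.0, 1.0]` and `[0.0, -2.0]`, and `fuse` returns `[2.0, 0.0, 0.0, -0.5]`: the first half repeats the global vector and the second half is the averaged orthogonal residual.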

updated: Wed Aug 11 2021 13:29:58 GMT+0000 (UTC)

published: Fri Aug 06 2021 03:14:09 GMT+0000 (UTC)

arXiv
