IRGen: Generative Modeling for Image Retrieval

Yidan Zhang; Ting Zhang; Dong Chen; Yujing Wang; Qi Chen; Xing Xie; Hao Sun; Weiwei Deng; Qi Zhang; Fan Yang; Mao Yang; Qingmin Liao; Baining Guo

Generative modeling is ubiquitous in natural language processing and computer vision, yet its application to image retrieval remains unexplored. In this paper, we recast image retrieval as a form of generative modeling by employing a sequence-to-sequence model, contributing to the current theme of unified modeling. Our framework, IRGen, is a unified model that enables end-to-end differentiable search and achieves superior performance through direct optimization. In developing IRGen, we tackle the key technical challenge of converting an image into a very short sequence of semantic units to enable efficient and effective retrieval. Experiments demonstrate that our model yields significant improvements on three commonly used benchmarks; for example, its precision@10 on the In-shop dataset is 22.9% higher than that of the best baseline method, with a comparable recall@10 score.
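The abstract reports results in terms of precision@10 and recall@10 without defining them. The sketch below illustrates one common convention and is not taken from the paper: precision@k as the fraction of the top-k results that are relevant, and recall@k as a per-query hit indicator (1 if any relevant item appears in the top k). Both definitions, and the function names `precision_at_k` and `recall_at_k`, are assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant (assumed definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant item is in the top k, else 0.0 (assumed definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


# Hypothetical query: ranked gallery IDs and the set of IDs sharing the query's label.
ranked_ids = ["a", "b", "c", "d"]
relevant_ids = {"a", "d", "z"}
p2 = precision_at_k(ranked_ids, relevant_ids, 2)
r2 = recall_at_k(ranked_ids, relevant_ids, 2)
```

Under these assumed definitions, two methods can have similar recall@10 (each finds at least one match for most queries) while differing widely in precision@10 (how many of the ten results are matches), which is the kind of gap the abstract describes.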

updated: Wed Jun 28 2023 13:28:06 GMT+0000 (UTC)

published: Fri Mar 17 2023 17:07:36 GMT+0000 (UTC)

arXiv
