Ranking-aware Uncertainty for Text-guided Image Retrieval

Junyang Chen; Hanjiang Lai

Text-guided image retrieval incorporates conditional text to better capture users' intent. Existing methods typically focus on minimizing the embedding distance between the source inputs and the target image, using the provided triplets ⟨source image, source text, target image⟩. However, such triplet optimization may limit the ability of the learned retrieval model to capture more detailed ranking information: the triplets are one-to-one correspondences, so they fail to account for the many-to-many correspondences that arise from semantic diversity in feedback language and images. To capture more ranking information, we propose a novel ranking-aware uncertainty approach that models many-to-many correspondences using only the provided triplets. We introduce uncertainty learning to learn a stochastic ranking list of features. Specifically, our approach comprises three main components: (1) in-sample uncertainty, which aims to capture semantic diversity using a Gaussian distribution derived from both the combined and target features; (2) cross-sample uncertainty, which further mines ranking information from other samples' distributions; and (3) distribution regularization, which aligns the distributional representations of the source inputs and the target image. Compared to existing state-of-the-art methods, our proposed method achieves significant results on two public datasets for composed image retrieval.
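The abstract does not give the loss formulas, so the following is only a minimal sketch of two generic building blocks that a Gaussian-embedding approach of this kind could rely on: drawing stochastic features from a diagonal Gaussian, and measuring how far apart two such distributions are. All names (`sample_gaussian`, `kl_diag_gaussians`, `symmetric_kl`), the diagonal-covariance parameterization by log-variance, and the use of a symmetric KL divergence as the alignment term are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Draw one sample z = mu + sigma * eps from a diagonal Gaussian.

    mu and log_var are lists of equal length; sigma = exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_p, log_var_p, mu_q, log_var_q):
    """Closed-form KL(p || q) between two diagonal Gaussians."""
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, log_var_p, mu_q, log_var_q):
        total += lq - lp + (math.exp(lp) + (mp - mq) ** 2) / math.exp(lq) - 1.0
    return 0.5 * total


def symmetric_kl(mu_p, log_var_p, mu_q, log_var_q):
    """Symmetric divergence, used here as a hypothetical alignment penalty."""
    return (kl_diag_gaussians(mu_p, log_var_p, mu_q, log_var_q)
            + kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p))


# Toy 3-dimensional distributions for a combined (image + text) query
# and its target image; the numbers are made up for the example.
rng = random.Random(0)
query_mu, query_log_var = [0.2, -0.1, 0.5], [-1.0, -1.0, -1.0]
target_mu, target_log_var = [0.3, 0.0, 0.4], [-1.2, -0.8, -1.0]

# Several stochastic features drawn from the query distribution.
samples = [sample_gaussian(query_mu, query_log_var, rng) for _ in range(5)]

# Alignment penalty between the query and target distributions.
reg = symmetric_kl(query_mu, query_log_var, target_mu, target_log_var)
```

In a sketch like this, the sampled features would stand in for the "stochastic ranking list of features" mentioned above, and `reg` would play the role of a distribution-alignment term; how the paper actually combines such quantities is not stated in the abstract.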

updated: Wed Aug 16 2023 03:48:19 GMT+0000 (UTC)

published: Wed Aug 16 2023 03:48:19 GMT+0000 (UTC)

arXiv
