Cycled Compositional Learning between Images and Text

Jongseok Kim; Youngjae Yu; Seunghwan Lee; Gunhee Kim

We present an approach named the Cycled Composition Network that can measure the semantic distance of the composition of image-text embeddings. First, the Composition Network transits a reference image to a target image in an embedding space using a relative caption. Second, the Correction Network calculates the difference between the reference and retrieved target images in the embedding space and matches it with the relative caption. Our goal is to learn a composition mapping with the Composition Network. Since this one-way mapping is highly under-constrained, we couple it with inverse relation learning via the Correction Network and introduce a cycled relation for a given image. We participated in the Fashion IQ 2020 challenge and won first place with an ensemble of our model.
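
To make the two directions concrete, the following is a minimal sketch and not the authors' implementation. It assumes that images and captions are already encoded as fixed-length vectors, that each network is a single linear map, and that agreement is scored with cosine similarity. The function names (`compose`, `correct`, `cycled_loss`), the weight layout, and the exact form of the cycle term are all hypothetical and chosen only for illustration.

```python
import math


def linear(weights, vec):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compose(w_comp, ref_img, caption):
    """Composition step (assumed linear): [reference ; caption] -> predicted target."""
    # ref_img + caption is list concatenation, not element-wise addition.
    return linear(w_comp, ref_img + caption)


def correct(w_corr, ref_img, tgt_img):
    """Correction step (assumed linear): (target - reference) -> predicted caption."""
    diff = [t - r for t, r in zip(tgt_img, ref_img)]
    return linear(w_corr, diff)


def cycled_loss(w_comp, w_corr, ref_img, caption, tgt_img):
    """Sum of three cosine distances: forward, inverse, and an assumed cycle term."""
    composed = compose(w_comp, ref_img, caption)
    forward = 1.0 - cosine(composed, tgt_img)
    inverse = 1.0 - cosine(correct(w_corr, ref_img, tgt_img), caption)
    # Assumed cycle: correcting the composed embedding should recover the caption.
    cycle = 1.0 - cosine(correct(w_corr, ref_img, composed), caption)
    return forward + inverse + cycle


# Toy check with hand-picked weights: composed = reference + caption.
dim = 3
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
w_comp = [identity[i] + identity[i] for i in range(dim)]  # rows of [I | I]
w_corr = identity
ref = [0.2, -0.5, 1.0]
cap = [0.3, 0.1, -0.4]
tgt = [r + c for r, c in zip(ref, cap)]

loss = cycled_loss(w_comp, w_corr, ref, cap, tgt)
bad_loss = cycled_loss(w_comp, w_corr, ref, cap, [-t for t in tgt])
print(loss, bad_loss)
```

In this toy setup the loss is close to zero when the target equals the reference shifted by the caption, and grows when the target is wrong. The inverse and cycle terms constrain the mapping beyond what the forward term alone provides, which is the motivation the abstract gives for coupling the two networks.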

updated: Sat Jul 24 2021 01:59:11 GMT+0000 (UTC)

published: Sat Jul 24 2021 01:59:11 GMT+0000 (UTC)

arXiv
