Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback

Yifei Yuan; Wai Lam

We study the task of conversational fashion image retrieval via multiturn natural language feedback. Most previous studies are based on single-turn settings. Existing models for multiturn conversational fashion image retrieval have limitations, such as relying on traditional models, which leads to ineffective performance. We propose a novel framework that can effectively handle conversational fashion image retrieval with multiturn natural language feedback texts. One characteristic of the framework is that it searches for candidate images by exploiting the encoded reference image and feedback text information together with the conversation history. Furthermore, the image fashion attribute information is leveraged via a mutual attention strategy. Since no existing fashion dataset is suitable for the multiturn setting of our task, we derive a large-scale multiturn fashion dataset via additional manual annotation efforts on an existing single-turn dataset. Experiments show that our proposed model significantly outperforms existing state-of-the-art methods.

updated: Tue Jun 08 2021 06:34:25 GMT+0000 (UTC)

published: Tue Jun 08 2021 06:34:25 GMT+0000 (UTC)

arXiv
