FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

Anwesan Pal; Sahil Wadhwa; Ayush Jaiswal; Xu Zhang; Yue Wu; Rakesh Chada; Pradeep Natarajan; Henrik I. Christensen

Multi-turn textual feedback-based fashion image retrieval targets a real-world setting in which users iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns when retrieving new images for a given turn. Unlike the vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ, currently the only existing multi-turn fashion dataset, and achieves a relative improvement of 12.6% on Multi-turn Shoes, an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities: memory retention across turns, and being agnostic to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntm
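
As a rough illustration of the phrase "multiple inputs, which interact with their respective memories via individual read and write heads", the sketch below uses the standard NTM operations (content-based addressing, weighted read, erase/add write) with one memory per input. It is not the authors' implementation: the function names, the fixed erase vector, the sharpening factor `beta`, and in particular the cascade step (shifting the text key by the image read) are assumptions made for this example, since the abstract does not specify how the memories are cascaded.

```python
import math

def cosine(u, v):
    """Cosine similarity with a small epsilon to tolerate all-zero rows."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)

def address(memory, key, beta=5.0):
    """Content-based addressing: softmax over beta-scaled cosine similarities."""
    scores = [beta * cosine(row, key) for row in memory]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Read head: weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

def write(memory, weights, erase, add):
    """Write head: erase then add, scaled per row by the addressing weight."""
    return [[m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
            for w, row in zip(weights, memory)]

def turn(image_mem, text_mem, image_feat, text_feat):
    """One retrieval turn with a separate memory and heads for each input."""
    half = [0.5] * len(image_feat)  # hypothetical fixed erase vector

    # Image input: write to its own memory, then read it back.
    image_mem = write(image_mem, address(image_mem, image_feat), half, image_feat)
    r_img = read(image_mem, address(image_mem, image_feat))

    # ASSUMPTION: the "cascade" is modelled by letting the image read
    # shift the key used to address the text memory.
    text_key = [t + r for t, r in zip(text_feat, r_img)]
    text_mem = write(text_mem, address(text_mem, text_key), half, text_feat)
    r_txt = read(text_mem, address(text_mem, text_key))

    # The concatenated reads stand in for the turn's query representation.
    return image_mem, text_mem, r_img + r_txt
```

Calling `turn` once per feedback round and passing the returned memories into the next call is what lets information from earlier turns influence later reads. In a trained model the keys, erase vectors, and add vectors would be produced by a learned controller instead of being fixed as they are here.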

updated: Sun Aug 20 2023 05:44:18 GMT+0000 (UTC)

published: Sun Aug 20 2023 05:44:18 GMT+0000 (UTC)

arXiv
