COURIER: Contrastive User Intention Reconstruction for Large-Scale Pre-Train of Image Features

Jia-Qi Yang; Chenglei Dai; OU Dan; Ju Huang; De-Chuan Zhan; Qingwen Liu; Xiaoyi Zeng; Yang Yang

With the development of the multimedia internet, visual characteristics have become an important factor affecting user interests. Incorporating visual features is therefore a promising direction for further improving click-through rate (CTR) prediction. However, we found that simply injecting image embeddings trained with established pre-training methods yields only marginal improvements. We attribute this failure to two reasons. First, existing pre-training methods are designed for well-defined computer vision tasks that concentrate on semantic features, so they cannot learn personalized interests in recommendation. Second, pre-trained image embeddings that contain only semantic information provide little information gain, because the CTR prediction task already takes semantic features such as categories and item titles as inputs. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that learns visual features from user click histories. Specifically, we propose a user interest reconstruction module that mines visual features related to user interests from behavior histories. We further propose a contrastive training method to avoid the collapse of embedding vectors. Extensive experiments verify that our method can learn users' visual interests; it achieves a 0.46% improvement in offline AUC and a 0.88% improvement in Taobao online GMV (p-value < 0.01).
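The abstract names two components, a user interest reconstruction module and a contrastive training objective, without giving their exact form. The sketch below is one plausible reading, not the authors' implementation: it assumes the reconstruction is attention pooling over the embeddings of a user's clicked images with the next clicked image as the query, and that the contrastive objective is an InfoNCE-style loss over the batch. The function names, the `temperature` parameter, and the tensor shapes are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def reconstruct_interest(history, target, temperature=0.1):
    """Attention-pool each user's click history, using the target image as query.

    history: (B, T, D) embeddings of T previously clicked images per user.
    target:  (B, D) embedding of the next clicked image per user.
    Returns a (B, D) reconstruction of each target from its history.
    (Assumed form; the abstract does not specify the module.)
    """
    scores = np.einsum("bd,btd->bt", target, history) / temperature
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return np.einsum("bt,btd->bd", weights, history)


def contrastive_reconstruction_loss(history, target, temperature=0.1):
    """InfoNCE-style loss: each reconstruction should match its own target
    more closely than the targets of the other users in the batch."""
    history = l2_normalize(history)
    target = l2_normalize(target)
    recon = l2_normalize(reconstruct_interest(history, target, temperature))
    logits = recon @ target.T / temperature  # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # positives lie on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.normal(size=(8, 5, 16))  # 8 users, 5 clicks, 16-dim embeddings
    target = rng.normal(size=(8, 16))
    print(contrastive_reconstruction_loss(history, target))
```

Under these assumptions, a fully collapsed encoder (every image mapped to the same vector) cannot drive the loss down: all logits are equal, so the loss stays at log(B) for batch size B. A plain reconstruction error, by contrast, would be trivially zero in that case, which is one way to read the abstract's motivation for contrastive training.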

updated: Thu Jun 08 2023 07:45:24 GMT+0000 (UTC)

published: Thu Jun 08 2023 07:45:24 GMT+0000 (UTC)

arXiv
