Efficient Large-Scale Visual Representation Learning And Evaluation

Eden Dolev; Alaa Awad; Denisa Roberts; Zahra Ebrahimzadeh; Marcin Mejran; Vaibhav Malpani; Mahir Yavuz

In this article, we present our approach to single-modality visual representation learning. Understanding visual representations of items is vital for fashion recommendations in e-commerce. We detail and contrast techniques for efficiently fine-tuning large-scale visual representation learning models in low-resource settings, using several pretrained backbone architectures from both the convolutional neural network and vision transformer families. We describe the challenges of e-commerce applications at scale and highlight our efforts to train, evaluate, and serve visual representations more efficiently. We present ablation studies that evaluate the offline performance of the representations on several downstream tasks, including visually similar ad recommendations on mobile devices. To this end, we propose a novel multilingual text-to-image generative offline evaluation method for visually similar recommendation systems. Finally, we include online results from machine learning systems deployed in production at Etsy.

updated: Mon Jul 17 2023 22:31:33 GMT+0000 (UTC)

published: Mon May 22 2023 18:25:03 GMT+0000 (UTC)

arXiv
