SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts

Masanari Kimura; Takuma Nakamura; Yuki Saito

Many machine learning algorithms assume that the training data and the test data follow the same distribution. However, this assumption is often violated in real-world machine learning problems. In this paper, we propose SHIFT15M, a dataset for properly evaluating models in situations where the data distribution changes between training and testing. The SHIFT15M dataset has several useful properties: (i) Multiobjective. Each instance in the dataset has several numerical values that can be used as target variables. (ii) Large-scale. The SHIFT15M dataset consists of 15 million fashion images. (iii) Coverage of dataset shift types. SHIFT15M supports multiple dataset shift problem settings (e.g., covariate shift or target shift). SHIFT15M also makes it possible to evaluate model performance under dataset shifts of various magnitudes, because the magnitude of the shift can be varied. In addition, we provide software for handling SHIFT15M in a very simple way: https://github.com/st-tech/zozo-shift15m.
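
To make the idea of varying the shift magnitude concrete, here is a minimal sketch of a target shift split on a toy binary-label dataset. It is not the zozo-shift15m API, and the abstract does not say how SHIFT15M controls the magnitude; the function name `target_shift_split`, the `magnitude` parameter, and the fixed 50% positive rate for training are all assumptions made for illustration. Target shift here means that the label distribution p(y) differs between training and test while p(x | y) stays the same, which subsampling by label preserves.

```python
import random


def target_shift_split(labels, magnitude, n=100, seed=0):
    """Hypothetical helper: build disjoint train/test index lists of size n.

    The training set is balanced (50% positives). The test set has a
    positive rate of 0.5 + 0.5 * magnitude, so magnitude=0 means no
    target shift and magnitude=1 means an all-positive test set.
    """
    if not 0.0 <= magnitude <= 1.0:
        raise ValueError("magnitude must be in [0, 1]")

    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)

    n_train_pos = n // 2
    n_train_neg = n - n_train_pos
    n_test_pos = round(n * (0.5 + 0.5 * magnitude))
    n_test_neg = n - n_test_pos

    if len(pos) < n_train_pos + n_test_pos or len(neg) < n_train_neg + n_test_neg:
        raise ValueError("not enough examples for the requested split")

    # Take the training indices first, then the test indices from what remains,
    # so the two sets never share an example.
    train = pos[:n_train_pos] + neg[:n_train_neg]
    test = (pos[n_train_pos:n_train_pos + n_test_pos]
            + neg[n_train_neg:n_train_neg + n_test_neg])
    return train, test


if __name__ == "__main__":
    toy_labels = [i % 2 for i in range(1000)]  # balanced toy labels
    for m in (0.0, 0.5, 1.0):
        train_idx, test_idx = target_shift_split(toy_labels, m)
        rate = sum(toy_labels[i] for i in test_idx) / len(test_idx)
        print(f"magnitude={m}: test positive rate={rate:.2f}")
```

Evaluating the same model on test sets built with increasing `magnitude` gives a rough picture of how performance degrades as the label distribution moves away from the training one.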

updated: Mon Aug 30 2021 05:07:59 GMT+0000 (UTC)

published: Mon Aug 30 2021 05:07:59 GMT+0000 (UTC)

arXiv
