Towards Trustworthy Dataset Distillation

Shijie Ma; Fei Zhu; Zhen Cheng; Xu-Yao Zhang

Efficiency and trustworthiness are two enduring goals when applying deep learning in real-world applications. With regard to efficiency, dataset distillation (DD) aims to reduce training costs by distilling a large dataset into a tiny synthetic dataset. However, existing methods concentrate solely on in-distribution (InD) classification in a closed-world setting and disregard out-of-distribution (OOD) samples. OOD detection, on the other hand, aims to enhance models' trustworthiness, but it is typically achieved inefficiently in full-data settings. For the first time, we consider both issues simultaneously and propose a novel paradigm called Trustworthy Dataset Distillation (TrustDD). By distilling both InD samples and outliers, the condensed datasets can train models that are competent in both InD classification and OOD detection. To alleviate the requirement for real outlier data and make OOD detection more practical, we further propose corrupting InD samples to generate pseudo-outliers and introduce Pseudo-Outlier Exposure (POE). Comprehensive experiments in various settings demonstrate the effectiveness of TrustDD, and the proposed POE surpasses the state-of-the-art method Outlier Exposure (OE). Compared with preceding DD methods, TrustDD is more trustworthy and more applicable to real open-world scenarios. Our code will be publicly available.
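The abstract does not say which corruption POE applies or how the training objective is written, so the following is only a rough sketch of the general idea, not the authors' method. It assumes patch shuffling as the corruption and borrows the uniform-target regularizer commonly used with Outlier Exposure for classification. The names `shuffle_patches`, `oe_style_loss`, `grid`, and `weight` are hypothetical.

```python
import numpy as np


def shuffle_patches(image, grid=4, rng=None):
    """Hypothetical corruption: permute the grid x grid patches of an HxWxC image.

    Assumes H and W are divisible by `grid`; otherwise the result is cropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    patches = [
        image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(grid)
        for j in range(grid)
    ]
    order = rng.permutation(len(patches))
    # Reassemble the permuted patches row by row.
    rows = [
        np.concatenate([patches[k] for k in order[r * grid:(r + 1) * grid]], axis=1)
        for r in range(grid)
    ]
    return np.concatenate(rows, axis=0)


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def oe_style_loss(ind_logits, ind_labels, outlier_logits, weight=0.5):
    """Cross-entropy on InD samples plus a term pushing outlier predictions
    toward the uniform distribution (assumed form; `weight` is illustrative).
    """
    ind_logp = log_softmax(ind_logits)
    ce = -ind_logp[np.arange(len(ind_labels)), ind_labels].mean()
    # Cross-entropy between the uniform target and the predicted distribution.
    uniform_ce = -log_softmax(outlier_logits).mean(axis=1).mean()
    return ce + weight * uniform_ce
```

In this sketch the pseudo-outliers would be produced by applying `shuffle_patches` to InD images, and their logits passed as `outlier_logits`, so no real outlier dataset is needed.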

updated: Tue Jul 18 2023 11:43:01 GMT+0000 (UTC)

published: Tue Jul 18 2023 11:43:01 GMT+0000 (UTC)

arXiv
