Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation

Jiawei Du; Yidi Jiang; Vincent T. F. Tan; Joey Tianyi Zhou; Haizhou Li

Model-based deep learning has achieved astounding successes due in part to the availability of large-scale real-world data. However, processing such massive amounts of data comes at a considerable cost in terms of computation, storage, training, and the search for good neural architectures. Dataset distillation has thus recently come to the fore. This paradigm involves distilling information from large real-world datasets into tiny and compact synthetic datasets such that processing the latter yields performance similar to that of the former. State-of-the-art methods primarily rely on learning the synthetic dataset by matching the gradients obtained during training between the real and synthetic data. However, these gradient-matching methods suffer from the accumulated trajectory error caused by the discrepancy between the distillation and subsequent evaluation. To alleviate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that, with regularization towards the flat trajectory, the weights trained on synthetic data are robust against perturbations from the accumulated errors. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7% on a subset of the ImageNet dataset with higher-resolution images. We also validate the effectiveness and generalizability of our method with datasets of different resolutions and demonstrate its applicability to neural architecture search.
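To make the notion of an accumulated trajectory error concrete, the following is a minimal toy sketch in Python, not the paper's algorithm or experimental setup. It assumes one reading of the distillation/evaluation discrepancy: during distillation, each training segment on synthetic data restarts from a checkpoint of the real-data ("teacher") trajectory, whereas during evaluation the model continues from its own previous weights. The 1-D quadratic losses, the curvature values `a_real` and `a_syn`, and the helper names `gd_steps` and `trajectory_errors` are all hypothetical choices made for illustration.

```python
# Toy 1-D illustration (hypothetical setup, not the paper's experiments).
# "Real" loss:      0.5 * a_real * w**2
# "Synthetic" loss: 0.5 * a_syn  * w**2  (slightly mismatched curvature)


def gd_steps(w, curvature, lr, steps):
    """Run plain gradient descent on 0.5 * curvature * w**2."""
    for _ in range(steps):
        w = w - lr * curvature * w
    return w


def trajectory_errors(w0=1.0, a_real=1.0, a_syn=0.9, lr=0.1,
                      seg_len=5, num_segs=4):
    # Teacher checkpoints at segment boundaries, trained on the real loss.
    teacher = [w0]
    for _ in range(num_segs):
        teacher.append(gd_steps(teacher[-1], a_real, lr, seg_len))

    matching, accumulated = [], []
    student = w0
    for k in range(num_segs):
        # Distillation-style segment: restart from the teacher checkpoint.
        restarted = gd_steps(teacher[k], a_syn, lr, seg_len)
        matching.append(abs(restarted - teacher[k + 1]))

        # Evaluation-style segment: continue from the student's own weights,
        # so earlier deviations carry over into later segments.
        student = gd_steps(student, a_syn, lr, seg_len)
        accumulated.append(abs(student - teacher[k + 1]))
    return matching, accumulated


matching, accumulated = trajectory_errors()
for k, (m, a) in enumerate(zip(matching, accumulated)):
    print(f"segment {k}: per-segment error {m:.4f}, accumulated error {a:.4f}")
```

In this toy setting the two errors coincide on the first segment and the evaluation-style error exceeds the per-segment error on every later segment, which is the kind of gap the abstract attributes to the mismatch between distillation and evaluation. How FTD regularizes towards a flat trajectory is not specified in the abstract and is not modeled here.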

updated: Sun Nov 20 2022 15:49:11 GMT+0000 (UTC)

published: Sun Nov 20 2022 15:49:11 GMT+0000 (UTC)

arXiv
