Bayesian Optimization Meets Self-Distillation

HyunJae Lee; Heon Song; Hyeonsoo Lee; Gi-hyeon Lee; Suyeong Park; Donggeun Yoo

Bayesian optimization (BO) has contributed greatly to improving model performance by iteratively suggesting promising hyperparameter configurations based on observations from multiple training trials. However, only partial knowledge from previous trials is transferred: the hyperparameter configurations and the measured performance of the models trained with them. Self-distillation (SD), on the other hand, transfers only the knowledge learned by the task model itself. To fully leverage the knowledge gained across all training trials, we propose the BOSS framework, which combines BO and SD. BOSS suggests promising hyperparameter configurations through BO and carefully selects pre-trained models from previous trials for SD, models that would otherwise be discarded in the conventional BO process. BOSS achieves significantly better performance than both BO and SD across a wide range of tasks, including general image classification, learning with noisy labels, semi-supervised learning, and medical image analysis.
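
The loop described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation, and every name in it (`train`, `suggest_lr`, `run_boss_like_loop`, `top_k`, `alpha`) is hypothetical. It makes several assumptions: the "model" is a single weight fitted to noisy linear data, the only hyperparameter is the learning rate, the suggestion step is a simple perturbation of the best learning rate seen so far (a stand-in for a real BO surrogate and acquisition function), and the teacher is drawn at random from the top-scoring earlier trials. It shows only how models from earlier trials can be reused as teachers instead of being thrown away; it does not reproduce the paper's results.

```python
import random


def train(lr, data, teacher_w=None, alpha=0.5, steps=20):
    """Fit y = w * x by gradient descent, optionally distilling from a teacher weight."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            task_grad = 2 * (w * x - y) * x  # gradient of the squared error to the label
            if teacher_w is None:
                grad += task_grad
            else:
                # Gradient of the squared error to the teacher's prediction.
                distill_grad = 2 * (w * x - teacher_w * x) * x
                grad += (1 - alpha) * task_grad + alpha * distill_grad
        w -= lr * grad / len(data)
    return w


def evaluate(w, data):
    """Negative mean squared error, so that higher is better."""
    return -sum((w * x - y) ** 2 for x, y in data) / len(data)


def suggest_lr(history, rng):
    """Stand-in for BO (assumption): random warm-up, then perturb the best lr so far."""
    if len(history) < 3:
        lr = 10 ** rng.uniform(-3, -0.5)
    else:
        best = max(history, key=lambda t: t["score"])
        lr = best["lr"] * 10 ** rng.uniform(-0.3, 0.3)
    return min(1.0, max(1e-3, lr))  # clamp to keep the toy training stable


def run_boss_like_loop(n_trials=10, top_k=3, seed=0):
    rng = random.Random(seed)
    xs = [0.1 * i for i in range(1, 11)]
    data = [(x, 3.0 * x + rng.gauss(0, 0.1)) for x in xs]
    history = []
    for _ in range(n_trials):
        lr = suggest_lr(history, rng)
        teacher_w = None
        if history:
            # Reuse a model from a well-performing earlier trial as the teacher.
            ranked = sorted(history, key=lambda t: t["score"], reverse=True)
            teacher_w = rng.choice(ranked[:top_k])["w"]
        w = train(lr, data, teacher_w=teacher_w)
        history.append(
            {"lr": lr, "w": w, "teacher_w": teacher_w, "score": evaluate(w, data)}
        )
    return history


if __name__ == "__main__":
    trials = run_boss_like_loop()
    best_trial = max(trials, key=lambda t: t["score"])
    print(best_trial["lr"], best_trial["score"])
```

In a realistic setting, `suggest_lr` would be replaced by a proper BO library and `train` by a full training run with a distillation loss on network outputs; the teacher selection rule used here is likewise only one plausible choice.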

updated: Mon Aug 28 2023 04:43:57 GMT+0000 (UTC)

published: Tue Apr 25 2023 09:12:37 GMT+0000 (UTC)

arXiv
