OVO: One-shot Vision Transformer Search with Online distillation

Zimian Wei; Hengyue Pan; Xin Niu; Dongsheng Li

Pure transformers have recently shown great potential for vision tasks. However, their accuracy on small and medium-sized datasets is not satisfactory. Some existing methods introduce a CNN as a teacher to guide training through distillation, but the gap between the teacher and student networks leads to sub-optimal performance. In this work, we propose OVO, a new One-shot Vision transformer search framework with Online distillation. OVO samples sub-nets for both the teacher and student networks to obtain better distillation results. Benefiting from online distillation, thousands of sub-nets in the supernet are well trained without extra fine-tuning or retraining. In experiments, OVO-Ti achieves 73.32% top-1 accuracy on ImageNet and 75.2% on CIFAR-100.
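
The idea of sampling a teacher and a student sub-net and training the student against the teacher's soft predictions can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the search space values, the function names (sample_subnet, distillation_loss), the loss weighting alpha, the temperature, and the stand-in logits are all assumptions made for the example. The abstract also does not say whether teacher and student are drawn from one supernet or two, so the sketch simply draws both configurations from a single hypothetical space.

```python
import math
import random

# Hypothetical search space: dimensions and values are illustrative
# assumptions, not the configuration used in the paper.
SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "num_heads": [3, 4],
    "mlp_ratio": [3.5, 4.0],
}


def sample_subnet(rng):
    """Pick one value per dimension, uniformly at random."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=1.0):
    """Blend hard-label cross-entropy with a soft-label KL term (assumed form)."""
    ce = -math.log(softmax(student_logits)[label])
    kd = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kd


rng = random.Random(0)
teacher_cfg = sample_subnet(rng)  # sub-net configuration acting as teacher
student_cfg = sample_subnet(rng)  # sub-net configuration acting as student

# Stand-in logits: in a real setup these would come from forward passes
# of the two sampled sub-nets on the same batch.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.0, 1.0, 0.0]
loss = distillation_loss(student_logits, teacher_logits, label=0)
```

In an actual training loop, both configurations would be resampled at every step, which is what would let many sub-nets share weights and be trained together; the fixed logits above only stand in for that step.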

updated: Fri Nov 24 2023 08:11:59 GMT+0000 (UTC)

published: Wed Dec 28 2022 10:08:55 GMT+0000 (UTC)

arXiv
