Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

Shiwei Liu; Tianlong Chen; Zahra Atashgahi; Xiaohan Chen; Ghada Sokar; Elena Mocanu; Mykola Pechenizkiy; Zhangyang Wang; Decebal Constantin Mocanu

The success of deep ensembles in improving predictive performance, uncertainty estimation, and out-of-distribution robustness has been extensively studied in the machine learning literature. Despite these promising results, naively training multiple deep neural networks and combining their predictions at inference leads to prohibitive computational costs and memory requirements. Recently proposed efficient ensemble approaches reach the performance of traditional deep ensembles at significantly lower cost. However, the training resources required by these approaches are still at least the same as those needed to train a single dense model. In this work, we draw a unique connection between sparse neural network training and deep ensembles, yielding a novel efficient ensemble learning framework called FreeTickets. Instead of training multiple dense networks and averaging them, we directly train sparse subnetworks from scratch and extract diverse yet accurate subnetworks during this efficient, sparse-to-sparse training. Our framework, FreeTickets, is defined as the ensemble of these relatively cheap sparse subnetworks. Despite being an ensemble method, FreeTickets has even fewer parameters and training FLOPs than a single dense model. This seemingly counter-intuitive outcome is due to the high training and inference efficiency of dynamic sparse training. FreeTickets surpasses the dense baseline on all of the following criteria: prediction accuracy, uncertainty estimation, out-of-distribution (OoD) robustness, and efficiency for both training and inference. Impressively, FreeTickets outperforms the naive deep ensemble with ResNet50 on ImageNet using only around 1/5 of the training FLOPs required by the latter. We have released our source code at https://github.com/VITA-Group/FreeTickets.
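To make the parameter argument concrete, here is a minimal, illustrative sketch and not the released FreeTickets code. It assumes a toy linear classifier, random sparsity masks in place of subnetworks obtained by dynamic sparse training, and hypothetical values for the sparsity (0.8) and the number of ensemble members (3). All function and variable names are invented for this example. It shows how averaging the predictions of a few sparse members can still use fewer non-zero weights than one dense model of the same shape.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_sparse_weights(n_in, n_out, sparsity, rng):
    """Random weights with a fraction `sparsity` of entries set to zero.

    Assumption: random masks stand in for subnetworks that the paper
    obtains through dynamic sparse training.
    """
    n_total = n_in * n_out
    n_keep = round(n_total * (1.0 - sparsity))
    keep = set(rng.sample(range(n_total), n_keep))
    return [
        [rng.gauss(0.0, 1.0) if (i * n_out + j) in keep else 0.0
         for j in range(n_out)]
        for i in range(n_in)
    ]


def predict(weights, x):
    """Class probabilities of one sparse linear member for input x."""
    n_out = len(weights[0])
    logits = [sum(x[i] * weights[i][j] for i in range(len(x)))
              for j in range(n_out)]
    return softmax(logits)


def ensemble_predict(members, x):
    """Average the class probabilities of all ensemble members."""
    probs = [predict(w, x) for w in members]
    return [sum(p[j] for p in probs) / len(probs)
            for j in range(len(probs[0]))]


def count_nonzero(weights):
    """Number of non-zero weights, i.e. the stored parameters."""
    return sum(1 for row in weights for v in row if v != 0.0)


# Hypothetical sizes chosen only for illustration.
N_IN, N_OUT, SPARSITY, N_MEMBERS = 20, 5, 0.8, 3

rng = random.Random(0)
members = [make_sparse_weights(N_IN, N_OUT, SPARSITY, rng)
           for _ in range(N_MEMBERS)]
x = [rng.gauss(0.0, 1.0) for _ in range(N_IN)]

avg_probs = ensemble_predict(members, x)
dense_params = N_IN * N_OUT
ensemble_params = sum(count_nonzero(w) for w in members)

print("ensemble probabilities:", avg_probs)
print("dense parameters:", dense_params)
print("ensemble parameters:", ensemble_params)
```

Under these assumed settings, each member keeps 20% of the weights, so three members together hold 60% of the dense parameter count. The same arithmetic does not cover training FLOPs, which depend on the training schedule described in the paper.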

updated: Mon Feb 07 2022 12:21:13 GMT+0000 (UTC)

published: Mon Jun 28 2021 10:48:20 GMT+0000 (UTC)

arXiv
