Intra-Model Collaborative Learning of Neural Networks

Shijie Fang; Tong Lin

Collaborative learning, recently proposed by Song and Chai, has achieved remarkable improvements in image classification tasks by training multiple classifier heads simultaneously. However, the large memory footprint required by such multi-head structures may hinder the training of large-capacity baseline models. A natural question is how to achieve collaborative learning within a single network without duplicating any modules. In this paper, we propose four ways of collaborative learning among different parts of a single network that require negligible engineering effort. To improve the robustness of the network, we leverage the consistency of the output layer and of the intermediate layers for training under the collaborative learning framework. In addition, the similarity of intermediate representations and of convolution kernels is introduced to reduce redundancy in a neural network. Compared to the method of Song and Chai, our framework further considers collaboration inside a single model and incurs less overhead. Extensive experiments on Cifar-10, Cifar-100, ImageNet32 and STL-10 corroborate the effectiveness of each of these four ways separately, while combining them leads to further improvements. In particular, test errors on the STL-10 dataset are decreased by 9.28% and 5.45% for ResNet-18 and VGG-16, respectively. Moreover, experiments on the Cifar-10 dataset show that our method is robust to label noise. For example, our method has 3.53% higher performance under a 50% noise ratio.
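The abstract does not give the form of the kernel-similarity term, so the following is only a minimal sketch of one plausible reading: penalizing the mean absolute pairwise cosine similarity between the flattened convolution kernels of a layer and adding it to the task loss. The function names, the choice of cosine similarity, and the `weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def kernel_redundancy_penalty(kernels):
    """Mean absolute pairwise cosine similarity over flattened kernels.

    `kernels` is a list of flattened kernels (lists of floats) from one layer.
    Hypothetical helper: higher values mean the kernels are more alike.
    """
    pairs = list(combinations(kernels, 2))
    if not pairs:
        return 0.0
    return sum(abs(cosine_similarity(u, v)) for u, v in pairs) / len(pairs)


def total_loss(task_loss, kernels, weight=0.1):
    """Task loss plus the weighted redundancy penalty (`weight` is an assumed hyperparameter)."""
    return task_loss + weight * kernel_redundancy_penalty(kernels)
```

Under this sketch, orthogonal kernels contribute no penalty, while kernels that are scalar multiples of one another contribute the maximum penalty of 1, which is the behaviour one would expect from a term meant to discourage redundant filters.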

updated: Thu May 20 2021 08:30:33 GMT+0000 (UTC)

published: Thu May 20 2021 08:30:33 GMT+0000 (UTC)

arXiv
