GradMax: Growing Neural Networks using Gradient Information

Utku Evci; Bart van Merriënboer; Thomas Unterthiner; Max Vladymyrov; Fabian Pedregosa

The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights, and we find the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness on a variety of vision tasks and architectures.
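To make the idea in the abstract concrete, here is a minimal NumPy sketch of one way such a growth step could look for a single hidden layer. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact objective, constraint, or scaling, so the following are assumptions: the new neurons get zero incoming weights (so the network output is unchanged when the activation maps 0 to 0), the activation has unit derivative at zero, and the outgoing weights are taken as the top-k left singular vectors of the accumulated product of next-layer gradients and previous-layer activations. The function name `gradmax_style_init` and the parameters `grad_next`, `act_prev`, and `scale` are hypothetical.

```python
import numpy as np


def gradmax_style_init(grad_next, act_prev, k, scale=1.0):
    """Sketch of a gradient-maximizing initialization for k new neurons.

    grad_next: (n, d_out) per-example gradients of the loss with respect to
               the pre-activations of the layer fed by the new neurons.
    act_prev:  (n, d_in) per-example activations feeding the new neurons.
    Returns (w_in, w_out, singular_values). All names are hypothetical.
    """
    # Accumulate sum_i grad_next[i] * act_prev[i]^T, shape (d_out, d_in).
    m = grad_next.T @ act_prev
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    if k > u.shape[1]:
        raise ValueError("k exceeds the number of available singular vectors")

    # Assumption: outgoing weights are the top-k left singular vectors,
    # rescaled so that their Frobenius norm equals `scale`.
    w_out = u[:, :k] * (scale / np.sqrt(k))

    # Zero incoming weights: new neurons output 0 for activations with
    # f(0) = 0, so what the network has already learned is untouched.
    w_in = np.zeros((k, act_prev.shape[1]))
    return w_in, w_out, s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=(32, 6))   # stand-in gradients, not real data
    h = rng.normal(size=(32, 5))   # stand-in activations, not real data
    w_in, w_out, s = gradmax_style_init(g, h, k=2, scale=0.5)
    # Gradient of the loss w.r.t. w_in at w_in = 0, assuming f'(0) = 1.
    grad_w_in = w_out.T @ (g.T @ h)
    print("gradient norm on new incoming weights:", np.linalg.norm(grad_w_in))
```

Under these assumptions, among outgoing-weight matrices with orthogonal columns of equal norm, the top-k singular vectors give the largest Frobenius norm for the gradient on the new incoming weights, which is the sense in which this sketch "maximizes the gradients of the new weights".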

updated: Wed Feb 23 2022 00:42:09 GMT+0000 (UTC)

published: Thu Jan 13 2022 18:30:18 GMT+0000 (UTC)

arXiv
