Rescaling CNN through Learnable Repetition of Network Parameters

Arnav Chavan; Udbhav Bamba; Rishabh Tiwari; Deepak Gupta

Deeper and wider CNNs are known to provide improved performance for deep learning tasks. However, most such networks offer only a small performance gain per additional parameter. In this paper, we investigate whether the gain observed in deeper models is purely due to the addition of more optimization parameters or whether the physical size of the network also plays a role. Further, we present a novel rescaling strategy for CNNs based on learnable repetition of their parameters. Based on this strategy, we rescale CNNs without changing their parameter count, and show that learnable sharing of weights by itself can provide a significant boost in the performance of any given model. We show that small base networks, when rescaled, can provide performance comparable to deeper networks with as few as 6% of the optimization parameters of the deeper one. The relevance of weight sharing is further highlighted through the example of group-equivariant CNNs. We show that the significant improvements obtained with group-equivariant CNNs over regular CNNs on classification problems are only partly due to the added equivariance property, and that part of the improvement comes from the learnable repetition of network weights. For the rot-MNIST dataset, we show that up to 40% of the relative gain reported by state-of-the-art methods for rotation equivariance could actually be due to just the learnt repetition of weights.
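The weight repetition hidden inside group-equivariant CNNs can be made concrete with the lifting layer of a p4 (90-degree rotation) G-CNN: each learned k x k filter is applied in four rotated copies, so the layer produces four times as many feature maps while the number of optimized parameters stays the same. The following is a minimal pure-Python sketch under stated assumptions: single-channel 3 x 3 filters, square inputs, and hypothetical helper names (`rot90`, `p4_lift`, `conv2d_valid`, `count_weights`). It illustrates rotation-based weight sharing only and is not the authors' learnable rescaling strategy, whose details the abstract does not give.

```python
from typing import List

Matrix = List[List[float]]  # a square 2-D image or a square k x k kernel


def rot90(m: Matrix) -> Matrix:
    """Rotate a square matrix by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[j][n - 1 - i] for j in range(n)] for i in range(n)]


def p4_lift(base: List[Matrix]) -> List[Matrix]:
    """Repeat each learned filter as its four 90-degree rotations.

    Only `base` would hold trainable values; the bank is rebuilt from it,
    so with autodiff the gradients of all four copies would accumulate
    into the same base weights.
    """
    bank: List[Matrix] = []
    for f in base:
        r = f
        for _ in range(4):
            bank.append(r)
            r = rot90(r)
    return bank


def conv2d_valid(img: Matrix, f: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation CNN conv layers compute."""
    k, h, w = len(f), len(img), len(img[0])
    return [
        [sum(img[i + a][j + b] * f[a][b] for a in range(k) for b in range(k))
         for j in range(w - k + 1)]
        for i in range(h - k + 1)
    ]


def count_weights(filters: List[Matrix]) -> int:
    """Total number of scalar weights in a list of kernels."""
    return sum(len(f) * len(f[0]) for f in filters)


if __name__ == "__main__":
    base = [
        [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]],  # Sobel-like edge filter
        [[0.0, 1.0, 2.0], [-1.0, 0.0, 1.0], [-2.0, -1.0, 0.0]],  # diagonal edge filter
    ]
    bank = p4_lift(base)
    img = [[float((3 * i + j) % 5) for j in range(6)] for i in range(6)]
    maps = [conv2d_valid(img, f) for f in bank]
    print("optimized parameters:", count_weights(base))  # 18
    print("effective weights:", count_weights(bank))     # 72
    print("feature maps:", len(maps))                    # 8
```

Because cross-correlation commutes with a joint rotation of image and filter, rotating the input by 90 degrees rotates each output map and cyclically shifts the four copies of every base filter; that is the equivariance property. The sketch shows that, separately from equivariance, the layer also gains four times the effective filters at no extra parameter cost, which is the confound the abstract highlights.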

updated: Thu Jan 14 2021 15:03:25 GMT+0000 (UTC)

published: Thu Jan 14 2021 15:03:25 GMT+0000 (UTC)

arXiv
