Exploiting Invariance in Training Deep Neural Networks

Chengxi Ye; Xiong Zhou; Tristan McKinney; Yanfeng Liu; Qinggang Zhou; Fedor Zhdanov

Inspired by two basic mechanisms in animal visual systems, we introduce a feature transform technique that imposes invariance properties in the training of deep neural networks. The resulting algorithm requires less parameter tuning, trains well with an initial learning rate of 1.0, and generalizes easily to different tasks. We enforce scale invariance with local statistics in the data to align similar samples at diverse scales. To accelerate convergence, we enforce a GL(n)-invariance property with global statistics extracted from a batch, so that the gradient descent solution remains invariant under a change of basis. Profiling analysis shows that our proposed modifications take 5% of the computations of the underlying convolution layer. Tested on convolutional networks and transformer networks, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, works seamlessly in both small- and large-batch training, and applies to different computer vision and language tasks.
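The two invariances named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `scale_normalize` and `whiten`, the use of a per-sample L2 norm as the "local statistic", and the use of the inverse square root of the batch covariance as the "global statistic" are all assumptions made for illustration. The sketch only checks that rescaled samples map to the same output, and that inner products between whitened samples do not change when the input features are expressed in a different basis.

```python
import numpy as np


def scale_normalize(x, eps=1e-12):
    """Divide each sample (row) by its own L2 norm, a per-sample statistic.

    Assumption: the L2 norm stands in for the paper's "local statistics".
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / (norms + eps)


def whiten(x, eps=1e-12):
    """Center and decorrelate features using statistics of the whole batch.

    Assumption: whitening with the inverse square root of the batch
    covariance stands in for the paper's "global statistics".
    """
    centered = x - x.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / x.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ inv_sqrt


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))  # hypothetical batch: 64 samples, 4 features

# Scale invariance: a positive per-sample rescaling leaves the output unchanged.
scales = rng.uniform(0.5, 5.0, size=(64, 1))
scale_gap = np.abs(scale_normalize(x) - scale_normalize(scales * x)).max()

# Basis-change invariance: apply an invertible matrix A to the features.
# The whitened features may differ by a rotation, so compare the inner
# products between samples, which a rotation preserves.
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)
z = whiten(x)
z_changed = whiten(x @ A.T)
basis_gap = np.abs(z @ z.T - z_changed @ z_changed.T).max()

print("scale gap:", scale_gap)
print("basis-change gap:", basis_gap)
```

Both gaps should be at the level of floating-point error. Comparing inner products rather than the whitened features themselves is a deliberate choice: whitening fixes the features only up to an orthogonal transformation.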

updated: Thu Dec 09 2021 18:55:30 GMT+0000 (UTC)

published: Tue Mar 30 2021 19:18:31 GMT+0000 (UTC)

arXiv
