Understanding Dynamics of Nonlinear Representation Learning and Its Application

Kenji Kawaguchi; Linjun Zhang; Zhun Deng

Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as the pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer. This learning happens implicitly during training, through the minimization of a supervised or unsupervised loss, in the common practical regimes of deep learning, unlike in the neural tangent kernel (NTK) regime. In this paper, we study the dynamics of such implicit nonlinear representation learning beyond the NTK regime. We identify a new assumption and a novel condition, which we call the common model structure assumption and the data-architecture alignment condition, respectively. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behavior in the practical regime. Our results provide practical guidance for designing a model structure; for example, the common model structure assumption can be used to justify choosing a particular model structure over others. We also derive a new training framework based on the theory. The proposed framework is empirically shown to maintain competitive (practical) test performance while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization on standard benchmark datasets, including CIFAR-10, CIFAR-100, and SVHN.
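
The abstract does not state the data-architecture alignment condition formally, so the following is only an illustrative sketch under an assumed reading: the output layer is linear in its weights, and the labels count as "aligned" with the architecture when they lie in the column space of the hidden representation matrix. The tanh network, the random weights, the sizes, and the function names `hidden_representation` and `alignment_residual` are all hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np

def hidden_representation(X, W):
    """Nonlinear hidden-layer representation tanh(X W) (assumed toy model)."""
    return np.tanh(X @ W)

def alignment_residual(H, Y):
    """Relative residual of the best linear output layer fitted on H.

    A value near zero means Y lies (numerically) in the column space of H,
    which is the assumed reading of "alignment" used in this sketch.
    """
    V, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return np.linalg.norm(H @ V - Y) / np.linalg.norm(Y)

rng = np.random.default_rng(0)
n, d = 20, 5                       # hypothetical sample count and input size
X = rng.normal(size=(n, d))        # stand-in for raw sensory data
Y = rng.normal(size=(n, 1))        # stand-in for regression targets

# Two random hidden layers: one narrower than n, one wider than n.
H_narrow = hidden_representation(X, rng.normal(size=(d, 4)))
H_wide = hidden_representation(X, rng.normal(size=(d, 64)))

res_narrow = alignment_residual(H_narrow, Y)
res_wide = alignment_residual(H_wide, Y)
print(f"narrow residual: {res_narrow:.3f}, wide residual: {res_wide:.2e}")
```

In this toy setting the wide representation can fit arbitrary targets with a linear output layer, while the narrow one cannot. The check is performed once at random initialization, whereas the paper studies the condition along the training trajectory, so the sketch should be read as intuition rather than as the paper's result.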

updated: Sat Apr 09 2022 05:54:02 GMT+0000 (UTC)

published: Mon Jun 28 2021 16:31:30 GMT+0000 (UTC)

arXiv
