Understanding Dynamics of Nonlinear Representation Learning and Its Application

Kenji Kawaguchi; Linjun Zhang; Zhun Deng

Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as the pixel values of images. Representation learning allows us to discover suitable representations from raw sensory data automatically. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are then used for classification at its output layer. This happens implicitly during training, through the minimization of a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition, respectively. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve training behavior in the practical regime. Our results provide practical guidance for designing model structures: for example, the common model structure assumption can be used to justify choosing a particular model structure over others. We also derive a new training framework that satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performance while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization on datasets including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST, and SVHN.
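
The abstract does not define the data-architecture alignment condition. The following is a minimal sketch that assumes one reading of it: the target vector lies in the column space of the Jacobian of the network outputs with respect to the parameters. Take a toy two-layer network with a linear output layer, f(x) = w2 · relu(W1 x). Its Jacobian with respect to w2 is exactly the matrix H of hidden representations, so f = H w2. Under the assumed reading, checking whether y lies in the span of the columns of H is therefore a sufficient (not necessary) test. The identity f(x) = Σ_k w2_k ∂f(x)/∂w2_k holds for any linear output layer and is offered only as a plausible example of the kind of structure the common model structure assumption refers to. The network, the helper names (`hidden_features`, `alignment_residual`, and so on), the toy data, and the Gram-Schmidt check are hypothetical illustrations in pure Python. They are not the authors' framework or code.

```python
import math

# Hypothetical toy model (not the paper's): f(x) = w2 . relu(W1 x), linear output layer.


def relu(v):
    return [max(0.0, t) for t in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def hidden_features(W1, X):
    """Hidden representation h(x) = relu(W1 x) for each input; rows of H (n x m).

    Because f is linear in w2, df(x)/dw2_k = h_k(x), so H is also the Jacobian
    of the outputs with respect to the output-layer weights w2.
    """
    return [relu(matvec(W1, x)) for x in X]


def outputs(W1, w2, X):
    """Network outputs f(x) = sum_k w2_k * h_k(x), i.e. H @ w2."""
    return [sum(a * b for a, b in zip(w2, h)) for h in hidden_features(W1, X)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_residual(columns, y, tol=1e-10):
    """Norm of the component of y orthogonal to span(columns), via modified Gram-Schmidt."""
    basis = []
    for c in columns:
        v = list(c)
        for q in basis:
            d = _dot(q, v)
            v = [a - d * b for a, b in zip(v, q)]
        norm = math.sqrt(_dot(v, v))
        if norm > tol:  # skip columns already (numerically) in the span
            basis.append([a / norm for a in v])
    r = list(y)
    for q in basis:
        d = _dot(q, r)
        r = [a - d * b for a, b in zip(r, q)]
    return math.sqrt(_dot(r, r))


def alignment_residual(W1, X, y):
    """Residual of y against the output-layer Jacobian columns (assumed reading).

    Zero means y lies in the span of these columns, which is enough for y to lie
    in the column space of the full Jacobian, since they are a subset of its columns.
    """
    H = hidden_features(W1, X)
    columns = [list(col) for col in zip(*H)]  # one column per hidden unit
    return projection_residual(columns, y)


X = [[1, 0], [0, 1], [1, 1]]           # three toy inputs in R^2
y = [1, 2, 3]                          # toy targets
narrow_W1 = [[1, 0]]                   # 1 hidden unit
wide_W1 = [[1, 0], [0, 1], [1, -1]]    # 3 hidden units

print(alignment_residual(narrow_W1, X, y))  # about 2.449 (sqrt(6)): y is not in the span
print(alignment_residual(wide_W1, X, y))    # about 0: y is in the span
print(outputs(wide_W1, [1, 2, 3], X))       # [4.0, 2.0, 3.0], i.e. H @ w2
```

With one hidden unit the residual is nonzero. Three hidden units give three Jacobian columns that span the targets. This toy comparison only shows that widening a layer adds Jacobian columns. It is not the paper's analysis of when larger networks do or do not help.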

updated: Thu Apr 07 2022 03:36:10 GMT+0000 (UTC)

published: Mon Jun 28 2021 16:31:30 GMT+0000 (UTC)

arXiv