Understanding Dynamics of Nonlinear Representation Learning and Its Application

Kenji Kawaguchi; Linjun Zhang; Zhun Deng

Representations of the world environment play a crucial role in machine intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a multilayer perceptron learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition, respectively. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Our results provide practical guidance for designing a model structure: for example, the common model structure assumption can be used to justify choosing a particular model structure over others. As an application, we then derive a new training framework that satisfies the data-architecture alignment condition without assuming it, by automatically modifying any given training algorithm depending on the data and architecture. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive (practical) test performance while providing global convergence guarantees for ResNet-18 with convolutions, skip connections, and batch normalization on standard benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST, and SVHN.
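
The multilayer perceptron example in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's framework or code: the function name `mlp_forward`, the ReLU activation, the layer sizes, and the random weights are all assumptions chosen for illustration. It only shows where the nonlinear representation sits (the hidden layer) and how the output layer consumes it.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP (hypothetical helper, not from the paper).

    Returns the hidden representation and the output logits.
    """
    # Nonlinear representation learned at the hidden layer (ReLU is an assumption).
    h = np.maximum(0.0, x @ W1 + b1)
    # The output layer is linear in the representation h.
    logits = h @ W2 + b2
    return h, logits

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 raw inputs with 8 features each (illustrative sizes)
W1 = rng.normal(size=(8, 16))      # hidden-layer weights
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3))      # output-layer weights for 3 classes
b2 = np.zeros(3)

h, logits = mlp_forward(x, W1, b1, W2, b2)
pred = logits.argmax(axis=1)       # predicted class index per input
```

During training, both `W1` (which shapes the representation) and `W2` (which reads it out) would be updated by minimizing a loss; the dynamics of that joint update are what the abstract refers to as implicit nonlinear representation learning.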

updated: Mon Jun 28 2021 16:31:30 GMT+0000 (UTC)

published: Mon Jun 28 2021 16:31:30 GMT+0000 (UTC)

arXiv
