A Bayesian Perspective of Convolutional Neural Networks through a Deconvolutional Generative Model

Tan Nguyen; Nhat Ho; Ankit Patel; Anima Anandkumar; Michael I. Jordan; Richard G. Baraniuk

Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction in images, we design the Deconvolutional Generative Model (DGM), a new probabilistic generative model whose inference calculations correspond to those in a given CNN architecture. The DGM uses a CNN to design the prior distribution in the probabilistic model. Furthermore, the DGM generates images from coarse to finer scales. It introduces a small set of latent variables at each scale and enforces dependencies among all the latent variables via a conjugate prior distribution. This conjugate prior yields a new regularizer for training CNNs, the Rendering Path Normalization (RPN), which is based on paths rendered in the generative model. We demonstrate that this regularizer improves generalization, both in theory and in practice. In addition, likelihood estimation in the DGM yields training losses for CNNs. Inspired by this, we design a new loss, termed the Max-Min cross entropy, which outperforms the traditional cross-entropy loss for object classification. The Max-Min cross entropy suggests a new deep network architecture, the Max-Min network, which can learn from less labeled data while maintaining good prediction performance. Our experiments demonstrate that the DGM with the RPN and the Max-Min architecture matches or exceeds the state of the art on benchmarks including SVHN, CIFAR10, and CIFAR100 for semi-supervised and supervised learning tasks.

updated: Mon Dec 09 2019 10:21:21 GMT+0000 (UTC)

published: Thu Nov 01 2018 01:27:37 GMT+0000 (UTC)

arXiv
