We study a class of deep neural networks with networks that form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008) who showed convergence for a feed forward network with one hidden layer. For an example of the effectiveness of DAG architectures, we describe an example of compression through an autoencoder, and compare against sequential feed forward networks under several metrics.
updated: Sun Jan 19 2020 04:59:25 GMT+0000 (UTC)
published: Sun May 21 2017 06:50:49 GMT+0000 (UTC)