Delay Differential Neural Networks

Srinivas Anumasa; P. K. Srijith

Neural ordinary differential equations (NODEs) treat the computation of intermediate feature vectors as trajectories of an ordinary differential equation parameterized by a neural network. In this paper, we propose a novel model, delay differential neural networks (DDNNs), inspired by delay differential equations (DDEs). The proposed model treats the derivative of the hidden feature vector as a function of the current feature vector and past feature vectors (the history). This function is modelled as a neural network, which leads to continuous-depth alternatives to many recent ResNet variants. We propose two different DDNN architectures, which differ in how the current and past feature vectors are combined. For training DDNNs, we provide a memory-efficient adjoint method for computing gradients and back-propagating through the network. DDNNs improve the data efficiency of NODEs by further reducing the number of parameters without affecting generalization performance. Experiments conducted on synthetic and real-world image classification datasets such as CIFAR-10 and CIFAR-100 show the effectiveness of the proposed models.
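To make the central idea concrete, the following is a minimal sketch of a delay differential equation of the form dh/dt = f(h(t), h(t - tau)), integrated with a fixed-step forward Euler scheme. It is an illustration only and not the architecture or solver from the paper: the function `f` is a toy single-layer tanh stand-in for a neural network, and the names `integrate_dde`, `tau`, `dt`, `w_now`, `w_past`, and `bias` are hypothetical. The sketch also assumes a constant history, meaning h(t) = h(0) for all t < 0.

```python
import math


def f(h_now, h_past, w_now, w_past, bias):
    """Toy stand-in for the network: one tanh unit per feature (assumption)."""
    return [math.tanh(w_now * a + w_past * b + bias)
            for a, b in zip(h_now, h_past)]


def integrate_dde(h0, tau=0.5, t_end=1.0, dt=0.1,
                  w_now=-1.0, w_past=0.5, bias=0.0):
    """Forward Euler for dh/dt = f(h(t), h(t - tau)).

    Assumes a constant history (h(t) = h0 for t < 0) and a delay that is
    a whole number of steps. Returns the list of states at each step.
    """
    lag = round(tau / dt)        # delay measured in Euler steps
    n_steps = round(t_end / dt)
    traj = [list(h0)]
    for k in range(n_steps):
        h_now = traj[k]
        # Look up the delayed state; fall back to the initial state
        # while t - tau is still negative.
        h_past = traj[k - lag] if k >= lag else traj[0]
        dh = f(h_now, h_past, w_now, w_past, bias)
        traj.append([a + dt * d for a, d in zip(h_now, dh)])
    return traj


if __name__ == "__main__":
    trajectory = integrate_dde([1.0, -1.0])
    print(len(trajectory), trajectory[-1])
```

The stored trajectory doubles as the history buffer, which is what distinguishes this from an ordinary ODE step: each update reads both the current state and a state from `lag` steps earlier. A practical implementation would likely use an adaptive solver and interpolate the history between stored points.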

updated: Sat Dec 12 2020 12:20:54 GMT+0000 (UTC)

published: Sat Dec 12 2020 12:20:54 GMT+0000 (UTC)

arXiv
