DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Xuan Shen; Yaohua Wang; Ming Lin; Yilun Huang; Hao Tang; Xiuyu Sun; Yanzhi Wang

Rapid advances in Vision Transformers (ViTs) have refreshed the state of the art in various vision tasks, overshadowing conventional CNN-based models. This has prompted several recent "strike-back" studies in the CNN world showing that pure CNN models can perform as well as ViT models when carefully tuned. While encouraging, designing such high-performance CNN models is challenging and requires non-trivial prior knowledge of network design. To this end, a novel framework termed Mathematical Architecture Design for Deep CNN (DeepMAD) is proposed to design high-performance CNN models in a principled way. In DeepMAD, a CNN is modeled as an information processing system whose expressiveness and effectiveness can be analytically formulated in terms of its structural parameters. A constrained mathematical programming (MP) problem is then proposed to optimize these structural parameters. The MP problem can be easily solved by off-the-shelf MP solvers on CPUs with a small memory footprint. In addition, DeepMAD is a purely mathematical framework: no GPU or training data is required during network design. The superiority of DeepMAD is validated on multiple large-scale computer vision benchmark datasets. Notably, on ImageNet-1k, using only conventional convolutional layers, DeepMAD achieves 0.7% and 1.5% higher top-1 accuracy than ConvNeXt and Swin, respectively, at the Tiny level, and 0.8% and 0.9% higher at the Small level.
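To make the idea of choosing structural parameters by solving a constrained program more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the function name `toy_design`, the score `depth * log(width)`, the parameter cost `depth * width**2`, and the depth-to-width ratio cap are hypothetical stand-ins, not the formulas or solver used by DeepMAD. It shows only the general shape of the approach: maximize an analytic score over width and depth, subject to a budget, on a CPU and without any training data.

```python
import math
from itertools import product


def toy_design(param_budget, max_ratio,
               widths=range(8, 513, 8), depths=range(1, 65)):
    """Brute-force a toy constrained design problem (illustrative only).

    Assumed, hypothetical model of a stack of `depth` square layers:
      score  = depth * log(width)    # stand-in for "expressiveness"
      params = depth * width ** 2    # stand-in for the compute budget
      depth / width <= max_ratio     # stand-in for an "effectiveness" cap
    Returns (score, width, depth), or None if nothing is feasible.
    """
    best = None
    for width, depth in product(widths, depths):
        if depth * width * width > param_budget:
            continue  # violates the parameter budget
        if depth / width > max_ratio:
            continue  # too deep relative to its width
        score = depth * math.log(width)
        if best is None or score > best[0]:
            best = (score, width, depth)
    return best


if __name__ == "__main__":
    result = toy_design(param_budget=1_000_000, max_ratio=0.1)
    print(result)
```

An exhaustive grid search is used here only because the toy search space is tiny; the abstract states that the actual problem is handed to an off-the-shelf MP solver.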

updated: Sun Mar 05 2023 21:31:49 GMT+0000 (UTC)

published: Sun Mar 05 2023 21:31:49 GMT+0000 (UTC)

arXiv
