DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions

Yuke Wang; Boyuan Feng; Yufei Ding


As a key advancement in convolutional neural networks (CNNs), depthwise separable convolutions (DSCs) are becoming one of the most popular techniques for reducing the computation and parameter size of CNNs while maintaining model accuracy. They also broaden the applicability of compute- and memory-intensive CNNs to a wide range of applications, such as those on mobile devices, which are generally short of computation power and memory. However, previous research on DSCs has largely focused on compositing the limited set of existing DSC designs, thus missing opportunities to explore other potential designs that could achieve better accuracy and higher computation/parameter reduction. Moreover, off-the-shelf convolution implementations offer limited computing schemes and therefore lack support for DSCs with different convolution patterns. To this end, we introduce DSXplore, the first optimized design for exploring DSCs on CNNs. Specifically, at the algorithm level, DSXplore incorporates a novel factorized kernel, the sliding-channel convolution (SCC), which features input-channel overlapping to balance accuracy against the reduction of computation and memory cost. SCC also offers a large space for design exploration by introducing adjustable kernel parameters. Further, at the implementation level, we build an optimized GPU implementation tailored for SCC by leveraging several key techniques, such as the input-centric backward design and the channel-cyclic optimization. Extensive experiments on different datasets across mainstream CNNs show the advantages of DSXplore over the standard convolution and existing DSCs in balancing accuracy and computation/parameter reduction.
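To make the parameter-reduction claim for DSCs concrete, the following is a minimal sketch that compares weight counts of a standard convolution with those of a conventional depthwise separable convolution (a depthwise k x k layer followed by a 1 x 1 pointwise layer). It illustrates the general DSC idea only, not the SCC kernel of the paper; the function names and the example layer sizes are assumptions chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, k: int) -> float:
    """Fraction of standard-conv weights that the separable form keeps."""
    return depthwise_separable_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
    c_in, c_out, k = 64, 128, 3
    print(standard_conv_params(c_in, c_out, k))        # 73728
    print(depthwise_separable_params(c_in, c_out, k))  # 8768
    print(round(reduction_ratio(c_in, c_out, k), 4))   # 0.1189
```

The ratio simplifies to 1/c_out + 1/k^2, so for 3 x 3 kernels and wide layers the separable form keeps roughly one ninth of the weights.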

updated: Mon Jan 04 2021 02:59:10 GMT+0000 (UTC)

published: Mon Jan 04 2021 02:59:10 GMT+0000 (UTC)

arXiv
