RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Wei Niu; Mengshu Sun; Zhengang Li; Jou-An Chen; Jiexiong Guan; Xipeng Shen; Yanzhi Wang; Sijia Liu; Xue Lin; Bin Ren


Mobile devices are becoming an important platform for deep learning tasks, as they are equipped with powerful, high-end mobile CPUs and GPUs. However, executing 3D Convolutional Neural Networks (CNNs) in real time while maintaining high inference accuracy remains challenging, because their more complex model structure and higher model dimensionality overwhelm the computation and storage resources available on mobile devices. A natural approach is to turn to deep learning weight pruning techniques. However, directly generalizing existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs that seamlessly integrates neural network weight pruning and compiler code generation techniques. We propose and investigate two mobile-acceleration-friendly structured sparsity schemes: vanilla structured sparsity and kernel group structured (KGS) sparsity. Vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that offers higher flexibility while still exploiting full on-device parallelism. We propose a reweighted regularization pruning algorithm to achieve the proposed sparsity schemes. The inference time speedup due to sparsity approaches the pruning rate of the whole model's FLOPs (floating point operations). RT3D demonstrates up to 29.1× speedup in end-to-end inference time compared with current mobile frameworks supporting 3D CNNs, with a moderate 1%-1.5% accuracy loss. The end-to-end inference time for 16 video frames can be within 150 ms when executing representative C3D and R(2+1)D models on a cellphone. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobile devices.
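To make the difference between the two sparsity schemes concrete, the following is a minimal NumPy sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: weights are laid out as (filters, channels, depth, height, width); a "kernel group" is a block of `g_m` filters by `g_n` input channels; KGS sparsity is read as pruning the same positions in every kernel of a group. Plain magnitude thresholding stands in for the reweighted regularization algorithm, and the helper names `group_view`, `vanilla_mask`, and `kgs_mask` are hypothetical.

```python
import numpy as np

def group_view(w, g_m, g_n):
    """View a 5D tensor (M, N, Kd, Kh, Kw) as (M/g_m, g_m, N/g_n, g_n, Kd*Kh*Kw).

    Assumption: a kernel group is a block of g_m filters by g_n channels.
    """
    m, n = w.shape[:2]
    assert m % g_m == 0 and n % g_n == 0, "group sizes must divide M and N"
    return w.reshape(m // g_m, g_m, n // g_n, g_n, -1)

def _top_k_keep(norms, keep_ratio):
    """Boolean array marking the keep_ratio fraction of entries with largest norm."""
    k = max(1, int(round(keep_ratio * norms.size)))
    threshold = np.sort(norms.ravel())[-k]
    return norms >= threshold

def vanilla_mask(w, g_m, g_n, keep_ratio):
    """Vanilla structured sparsity: keep or remove each kernel group as a whole."""
    g = group_view(w, g_m, g_n)
    # One L2 norm per kernel group.
    norms = np.sqrt((g ** 2).sum(axis=(1, 3, 4)))
    keep = _top_k_keep(norms, keep_ratio)
    mask = np.broadcast_to(keep[:, None, :, None, None], g.shape)
    return mask.reshape(w.shape)

def kgs_mask(w, g_m, g_n, keep_ratio):
    """KGS-style sparsity (assumed reading): within each kernel group, the same
    kernel positions are pruned in every kernel of that group."""
    g = group_view(w, g_m, g_n)
    # One L2 norm per (kernel group, kernel position) pair.
    norms = np.sqrt((g ** 2).sum(axis=(1, 3)))
    keep = _top_k_keep(norms, keep_ratio)
    mask = np.broadcast_to(keep[:, None, :, None, :], g.shape)
    return mask.reshape(w.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 8, 3, 3, 3))  # toy 3D convolution weights
    print("vanilla density:", vanilla_mask(w, 4, 4, 0.5).mean())
    print("KGS density:", kgs_mask(w, 4, 4, 0.5).mean())
```

Under these assumptions both masks reach the same overall density, but the KGS mask chooses among many more candidate units (one per kernel position per group rather than one per group), which is one way to picture the "higher flexibility" the abstract describes. Multiplying `w` by either mask elementwise yields the pruned weights.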

updated: Sun Jan 03 2021 18:03:16 GMT+0000 (UTC)

published: Mon Jul 20 2020 02:05:32 GMT+0000 (UTC)

arXiv
