Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

Jierun Chen; Shiu-hong Kao; Hao He; Weipeng Zhuo; Song Wen; Chul-Ho Lee; S. -H. Gary Chan

To design fast neural networks, many works have focused on reducing the number of floating-point operations (FLOPs). We observe, however, that such a reduction in FLOPs does not necessarily lead to a similar reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstrate that this low FLOPS is mainly due to the operators' frequent memory access, especially in depthwise convolution. We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently by cutting down redundant computation and memory access simultaneously. Building upon our PConv, we further propose FasterNet, a new family of neural networks that attains substantially higher running speed than others on a wide range of devices, without compromising accuracy on various vision tasks. For example, on ImageNet-1k, our tiny FasterNet-T0 is 3.1×, 3.1×, and 2.5× faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while being 2.9% more accurate. Our large FasterNet-L achieves an impressive 83.5% top-1 accuracy, on par with the emerging Swin-B, while having 49% higher inference throughput on GPU and saving 42% compute time on CPU. Code is available at https://github.com/JierunChen/FasterNet.
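
The distinction between FLOPs (operation count) and FLOPS (achieved operations per second) can be made concrete with a small back-of-the-envelope sketch. It assumes the simple relation latency = FLOPs / FLOPS and counts multiply-accumulates for a stride-1, same-padding layer. The function names, the feature-map sizes, and the throughput figures are hypothetical and chosen only for illustration. The PConv count assumes a regular convolution applied to a fraction 1/r of the channels with the rest left untouched; r = 4 is an assumption of this sketch, not something stated in the abstract.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulates of a regular k x k convolution on an h x w map."""
    return h * w * k * k * c_in * c_out


def depthwise_conv_flops(h, w, k, c):
    """Multiply-accumulates of a depthwise k x k convolution (one filter per channel)."""
    return h * w * k * k * c


def partial_conv_flops(h, w, k, c, r=4):
    """Assumed PConv cost: a regular convolution on only c // r channels."""
    c_p = c // r  # channels that are actually convolved (assumption: r = 4)
    return conv_flops(h, w, k, c_p, c_p)


def latency_seconds(flops, flops_per_second):
    """Latency under the simple model latency = FLOPs / FLOPS."""
    return flops / flops_per_second


# Hypothetical layer: 56 x 56 feature map, 3 x 3 kernel, 64 channels.
regular = conv_flops(56, 56, 3, 64, 64)
depthwise = depthwise_conv_flops(56, 56, 3, 64)
partial = partial_conv_flops(56, 56, 3, 64)

# Hypothetical throughputs: an operator with half the FLOPs can still be
# slower if memory access drags its achieved FLOPS down to a quarter.
fast_hw_use = latency_seconds(2e9, 1e12)
slow_hw_use = latency_seconds(1e9, 2.5e11)

print(regular, depthwise, partial)
print(fast_hw_use, slow_hw_use)
```

Under these assumptions the partial convolution costs 1/r² of the regular convolution, and the second latency is larger than the first even though its FLOPs count is halved, which is the effect the abstract attributes to low FLOPS.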

updated: Tue Mar 07 2023 06:05:30 GMT+0000 (UTC)

published: Tue Mar 07 2023 06:05:30 GMT+0000 (UTC)

arXiv
