EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation

Han Cai; Junyan Li; Muyan Hu; Chuang Gan; Song Han

Semantic segmentation enables many appealing real-world applications, such as computational photography and autonomous driving. However, its vast computational cost makes it difficult to deploy state-of-the-art semantic segmentation models on edge devices with limited hardware resources. This work presents EfficientViT, a new family of semantic segmentation models built on a novel lightweight multi-scale attention module for on-device semantic segmentation. Unlike prior semantic segmentation models that rely on heavy self-attention, hardware-inefficient large-kernel convolution, or complicated topology structures to obtain good performance, our lightweight multi-scale attention achieves a global receptive field and multi-scale learning (two critical features for semantic segmentation models) using only lightweight and hardware-efficient operations. As a result, EfficientViT delivers remarkable performance gains over previous state-of-the-art semantic segmentation models across popular benchmark datasets, with significant speedup on the mobile platform. Without performance loss on Cityscapes, EfficientViT provides up to 15x and 9.3x mobile latency reduction over SegFormer and SegNeXt, respectively. At the same mobile latency, EfficientViT provides a +7.4 mIoU gain on ADE20K over SegNeXt. Code: https://github.com/mit-han-lab/efficientvit.
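The gains above are reported in mIoU (mean intersection-over-union), the standard accuracy metric for semantic segmentation. As a rough illustration of what that number measures, the sketch below computes mIoU for one flattened prediction and label map. It is not taken from the EfficientViT code base: the function name `mean_iou` and its arguments are hypothetical, and it assumes classes absent from both prediction and ground truth are skipped. Benchmark evaluation scripts typically accumulate intersections and unions over the whole dataset and handle an "ignore" label, which this sketch does not.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in pred or target (illustrative only)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both maps agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2 and class 1 has IoU 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

Since mIoU is conventionally reported as a percentage, a "+7.4 mIoU gain" is a difference of 7.4 percentage points in this averaged score.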

updated: Thu Apr 06 2023 01:19:23 GMT+0000 (UTC)

published: Sun May 29 2022 20:07:23 GMT+0000 (UTC)

arXiv
