Panoptic SwiftNet: Pyramidal Fusion for Real-time Panoptic Segmentation

Josip Šarić; Marin Oršić; Siniša Šegvić

Dense panoptic prediction is a key ingredient in many existing applications such as autonomous driving, automated warehouses or agri-robotics. However, most of these applications leverage the recovered dense semantics as an input to visual closed-loop control. Hence, practical deployments require real-time inference over large input resolutions on embedded hardware. These requirements call for computationally efficient approaches which deliver high accuracy with limited computational resources. We propose to achieve this goal by trading off backbone capacity for multi-scale feature extraction. In comparison with contemporaneous approaches to panoptic segmentation, the main novelties of our method are scale-equivariant feature extraction and cross-scale upsampling through pyramidal fusion. Our best model achieves 55.9% PQ on Cityscapes val at 60 FPS, measured on full-resolution 2 MPx images on an RTX 3090 with FP16 TensorRT optimization.
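The accuracy figure in the abstract is reported as PQ (panoptic quality). The abstract does not spell the metric out, so the following is only a minimal sketch assuming the standard per-class definition: the sum of IoUs over matched segment pairs, divided by TP + FP/2 + FN/2, where a predicted and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name and its arguments are hypothetical and are not taken from the paper's code.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class PQ under the standard definition (an assumption here).

    matched_ious: IoU of each matched (prediction, ground truth) segment pair,
                  i.e. each true positive; every value is expected to be > 0.5.
    num_false_positives: predicted segments left without a match.
    num_false_negatives: ground-truth segments left without a match.
    """
    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments of this class at all. Returning 0.0 is a choice made
        # for this sketch; evaluation tools may instead skip such classes.
        return 0.0
    return sum(matched_ious) / denominator


if __name__ == "__main__":
    # Two matched segments, one spurious prediction, one missed segment.
    print(panoptic_quality([0.8, 0.6], 1, 1))
```

A dataset-level score such as the one quoted above would then be an average of per-class values over the classes of the benchmark.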

updated: Tue Mar 15 2022 13:47:40 GMT+0000 (UTC)

published: Tue Mar 15 2022 13:47:40 GMT+0000 (UTC)

arXiv
