MonoDVPS: A Self-Supervised Monocular Depth Estimation Approach to Depth-aware Video Panoptic Segmentation

Andra Petrovai; Sergiu Nedevschi

Depth-aware video panoptic segmentation tackles the inverse projection problem of restoring panoptic 3D point clouds from video sequences, where the 3D points are augmented with semantic classes and temporally consistent instance identifiers. We propose a novel solution with a multi-task network that performs monocular depth estimation and video panoptic segmentation. Because ground-truth labels for both depth and image segmentation are costly to acquire, we leverage unlabeled video sequences through self-supervised monocular depth estimation and semi-supervised learning from pseudo-labels for video panoptic segmentation. To further improve depth prediction, we introduce panoptic-guided depth losses and a novel panoptic masking scheme that keeps moving objects from corrupting the training signal. Extensive experiments on the Cityscapes-DVPS and SemKITTI-DVPS datasets demonstrate that our model with the proposed improvements achieves competitive results and fast inference speed.
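To make the inverse projection step concrete, the sketch below lifts a per-pixel depth map and its panoptic labels into a labeled 3D point cloud using a standard pinhole camera model. It is an illustration only, not the authors' implementation: the function name `backproject_panoptic`, the tuple layout of a point, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced here, and the depth and label maps are taken as given rather than predicted by a network.

```python
from typing import List, Tuple

# Hypothetical point record: (x, y, z, semantic_class, instance_id).
PanopticPoint = Tuple[float, float, float, int, int]


def backproject_panoptic(
    depth: List[List[float]],
    semantic: List[List[int]],
    instance: List[List[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[PanopticPoint]:
    """Lift each pixel with valid depth to a labeled 3D point (pinhole model)."""
    points: List[PanopticPoint] = []
    for v, row in enumerate(depth):          # v: pixel row (image y)
        for u, z in enumerate(row):          # u: pixel column (image x)
            if z <= 0.0:                     # skip pixels without valid depth
                continue
            x = (u - cx) * z / fx            # inverse of u = fx * x / z + cx
            y = (v - cy) * z / fy            # inverse of v = fy * y / z + cy
            points.append((x, y, z, semantic[v][u], instance[v][u]))
    return points


# Toy 2x2 frame with one invalid depth value.
cloud = backproject_panoptic(
    depth=[[2.0, 0.0], [1.0, 4.0]],
    semantic=[[1, 1], [2, 3]],
    instance=[[0, 0], [5, 7]],
    fx=1.0, fy=1.0, cx=0.0, cy=0.0,
)
```

Temporal consistency in this picture would mean that the same physical object keeps the same `instance_id` across frames, so clouds from consecutive frames can be associated per object.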

updated: Fri Oct 14 2022 07:00:42 GMT+0000 (UTC)

published: Fri Oct 14 2022 07:00:42 GMT+0000 (UTC)

arXiv
