Depth Estimation vs Classification as Pre-training for Semantic Segmentation

Dong Lao; Alex Wong; Stefano Soatto

Training a deep neural network for semantic segmentation is labor intensive, so it is common to pre-train on a different task for which data is abundant, typically image-level classification, and then fine-tune with a small annotated dataset. There is empirical evidence showing that incorporating depth information during training may improve semantic segmentation, but the extent of this effect has yet to be fully characterized. In this paper, we study whether monocular depth estimation can serve as pre-training for semantic segmentation, ideally eliminating the need for manually supervised pre-training. Using common benchmarks such as KITTI, Cityscapes, and NYU-V2, we evaluate pre-training using depth estimation vs. classification, measuring their effects on downstream semantic segmentation. The former edges out the latter by 5.8% mIoU and 5.2% pixel accuracy. We analyze the impact of different forms of supervision for depth estimation, training pipelines, and data resolution on semantic fine-tuning. Additionally, we find that other forms of self-supervision, including optical flow, are less effective than depth pre-training, despite sharing the same loss, namely the photometric reprojection error.
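The abstract reports its gains in terms of mIoU and pixel accuracy. For readers unfamiliar with these metrics, below is a minimal sketch of how they are commonly computed from a confusion matrix. It assumes integer label maps with class ids in `[0, num_classes)` and an ignore label of 255 (a common convention, not something the abstract specifies). The function names are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes confusion matrix.

    Rows index the ground-truth class and columns the predicted class.
    Pixels whose ground truth equals ignore_index (assumed 255) are skipped.
    Assumes predicted ids on valid pixels lie in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    # Encode each (gt, pred) pair as a single index, then count occurrences.
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of labeled pixels whose predicted class matches the ground truth."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN).

    Classes that appear in neither the prediction nor the ground truth
    (zero denominator) are excluded from the mean.
    """
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but ground truth differs
    fn = cm.sum(axis=1) - tp  # ground truth is class c, but predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)
    return np.nanmean(iou)


if __name__ == "__main__":
    target = np.array([[0, 0, 1], [1, 2, 2]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    cm = confusion_matrix(pred, target, num_classes=3)
    print(pixel_accuracy(cm), mean_iou(cm))
```

In this toy example, 4 of 6 pixels are correct, so pixel accuracy is about 0.667. The per-class IoUs are 1/3, 2/3, and 1/2, so mIoU is 0.5. Benchmark evaluations typically accumulate the confusion matrix over the whole validation set before computing the metrics, rather than averaging per-image scores.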

updated: Fri Mar 17 2023 02:07:10 GMT+0000 (UTC)

published: Sat Mar 26 2022 04:27:28 GMT+0000 (UTC)

arXiv