Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation

Jie Xiang; Yun Wang; Lifeng An; Haiyang Liu; Jian Liu

Self-supervised single-frame and multi-frame depth estimation methods both require only unlabeled monocular videos for training, but they leverage different information: single-frame methods rely mainly on appearance-based features, while multi-frame methods focus on geometric cues. Given this complementarity, some works attempt to leverage single-frame depth to improve multi-frame depth. However, these methods neither exploit the difference between single-frame and multi-frame depth to improve multi-frame depth nor leverage multi-frame depth to optimize single-frame depth models. To fully utilize the mutual influence between single-frame and multi-frame methods, we propose a novel self-supervised training framework. Specifically, we first introduce a pixel-wise adaptive depth sampling module guided by single-frame depth to train the multi-frame model. Then, we use a minimum-reprojection-based distillation loss to transfer knowledge from the multi-frame depth network to the single-frame network, improving single-frame depth. Finally, we treat the improved single-frame depth as a prior to further boost the performance of multi-frame depth estimation. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms existing approaches in the self-supervised monocular setting.
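The abstract does not give formulas for the first two steps, so the following is only a minimal sketch of how they might look, not the authors' implementation. It assumes that depth hypotheses for each pixel are log-spaced in a relative band around that pixel's single-frame depth, and that distillation is applied only at pixels where the multi-frame depth yields a lower minimum reprojection error than the single-frame depth. The function names, the `rel_range` parameter, and the log-depth L1 distance are all hypothetical choices made for illustration.

```python
import math


def adaptive_depth_candidates(prior_depth, num_candidates=8, rel_range=0.3):
    """Depth hypotheses for one pixel, log-spaced around its single-frame depth.

    Assumption: the search band is [d / (1 + r), d * (1 + r)] for prior depth d
    and relative range r. The paper may define the band differently.
    """
    if prior_depth <= 0:
        raise ValueError("prior_depth must be positive")
    if num_candidates < 2:
        return [prior_depth]
    lo = math.log(prior_depth / (1.0 + rel_range))
    hi = math.log(prior_depth * (1.0 + rel_range))
    step = (hi - lo) / (num_candidates - 1)
    return [math.exp(lo + i * step) for i in range(num_candidates)]


def distillation_loss(single_depth, multi_depth, single_err, multi_err):
    """Mean log-depth L1 distance between student and teacher depths.

    Only pixels where the multi-frame (teacher) depth has a lower minimum
    reprojection error than the single-frame (student) depth contribute.
    Assumption: this masking rule and distance are illustrative only.
    """
    total, count = 0.0, 0
    for d_s, d_m, e_s, e_m in zip(single_depth, multi_depth, single_err, multi_err):
        if e_m < e_s:  # teacher explains this pixel better than the student
            total += abs(math.log(d_s) - math.log(d_m))
            count += 1
    return total / count if count else 0.0
```

Compared with a fixed global depth range shared by every pixel, a per-pixel band of this kind concentrates the hypotheses where the single-frame prior suggests the surface lies. The error inputs to `distillation_loss` are assumed to be precomputed per-pixel minimum reprojection errors over the source frames.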

updated: Mon Aug 28 2023 02:23:05 GMT+0000 (UTC)

published: Tue Apr 25 2023 09:39:30 GMT+0000 (UTC)

arXiv
