Dyna-DepthFormer: Multi-frame Transformer for Self-Supervised Depth Estimation in Dynamic Scenes

Songchun Zhang; Chunhui Zhao

Self-supervised methods have shown promising results on the depth estimation task. However, previous methods estimate the target depth map and camera ego-motion simultaneously while underusing multi-frame correlation information and ignoring the motion of dynamic objects. In this paper, we propose a novel Dyna-DepthFormer framework, which jointly predicts scene depth and a 3D motion field and aggregates multi-frame information with a transformer. Our contributions are two-fold. First, we leverage multi-view correlation through a series of self- and cross-attention layers to obtain an enhanced depth feature representation. Specifically, we use a perspective transformation to acquire the initial reference points and use deformable attention to reduce the computational cost. Second, we propose a warping-based Motion Network that estimates the motion field of dynamic objects without using semantic priors. To improve the motion field predictions, we propose an iterative optimization strategy together with a sparsity-regularized loss. The entire pipeline is trained end to end in a self-supervised manner by constructing a minimum reprojection loss. Extensive experiments on the KITTI and Cityscapes benchmarks demonstrate the effectiveness of our method and show that it outperforms state-of-the-art algorithms.
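The abstract names a minimum reprojection loss without defining it. As a rough illustration only, the sketch below follows the formulation commonly used in self-supervised depth estimation: for each pixel, take the smallest photometric error across the source frames warped into the target view, then average over pixels. The function name `min_reprojection_loss`, the plain L1 error (no SSIM term), and the array shapes are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def min_reprojection_loss(target, warped_sources):
    """Mean over pixels of the minimum L1 photometric error across sources.

    target:         (H, W, C) array, the target frame.
    warped_sources: (N, H, W, C) array, N source frames that are assumed
                    to have already been warped into the target view.
    """
    target = np.asarray(target, dtype=np.float64)
    warped_sources = np.asarray(warped_sources, dtype=np.float64)
    # Per-pixel L1 error for each source, averaged over channels: (N, H, W).
    errors = np.abs(warped_sources - target[None]).mean(axis=-1)
    # For each pixel keep only the best-matching source frame: (H, W).
    per_pixel_min = errors.min(axis=0)
    return float(per_pixel_min.mean())


# Toy example: each source matches the target on one half of the image.
target = np.zeros((2, 2, 1))
src_a = np.zeros((2, 2, 1))
src_a[:, 0, 0] = 1.0  # wrong on the left column only
src_b = np.zeros((2, 2, 1))
src_b[:, 1, 0] = 1.0  # wrong on the right column only
loss = min_reprojection_loss(target, np.stack([src_a, src_b]))
```

In this toy case every pixel is matched by at least one source, so the minimum removes the mismatched halves that a plain average over sources would penalize. This is the usual motivation for the per-pixel minimum when a region is occluded or out of view in one of the source frames.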

updated: Sat Jan 14 2023 09:43:23 GMT+0000 (UTC)

published: Sat Jan 14 2023 09:43:23 GMT+0000 (UTC)

arXiv
