RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes

Tak-Wai Hui

Unsupervised methods have shown promising results on monocular depth estimation. However, their training data must be captured in scenes without moving objects. To push the envelope of accuracy, recent methods tend to increase the number of model parameters. In this paper, an unsupervised learning framework is proposed to jointly predict monocular depth and complete 3D motion, including the motion of moving objects and of the camera. (1) Recurrent modulation units are used to adaptively and iteratively fuse encoder and decoder features. This improves single-image depth inference without overspending model parameters. (2) Instead of using a single set of filters for upsampling, multiple sets of filters are devised for residual upsampling. This facilitates the learning of edge-preserving filters and leads to improved performance. (3) A warping-based network is used to estimate a motion field of moving objects without using semantic priors. This removes the requirement of scene rigidity and allows general videos to be used for unsupervised learning. The motion field is further regularized by an outlier-aware training loss. Although the depth model uses only a single image at test time and has just 2.97M parameters, it achieves state-of-the-art results on the KITTI and Cityscapes benchmarks.
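The abstract does not give the warping equations. As a rough illustration of the general idea behind warping-based unsupervised depth learning with moving objects, the sketch below back-projects one pixel using a predicted depth, applies a rigid camera motion plus an extra per-pixel 3D object translation, and reprojects it into the other frame. The zero-skew pinhole intrinsics, the purely additive object-motion term, and the function names (`warp_pixel`, `invert_intrinsics`, `mat_vec`) are assumptions for illustration only, not RM-Depth's actual formulation or code.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vec3 = Sequence[float]


def mat_vec(m: Matrix3, v: Vec3) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def invert_intrinsics(k: Matrix3) -> List[List[float]]:
    """Closed-form inverse of a zero-skew pinhole intrinsic matrix (assumption)."""
    fx, fy = k[0][0], k[1][1]
    cx, cy = k[0][2], k[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def warp_pixel(u: float, v: float, depth: float, k: Matrix3,
               rotation: Matrix3, translation: Vec3,
               object_motion: Vec3 = (0.0, 0.0, 0.0)) -> Tuple[float, float]:
    """Hypothetical helper: map pixel (u, v) with predicted depth into another view.

    Camera motion is (rotation, translation); object_motion is an assumed
    per-pixel 3D translation added for points on moving objects.
    """
    # Back-project the pixel to a 3D point in the first camera's frame.
    ray = mat_vec(invert_intrinsics(k), [u, v, 1.0])
    point = [depth * r for r in ray]
    # Apply rigid camera motion, then the extra object translation.
    rotated = mat_vec(rotation, point)
    moved = [rotated[i] + translation[i] + object_motion[i] for i in range(3)]
    if moved[2] <= 0.0:
        raise ValueError("warped point lies behind the camera")
    # Project back to pixel coordinates with perspective division.
    proj = mat_vec(k, moved)
    return proj[0] / proj[2], proj[1] / proj[2]
```

In a photometric training loss of this general kind, the source image would be sampled at the warped coordinates and compared with the target image. Setting `object_motion` to zero reduces the model to the static-scene (rigid) case.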

updated: Wed Mar 08 2023 09:11:50 GMT+0000 (UTC)

published: Wed Mar 08 2023 09:11:50 GMT+0000 (UTC)

arXiv
