Depthformer: Multiscale Vision Transformer for Monocular Depth Estimation with Local Global Information Fusion

Ashutosh Agarwal; Chetan Arora

Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their ability to capture long-range dependencies in an image. However, the benefit of transformers for monocular depth prediction has seldom been explored. This paper benchmarks various transformer-based models for depth estimation on the indoor NYUV2 dataset and the outdoor KITTI dataset. We propose Depthformer, a novel attention-based architecture for monocular depth estimation that uses multi-head self-attention to produce multiscale feature maps, which are effectively combined by our proposed decoder network. We also propose a Transbins module that divides the depth range into bins whose center values are estimated adaptively per image. The final estimated depth at each pixel is a linear combination of the bin centers. The Transbins module takes advantage of the global receptive field by using a transformer module in the encoding stage. Experimental results on the NYUV2 and KITTI depth estimation benchmarks demonstrate that our proposed method improves the state of the art by 3.3% on each benchmark in terms of Root Mean Squared Error (RMSE).
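The bin-based depth computation described in the abstract can be illustrated with a minimal NumPy sketch. The abstract only states that bin centers are estimated per image and that each pixel's depth is a linear combination of them; everything else here is an assumption made for illustration: the function names `bin_centers_from_widths` and `depth_from_bins` are hypothetical, the centers are derived from normalized bin widths, and the per-pixel weights come from a softmax over the bins. This is not the authors' implementation.

```python
import numpy as np

def bin_centers_from_widths(widths, d_min, d_max):
    """Turn per-image bin widths into bin centers inside [d_min, d_max].

    Assumption: the network predicts positive widths, which are
    normalized here so that the bins exactly tile the depth range.
    """
    widths = np.asarray(widths, dtype=np.float64)
    widths = widths / widths.sum()
    # Bin edges are the cumulative widths scaled to the depth range.
    edges = d_min + (d_max - d_min) * np.concatenate(([0.0], np.cumsum(widths)))
    # Each center is the midpoint of its bin.
    return 0.5 * (edges[:-1] + edges[1:])

def depth_from_bins(logits, centers):
    """Per-pixel depth as a linear combination of bin centers.

    logits:  array of shape (K, H, W), one score per bin and pixel.
    centers: array of shape (K,), the bin centers for this image.
    Assumption: the combination weights are a softmax over the K bins.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs = probs / probs.sum(axis=0, keepdims=True)
    # Weighted sum over the bin axis gives an (H, W) depth map.
    return np.tensordot(centers, probs, axes=(0, 0))

# Hypothetical example: three bins over a 0 to 8 m depth range.
centers = bin_centers_from_widths([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
depth = depth_from_bins(np.zeros((3, 2, 2)), centers)
```

Because the weights are non-negative and sum to one at every pixel, the predicted depth always lies between the smallest and largest bin center, and therefore inside the chosen depth range.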

updated: Sun Jul 10 2022 20:49:11 GMT+0000 (UTC)

published: Sun Jul 10 2022 20:49:11 GMT+0000 (UTC)

arXiv
