Multi-View Stereo with Transformer

Jie Zhu; Bo Peng; Wanqing Li; Haifeng Shen; Zhe Zhang; Jianjun Lei

This paper proposes a network, referred to as MVSTR, for Multi-View Stereo (MVS). It is built upon the Transformer architecture and is capable of extracting dense features with global context and 3D consistency, both of which are crucial to achieving reliable matching for MVS. Specifically, to tackle the limited receptive field of existing CNN-based MVS methods, a global-context Transformer module is first proposed to explore intra-view global context. In addition, to make the dense features 3D-consistent, a 3D-geometry Transformer module is built with a well-designed cross-view attention mechanism to facilitate inter-view information interaction. Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and strong generalization on the Tanks & Temples benchmark dataset.
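
The abstract names two attention patterns: intra-view attention (each pixel feature attends to all features of the same view) and cross-view attention (features of one view attend to features of another view). The abstract does not specify the modules' internals, so the following is only a minimal sketch of generic single-head scaled dot-product attention in NumPy, not the MVSTR implementation. The function names (`attention`, `intra_view`, `cross_view`), the residual connection, the absence of learned projections and positional encodings, and the feature sizes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query_feats, source_feats):
    """Generic single-head scaled dot-product attention with a residual.

    query_feats:  (Nq, C) flattened per-pixel features that are updated.
    source_feats: (Ns, C) flattened per-pixel features that are attended to.
    Assumption: no learned query/key/value projections, for brevity.
    """
    channels = query_feats.shape[-1]
    scores = query_feats @ source_feats.T / np.sqrt(channels)  # (Nq, Ns)
    weights = softmax(scores, axis=-1)                         # rows sum to 1
    return query_feats + weights @ source_feats                # (Nq, C)


def intra_view(feats):
    """Hypothetical intra-view step: a view attends to itself."""
    return attention(feats, feats)


def cross_view(ref_feats, src_feats):
    """Hypothetical cross-view step: reference attends to a source view."""
    return attention(ref_feats, src_feats)


# Toy feature maps: H x W pixels with C channels, flattened to (H*W, C).
rng = np.random.default_rng(0)
H, W, C = 4, 5, 8
ref = rng.standard_normal((H * W, C))
src = rng.standard_normal((H * W, C))

ref_ctx = intra_view(ref)             # global context within the reference view
src_ctx = intra_view(src)             # global context within the source view
ref_3d = cross_view(ref_ctx, src_ctx) # inject source-view information
print(ref_3d.shape)
```

Because every pixel attends to every other pixel, the receptive field of each output feature covers the whole view after a single layer, which is the property the abstract contrasts with CNN-based feature extractors.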

updated: Wed Dec 01 2021 08:06:59 GMT+0000 (UTC)

published: Wed Dec 01 2021 08:06:59 GMT+0000 (UTC)

arXiv
