Accurate and Efficient Stereo Matching via Attention Concatenation Volume

Gangwei Xu; Yun Wang; Junda Cheng; Jinhui Tang; Xin Yang

Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for accurate and efficient stereo matching. In this paper, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. The ACV can be seamlessly embedded into most stereo matching networks; the resulting networks can use a more lightweight aggregation network while achieving higher accuracy. We further design a fast version of ACV, named Fast-ACV, to enable real-time performance. Fast-ACV generates high-likelihood disparity hypotheses and the corresponding attention weights from low-resolution correlation clues, which significantly reduces computational and memory cost while maintaining satisfactory accuracy. The core idea of Fast-ACV is volume attention propagation (VAP), which automatically selects accurate correlation values from an upsampled correlation volume and propagates these values to surrounding pixels with ambiguous correlation clues. Furthermore, we design a highly accurate network, ACVNet, and a real-time network, Fast-ACVNet, based on ACV and Fast-ACV respectively, which achieve state-of-the-art performance on several benchmarks (i.e., our ACVNet ranks 2nd on KITTI 2015 and Scene Flow, and 3rd on KITTI 2012 and ETH3D among all published methods; our Fast-ACVNet outperforms almost all state-of-the-art real-time methods on Scene Flow, KITTI 2012 and 2015, and also has better generalization ability).
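
The idea of filtering a concatenation volume with correlation-derived attention weights can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names are hypothetical, the attention weights here are simply a softmax over the disparity axis of a plain feature correlation, and any learned layers the paper applies to the correlation clues are omitted. Feature maps are assumed to have shape (C, H, W) and come from rectified left and right images.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """Channel-averaged correlation for each disparity; returns (D, H, W)."""
    _, h, w = left.shape
    vol = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            vol[d] = (left * right).mean(axis=0)
        else:
            # Left pixel x is compared with right pixel x - d.
            vol[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return vol


def concatenation_volume(left, right, max_disp):
    """Stack left and shifted right features; returns (2C, D, H, W)."""
    c, h, w = left.shape
    vol = np.zeros((2 * c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            vol[:c, d] = left
            vol[c:, d] = right
        else:
            vol[:c, d, :, d:] = left[:, :, d:]
            vol[c:, d, :, d:] = right[:, :, :-d]
    return vol


def attention_concat_volume(left, right, max_disp):
    """Weight the concatenation volume by correlation-derived attention.

    Assumption: attention is a softmax of the raw correlation over the
    disparity axis; the paper's learned components are not modeled here.
    """
    corr = correlation_volume(left, right, max_disp)
    shifted = np.exp(corr - corr.max(axis=0, keepdims=True))
    attn = shifted / shifted.sum(axis=0, keepdims=True)  # (D, H, W)
    concat = concatenation_volume(left, right, max_disp)
    # Broadcast the attention over the channel axis.
    return attn[None] * concat, attn
```

In this sketch the attention map has one weight per disparity and pixel, so disparities with weak correlation are scaled toward zero before any aggregation network would see them, which is the suppression effect the abstract describes.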

updated: Mon Nov 20 2023 06:26:47 GMT+0000 (UTC)

published: Fri Sep 23 2022 08:14:30 GMT+0000 (UTC)

arXiv
