CATs: Cost Aggregation Transformers for Visual Correspondence

Seokju Cho; Sunghwan Hong; Sangryul Jeon; Yunsung Lee; Kwanghoon Sohn; Seungryong Kim

We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images, a task made additionally challenging by large intra-class appearance and geometric variations. Cost aggregation is a highly important process in matching tasks, as the matching accuracy depends on the quality of its output. Hand-crafted or CNN-based methods for cost aggregation either lack robustness to severe deformations or inherit the limitation of CNNs, which fail to discriminate incorrect matches because of their limited receptive fields. In contrast, CATs explore global consensus among the initial correlation maps with the help of several architectural designs that allow us to fully leverage the self-attention mechanism. Specifically, we include appearance affinity modeling to aid the cost aggregation process by disambiguating the noisy initial correlation maps, and we propose multi-level aggregation to efficiently capture different semantics from hierarchical feature representations. We then combine a swapping self-attention technique with residual connections, not only to enforce consistent matching but also to ease the learning process; we find that these result in a clear performance boost. We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies. Code and trained models are available at https://github.com/SunghwanHong/CATs.
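
For a concrete picture of the data flow described in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation; the official code is in the linked repository. It assumes the following: source and target feature maps have equal resolution and are flattened to N positions; the initial correlation map uses cosine similarity; a single-head self-attention layer (no layer norm or MLP) stands in for the transformer aggregator; appearance affinity is modeled by concatenating a per-position appearance embedding to each row of the cost; swapping is implemented by transposing the cost and reusing the same weights; and multi-level outputs are fused by simple averaging. All function names (correlation_map, aggregate, swapping_aggregation, cats_like_aggregation, make_params) are hypothetical. Because the weights are random and untrained, the output only illustrates shapes and data flow.

```python
# Hypothetical, simplified sketch of CATs-style cost aggregation (not the official code).
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each feature vector to (roughly) unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def correlation_map(feat_s, feat_t):
    # feat_s, feat_t: (N, C) features flattened over H*W. Returns the initial (noisy) cost, shape (N, N).
    return l2_normalize(feat_s) @ l2_normalize(feat_t).T

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over the rows of `tokens`.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v

def aggregate(corr, appearance, params):
    # Assumed appearance affinity modeling: each row of `corr` (one position's matching scores)
    # is concatenated with that position's appearance embedding before global attention.
    tokens = np.concatenate([corr, appearance], axis=1)
    out = self_attention(tokens, *params["qkv"]) @ params["proj"]  # project back to N cost values
    return corr + out  # residual connection

def swapping_aggregation(corr, app_s, app_t, params):
    # Pass 1: source positions are tokens. Pass 2: transpose so target positions become tokens,
    # reusing the same weights (this requires equal source and target resolution).
    corr = aggregate(corr, app_s, params)
    return aggregate(corr.T, app_t, params).T

def make_params(rng, n, d_app, d_model):
    # Random (untrained) weights for one aggregation layer.
    d_in = n + d_app
    qkv = tuple(rng.standard_normal((d_in, d_model)) / np.sqrt(d_in) for _ in range(3))
    proj = rng.standard_normal((d_model, n)) / np.sqrt(d_model)
    return {"qkv": qkv, "proj": proj}

def cats_like_aggregation(feats_s, feats_t, app_s, app_t, params_per_level):
    # Multi-level sketch: one correlation map per feature level, each aggregated separately,
    # then averaged. The averaging is a simplification; the paper's fusion may differ.
    outs = []
    for f_s, f_t, params in zip(feats_s, feats_t, params_per_level):
        corr = correlation_map(f_s, f_t)
        outs.append(swapping_aggregation(corr, app_s, app_t, params))
    return np.mean(outs, axis=0)

# Toy usage: 4x4 feature maps (N = 16) at three hypothetical feature levels.
rng = np.random.default_rng(0)
n, c, d_app, d_model, levels = 16, 32, 8, 24, 3
feats_s = [rng.standard_normal((n, c)) for _ in range(levels)]
feats_t = [rng.standard_normal((n, c)) for _ in range(levels)]
app_s = rng.standard_normal((n, d_app))
app_t = rng.standard_normal((n, d_app))
params = [make_params(rng, n, d_app, d_model) for _ in range(levels)]

refined = cats_like_aggregation(feats_s, feats_t, app_s, app_t, params)
matches = refined.argmax(axis=1)  # one target index per source position
print(refined.shape)  # (16, 16)
```

Taking the argmax over each row of the refined map gives one target index per source position. Differentiable alternatives such as a soft-argmax are common when such a module is trained end to end. The residual connections mean that a layer whose projection outputs zeros leaves the cost unchanged, which is one intuition for why they ease learning.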

updated: Tue Oct 26 2021 12:22:41 GMT+0000 (UTC)

published: Fri Jun 04 2021 14:39:03 GMT+0000 (UTC)

arXiv