AbHE: All Attention-based Homography Estimation

Mingxiao Huo; Zhihao Zhang; Xianqiang Yang

Homography estimation is a basic computer vision task that aims to recover the transformation between multi-view images for image alignment. Unsupervised homography estimation trains a convolutional neural network for feature extraction and transformation matrix regression. While state-of-the-art homography methods are based on convolutional neural networks, few works focus on transformers, which have shown superiority in high-level vision tasks. In this paper, we propose a strong baseline model based on the Swin Transformer, which combines a convolutional neural network for local features with a transformer module for global features. Moreover, a cross non-local layer is introduced to coarsely search for matched features within the feature maps. In the homography regression stage, we adopt an attention layer over the channels of the correlation volume, which can drop out some weakly correlated feature points. Experiments show that in 8-degree-of-freedom (DOF) homography estimation our method outperforms state-of-the-art methods.
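For readers unfamiliar with the "8 DOF" wording, the following is a minimal illustrative sketch, not the authors' code. It relies only on the standard fact that a homography is a 3x3 matrix defined up to scale, so fixing one entry leaves eight free parameters. The function names `apply_homography` and `normalize`, and the example matrix values, are assumptions made for illustration.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (xh / w, yh / w)


def normalize(H):
    """Fix the overall scale by dividing by H[2][2], leaving 8 free entries."""
    s = H[2][2]
    if s == 0:
        raise ValueError("cannot normalize: H[2][2] is zero")
    return [[v / s for v in row] for row in H]


# Hypothetical example values, chosen only for illustration.
H = [[1.0, 0.1, 5.0],
     [0.0, 1.2, -3.0],
     [0.001, 0.0, 1.0]]

# Scaling every entry by the same factor gives the same point mapping,
# which is why only 8 of the 9 entries are independent.
H_scaled = [[2.0 * v for v in row] for row in H]

p = (10.0, 20.0)
mapped = apply_homography(H, p)
mapped_scaled = apply_homography(H_scaled, p)
```

Here `mapped` and `mapped_scaled` agree up to floating-point error, and `normalize(H_scaled)` recovers `H`. A regression network for this task therefore only needs to predict eight numbers per image pair.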

updated: Tue Dec 06 2022 15:00:00 GMT+0000 (UTC)

published: Tue Dec 06 2022 15:00:00 GMT+0000 (UTC)

arXiv
