Learning to Estimate Hidden Motions with Global Motion Aggregation

Shihao Jiang; Dylan Campbell; Yao Lu; Hongdong Li; Richard Hartley

Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence. We consider an occluded point to be one that is imaged in the first frame but not in the next, a slight overloading of the standard definition since it also includes points that move out-of-frame. Estimating the motion of these points is extremely difficult, particularly in the two-frame setting. Previous work either relies on CNNs to learn occlusions, without much success, or requires multiple frames to reason about occlusions using temporal smoothness. In this paper, we argue that the occlusion problem can be better solved in the two-frame case by modelling image self-similarities. We introduce a global motion aggregation module, a transformer-based approach that finds long-range dependencies between pixels in the first image and performs global aggregation on the corresponding motion features. We demonstrate that the optical flow estimates in occluded regions can be significantly improved without damaging the performance in non-occluded regions. This approach obtains new state-of-the-art results on the challenging Sintel dataset, improving the average end-point error by 13.6% on Sintel Final and 13.7% on Sintel Clean. At the time of submission, our method ranks first on these benchmarks among all published and unpublished approaches. Code is available at https://github.com/zacjiang/GMA.
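The aggregation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that); it only shows attention-style aggregation in which similarity is computed between appearance features of the first image and then used to average motion features. The function name `aggregate_motion`, the scaled dot-product similarity, the absence of learned projections, and the three-pixel data are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_motion(context, motion):
    """Toy global aggregation (hypothetical helper, not the paper's code).

    context: one appearance feature vector per pixel of the first image.
    motion:  one motion feature vector per pixel.
    Each pixel's output is the attention-weighted mean of all motion
    features, with weights taken from appearance self-similarity.
    """
    n = len(context)
    dim = len(context[0])
    aggregated = []
    for i in range(n):
        # Assumption: scaled dot-product similarity, no learned projections.
        scores = [
            sum(a * b for a, b in zip(context[i], context[j])) / math.sqrt(dim)
            for j in range(n)
        ]
        weights = softmax(scores)
        aggregated.append([
            sum(weights[j] * motion[j][c] for j in range(n))
            for c in range(len(motion[0]))
        ])
    return aggregated


# Made-up example: pixels 0 and 1 look alike (same object), pixel 2 does not.
# Pixel 1 stands in for an occluded point whose local motion feature is empty.
context = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
motion = [[2.0, 0.0], [0.0, 0.0], [0.0, -3.0]]
aggregated = aggregate_motion(context, motion)
```

In this toy case the occluded pixel 1 borrows motion from the similar-looking pixel 0 and is almost unaffected by the dissimilar pixel 2. The same averaging also dilutes pixel 0's own feature, which suggests why a practical model would likely keep the local motion features alongside the aggregated ones rather than replace them.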

updated: Thu Jul 29 2021 20:59:31 GMT+0000 (UTC)

published: Tue Apr 06 2021 10:32:03 GMT+0000 (UTC)

arXiv
