SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model

Shili Zhou; Ruian He; Weimin Tan; Bo Yan

Optical flow estimation aims to find the dense 2D motion field between two frames. Due to limitations of model structures and training datasets, existing methods often rely too heavily on local cues and ignore the integrity of objects, resulting in fragmented motion estimation. Through theoretical analysis, we find that pre-trained large vision models are helpful for optical flow estimation, and we observe that the recently prominent Segment Anything Model (SAM) demonstrates a strong ability to segment complete objects, which makes it well suited to solving the fragmentation problem. We therefore propose to embed the frozen SAM image encoder into FlowFormer to enhance object perception. To address the challenge of utilizing SAM in depth for non-segmentation tasks such as optical flow estimation, we propose an Optical Flow Task-Specific Adaption scheme, comprising a Context Fusion Module that fuses the SAM encoder with the optical flow context encoder, and a Context Adaption Module that adapts the SAM features to the optical flow task with a Learned Task-Specific Embedding. Our proposed SAMFlow model reaches 0.86/2.10 clean/final EPE on the Sintel training set and 3.55/12.32 EPE/F1-all on the KITTI-15 training set, surpassing FlowFormer by 8.5%/9.9% and 13.2%/16.3%, respectively. Furthermore, our model achieves state-of-the-art performance on the Sintel and KITTI-15 benchmarks, ranking #1 among all two-frame methods on the Sintel clean pass.
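The results above are reported as EPE (average endpoint error, in pixels) and F1-all (the KITTI outlier percentage). A minimal sketch of both metrics follows, assuming each flow field is stored as a flat list of (u, v) tuples; this layout and the function names are hypothetical choices for illustration (real pipelines use H×W×2 arrays), and the code is not the authors' evaluation script. The outlier rule (endpoint error above 3 px and above 5% of the ground-truth magnitude) follows the standard KITTI 2015 flow definition.

```python
import math


def endpoint_errors(flow_pred, flow_gt):
    """Per-pixel endpoint error: Euclidean distance between predicted and
    ground-truth flow vectors, each given as a list of (u, v) tuples."""
    return [math.hypot(pu - gu, pv - gv)
            for (pu, pv), (gu, gv) in zip(flow_pred, flow_gt)]


def epe(flow_pred, flow_gt, valid=None):
    """Average endpoint error over valid pixels (all pixels if no mask)."""
    errs = endpoint_errors(flow_pred, flow_gt)
    if valid is None:
        valid = [True] * len(errs)
    kept = [e for e, ok in zip(errs, valid) if ok]
    return sum(kept) / len(kept)


def f1_all(flow_pred, flow_gt, valid=None):
    """KITTI-style outlier rate in percent: a valid pixel is an outlier when
    its endpoint error exceeds both 3 px and 5% of the ground-truth magnitude."""
    errs = endpoint_errors(flow_pred, flow_gt)
    if valid is None:
        valid = [True] * len(errs)
    outliers = 0
    count = 0
    for err, (gu, gv), ok in zip(errs, flow_gt, valid):
        if not ok:
            continue
        count += 1
        magnitude = math.hypot(gu, gv)
        if err > 3.0 and err > 0.05 * magnitude:
            outliers += 1
    return 100.0 * outliers / count


# Toy three-pixel example: endpoint errors are 0, 5 and 10 px.
pred = [(1.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
gt = [(1.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
print(epe(pred, gt))     # 5.0
print(f1_all(pred, gt))  # two of three pixels are outliers
```

Note that F1-all counts a pixel only when it fails both thresholds, so a large absolute error on a fast-moving pixel may still be tolerated if it stays within 5% of the true motion.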

updated: Thu Dec 21 2023 07:03:08 GMT+0000 (UTC)

published: Mon Jul 31 2023 11:40:53 GMT+0000 (UTC)

arXiv