Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Yang Liu; Muzhi Zhu; Hengtao Li; Hao Chen; Xinlong Wang; Chunhua Shen

Powered by large-scale pre-training, vision foundation models exhibit significant potential in open-world image understanding. Even though individual models have limited capabilities, properly combining multiple such models can lead to positive synergies and unleash their full potential. In this work, we present Matcher, which segments anything with one shot by integrating an all-purpose feature extraction model and a class-agnostic segmentation model. Naively connecting the models results in unsatisfactory performance, e.g., the models tend to generate matching outliers and false-positive mask fragments. To address these issues, we design a bidirectional matching strategy for accurate cross-image semantic dense matching and a robust prompt sampler for mask proposal generation. In addition, we propose a novel instance-level matching strategy for controllable mask merging. The proposed Matcher method delivers impressive generalization performance across various segmentation tasks, all without training. For example, it achieves 52.7% mIoU on COCO-20^i for one-shot semantic segmentation, surpassing the state-of-the-art specialist model by 1.6%. In addition, our visualization results show open-world generality and flexibility on images in the wild. The code will be released at https://github.com/aim-uofa/Matcher.
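The abstract does not spell out how the bidirectional matching works, so the following is only a minimal sketch of one plausible reading: a forward nearest-neighbour match from reference patches inside the mask to target patches, followed by a backward match that rejects any correspondence landing outside the reference mask. The function names, the use of cosine similarity with plain nearest-neighbour search, and the toy 2-D features are all assumptions made for illustration; they are not taken from the released code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, candidates):
    """Index of the candidate feature most similar to `query`."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))


def bidirectional_match(ref_feats, ref_mask, tgt_feats):
    """Return target patch indices that survive a forward-backward check.

    Hypothetical helper: ref_feats and tgt_feats are lists of patch feature
    vectors, ref_mask marks which reference patches belong to the object.
    """
    kept = []
    for i, inside in enumerate(ref_mask):
        if not inside:
            continue
        j = nearest(ref_feats[i], tgt_feats)     # forward: reference -> target
        back = nearest(tgt_feats[j], ref_feats)  # backward: target -> reference
        # Keep the match only if the backward match lands inside the mask.
        if ref_mask[back] and j not in kept:
            kept.append(j)
    return kept


# Toy data (assumed): three reference patches, the first two inside the mask.
ref_feats = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
ref_mask = [True, True, False]
tgt_feats = [[0.9, 0.1], [0.1, 0.9]]

matches = bidirectional_match(ref_feats, ref_mask, tgt_feats)
print(matches)
```

In this toy case the second reference patch is matched forward to the second target patch, but that target patch maps back to a reference patch outside the mask, so the correspondence is discarded as an outlier. The surviving target patches would then serve as candidate point prompts for the class-agnostic segmentation model.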

updated: Mon May 22 2023 17:59:43 GMT+0000 (UTC)

published: Mon May 22 2023 17:59:43 GMT+0000 (UTC)

arXiv
