Cost Aggregation Is All You Need for Few-Shot Segmentation

Sunghwan Hong; Seokju Cho; Jisu Nam; Seungryong Kim

We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), that tackles the few-shot segmentation task by using both convolutions and transformers to efficiently handle high-dimensional correlation maps between query and support. Specifically, we propose an encoder consisting of a volume embedding module, which not only transforms the correlation maps into a more tractable size but also injects some convolutional inductive bias, and a volumetric transformer module for cost aggregation. Our encoder has a pyramidal structure that lets coarser-level aggregation guide the finer levels and enforces the learning of complementary matching scores. We then feed the encoder output, along with the projected feature maps, into our affinity-aware decoder to guide the segmentation process. Combining these components, we conduct experiments that demonstrate the effectiveness of the proposed method, which sets a new state of the art on all the standard benchmarks for few-shot segmentation. Furthermore, although it is not specifically designed for that task, the proposed method also attains state-of-the-art performance on the standard benchmarks for semantic correspondence. We also provide an extensive ablation study to validate our architectural choices. The trained weights and code are available at: https://seokju-cho.github.io/VAT/.
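To make "high-dimensional correlation maps between query and support" and "coarser level aggregation guides the finer level" concrete, here is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation (the official code is at the project page above). Assumptions: the 4D cost volume is built from ReLU-clipped cosine similarities between query features and mask-filtered support features; average pooling over all four spatial axes stands in for the learned, convolution-based volume embedding (it only illustrates the size reduction); and upsample-then-add is used as one simple form of coarse-to-fine guidance. All function names (`correlation_volume`, `pool_4d`, `upsample_4d`) are hypothetical.

```python
import numpy as np


def correlation_volume(feat_q, feat_s, mask_s, eps=1e-8):
    """Hypothetical helper: 4D cost volume between query and masked support features.

    feat_q: (C, Hq, Wq) query feature map
    feat_s: (C, Hs, Ws) support feature map
    mask_s: (Hs, Ws) binary support mask (1 = foreground)
    Returns an (Hq, Wq, Hs, Ws) array of ReLU-clipped cosine similarities.
    """
    feat_s = feat_s * mask_s[None]  # zero out background support features
    c = feat_q.shape[0]
    q = feat_q.reshape(c, -1)  # (C, Hq*Wq)
    s = feat_s.reshape(c, -1)  # (C, Hs*Ws)
    # L2-normalise each spatial feature vector; zeroed vectors stay zero.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    corr = np.maximum(q.T @ s, 0.0)  # cosine similarity, negatives clipped
    return corr.reshape(feat_q.shape[1:] + feat_s.shape[1:])


def pool_4d(corr, k=2):
    """Average-pool all four spatial axes by factor k (stand-in for a learned embedding).

    Every spatial size must be divisible by k.
    """
    hq, wq, hs, ws = corr.shape
    v = corr.reshape(hq // k, k, wq // k, k, hs // k, k, ws // k, k)
    return v.mean(axis=(1, 3, 5, 7))


def upsample_4d(corr, k=2):
    """Nearest-neighbour upsampling of a 4D volume by factor k on every axis."""
    for axis in range(4):
        corr = np.repeat(corr, k, axis=axis)
    return corr


# Usage with random features (illustrative only).
rng = np.random.default_rng(0)
feat_q = rng.standard_normal((8, 8, 8))  # C=8, Hq=Wq=8
feat_s = rng.standard_normal((8, 8, 8))  # C=8, Hs=Ws=8
mask_s = np.zeros((8, 8))
mask_s[2:6, 2:6] = 1.0  # square foreground region in the support image

corr = correlation_volume(feat_q, feat_s, mask_s)  # fine level: (8, 8, 8, 8)
coarse = pool_4d(corr)                             # coarse level: (4, 4, 4, 4)
guided = corr + upsample_4d(coarse)                # coarse-to-fine guidance
```

The point of the sketch is the shape bookkeeping: a 4D volume grows as (H·W)², which is why reducing its size before aggregation, and letting a cheaper coarse level inform the fine one, matters. In VAT itself these steps are learned (convolutions and volumetric transformers), not fixed pooling and addition.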

updated: Wed Dec 22 2021 06:18:51 GMT+0000 (UTC)

published: Wed Dec 22 2021 06:18:51 GMT+0000 (UTC)

arXiv