AISFormer: Amodal Instance Segmentation with Transformer

Minh Tran; Khoa Vo; Kashu Yamazaki; Arthur Fernandes; Michael Kidd; Ngan Le

Amodal Instance Segmentation (AIS) aims to segment both the visible and the possibly occluded regions of an object instance. While Mask R-CNN-based AIS approaches have shown promising results, their limited receptive field prevents them from modeling high-level feature coherence. Recent transformer-based models show impressive performance on vision tasks, even outperforming Convolutional Neural Networks (CNNs). In this work, we present AISFormer, an AIS framework with a Transformer-based mask head. AISFormer explicitly models the complex coherence between the occluder, visible, amodal, and invisible masks within an object's region of interest (ROI) by treating them as learnable queries. Specifically, AISFormer contains four modules: (i) feature encoding: extract the ROI and learn both short-range and long-range visual features; (ii) mask transformer decoding: generate the occluder, visible, and amodal mask query embeddings with a transformer decoder; (iii) invisible mask embedding: model the coherence between the amodal and visible masks; and (iv) mask predicting: estimate the output masks, including occluder, visible, amodal, and invisible. We conduct extensive experiments and ablation studies on three challenging benchmarks, namely KINS, D2SA, and COCOA-cls, to evaluate the effectiveness of AISFormer. The code is available at: https://github.com/UARK-AICV/AISFormer
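To make the four-module description more concrete, the following is a minimal NumPy sketch of query-based mask decoding over a single ROI. It is not the authors' implementation (see the repository linked above for that), and every specific in it is an assumption made for illustration: the function name `decode_masks`, the single-head cross-attention with a residual update, the linear map `w_inv` that derives the invisible embedding from the amodal and visible embeddings, the dot-product mask prediction, and all tensor sizes.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_masks(roi_feat, queries, w_inv):
    """Hypothetical query-based mask head for one ROI.

    roi_feat: (C, H, W) encoded ROI features (stand-in for module i).
    queries:  (3, C) learnable queries for occluder, visible, amodal.
    w_inv:    (2C, C) assumed linear map producing the invisible embedding.
    Returns:  (4, H, W) mask probabilities in the order
              occluder, visible, amodal, invisible.
    """
    c, h, w = roi_feat.shape
    tokens = roi_feat.reshape(c, h * w).T                      # (HW, C)

    # (ii) One cross-attention step: each query attends to all ROI tokens.
    attn = softmax(queries @ tokens.T / np.sqrt(c), axis=-1)   # (3, HW)
    emb = queries + attn @ tokens                              # (3, C)
    occluder, visible, amodal = emb

    # (iii) Assumed form: invisible embedding from amodal + visible embeddings.
    invisible = np.concatenate([amodal, visible]) @ w_inv      # (C,)

    # (iv) Each mask is the dot product of its embedding with every ROI token.
    all_emb = np.stack([occluder, visible, amodal, invisible])  # (4, C)
    logits = all_emb @ tokens.T                                # (4, HW)
    return sigmoid(logits).reshape(4, h, w)


# Random stand-ins for learned parameters and ROI features.
rng = np.random.default_rng(0)
C, H, W = 8, 14, 14
masks = decode_masks(
    rng.normal(size=(C, H, W)),
    rng.normal(size=(3, C)),
    rng.normal(size=(2 * C, C)),
)
print(masks.shape)  # (4, 14, 14)
```

In a real model the queries and `w_inv` would be trained parameters and the decoder would typically stack several attention layers; the sketch only shows how treating each mask type as a query lets all four masks be predicted from the same ROI features.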

updated: Mon Mar 06 2023 05:00:50 GMT+0000 (UTC)

published: Wed Oct 12 2022 15:42:40 GMT+0000 (UTC)

arXiv
