AutoFocusFormer: Image Segmentation off the Grid

Chen Ziwen; Kaushik Patnaik; Shuangfei Zhai; Alvin Wan; Zhile Ren; Alex Schwing; Alex Colburn; Li Fuxin

Real-world images often have highly imbalanced content density. Some areas are very uniform, e.g., large patches of blue sky, while others are scattered with many small objects. Yet the successive grid downsampling strategy commonly used in convolutional deep networks treats all areas equally. Hence, small objects are represented at very few spatial locations, leading to worse results in tasks such as segmentation. Intuitively, retaining more of the pixels that represent small objects during downsampling helps to preserve important information. To achieve this, we propose AutoFocusFormer (AFF), a local-attention transformer image recognition backbone, which performs adaptive downsampling by learning to retain the most important pixels for the task. Since adaptive downsampling generates a set of pixels irregularly distributed on the image plane, we abandon the classic grid structure. Instead, we develop a novel point-based local attention block, facilitated by a balanced clustering module and a learnable neighborhood merging module, which yields representations for our point-based versions of state-of-the-art segmentation heads. Experiments show that AutoFocusFormer (AFF) improves significantly over baseline models of similar size.
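The contrast between grid downsampling and importance-based downsampling can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the function names `grid_downsample` and `adaptive_downsample` are hypothetical, and the per-pixel importance scores, which AFF learns, are hand-set here to mark a 2x2 "small object" in an 8x8 image.

```python
import numpy as np

def grid_downsample(height, width, stride=2):
    """Return flat indices of the pixels kept by uniform grid downsampling."""
    rows = np.arange(0, height, stride)
    cols = np.arange(0, width, stride)
    return (rows[:, None] * width + cols[None, :]).ravel()

def adaptive_downsample(scores, keep):
    """Return flat indices of the `keep` highest-scoring pixels.

    `scores` stands in for a learned per-pixel importance (an assumption
    of this sketch); ties are broken by pixel order via a stable sort.
    """
    order = np.argsort(-scores.ravel(), kind="stable")
    return np.sort(order[:keep])

H = W = 8
object_mask = np.zeros((H, W), dtype=bool)
object_mask[2:4, 2:4] = True            # a 2x2 "small object"
scores = object_mask.astype(float)      # hand-set importance, not learned

grid_idx = grid_downsample(H, W, stride=2)
adaptive_idx = adaptive_downsample(scores, keep=grid_idx.size)

flat_mask = object_mask.ravel()
grid_hits = int(flat_mask[grid_idx].sum())          # object pixels kept by the grid
adaptive_hits = int(flat_mask[adaptive_idx].sum())  # object pixels kept adaptively
print(grid_hits, adaptive_hits)  # prints "1 4"
```

With the same budget of 16 retained pixels, the grid keeps one of the four object pixels while the score-based selection keeps all four. The retained indices no longer form a regular grid, which is why a point-based attention block is needed downstream.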

updated: Mon Apr 24 2023 19:37:23 GMT+0000 (UTC)

published: Mon Apr 24 2023 19:37:23 GMT+0000 (UTC)

arXiv

