FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation

Jie Qin; Jie Wu; Pengxiang Yan; Ming Li; Ren Yuxi; Xuefeng Xiao; Yitong Wang; Rui Wang; Shilei Wen; Xin Pan; Xingang Wang

Recently, open-vocabulary learning has emerged to accomplish segmentation for arbitrary categories given text-based descriptions, extending segmentation systems to more general-purpose application scenarios. However, existing methods are devoted to designing specialized architectures or parameters for specific segmentation tasks. These customized design paradigms lead to fragmentation across segmentation tasks, hindering the uniformity of segmentation models. Hence, in this paper, we propose FreeSeg, a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation. FreeSeg optimizes an all-in-one network via one-shot training and employs the same architecture and parameters to handle diverse segmentation tasks seamlessly at inference. Additionally, adaptive prompt learning helps the unified model capture task-aware and category-sensitive concepts, improving model robustness in multi-task and varied scenarios. Extensive experimental results demonstrate that FreeSeg establishes new state-of-the-art results in performance and generalization on three segmentation tasks, outperforming the best task-specific architectures by a large margin: 5.5% mIoU on semantic segmentation, 17.6% mAP on instance segmentation, and 20.1% PQ on panoptic segmentation for unseen classes on COCO.

updated: Thu Mar 30 2023 08:42:49 GMT+0000 (UTC)

published: Thu Mar 30 2023 08:42:49 GMT+0000 (UTC)

arXiv
