ViCE: Visual Concept Embedding Discovery and Superpixelization

Robin Karlsson; Tomoki Hayashi; Keisuke Fujii; Alexander Carballo; Kento Ohtani; Kazuya Takeda

Recent self-supervised computer vision methods have demonstrated performance equal to or better than supervised methods, opening the way for AI systems to learn visual representations from practically unlimited data. However, these methods are classification-based and thus ineffective for learning the dense feature maps required for unsupervised semantic segmentation. This work presents a method to effectively learn dense, semantically rich visual concept embeddings applicable to high-resolution images. We introduce superpixelization as a means to decompose images into a small set of visually coherent regions, allowing efficient learning of dense semantics by swapped prediction. The expressiveness of our dense embeddings is demonstrated by significantly improving the SOTA representation quality benchmarks on COCO (+16.27 mIoU) and Cityscapes (+19.24 mIoU) for both low- and high-resolution images.
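To illustrate why decomposing an image into a small set of regions makes dense learning cheaper, the following is a minimal sketch, not the authors' implementation, of averaging a dense embedding map within each superpixel. It assumes NumPy; the function names `grid_superpixels` and `pool_by_superpixel` are hypothetical, and the square grid is only a stand-in for a real superpixel algorithm. The swapped-prediction objective itself is not shown.

```python
import numpy as np


def grid_superpixels(height, width, cell):
    """Hypothetical stand-in for a superpixel algorithm: label square grid cells."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = -(-width // cell)  # ceiling division: number of cells per row
    return rows[:, None] * n_cols + cols[None, :]


def pool_by_superpixel(embeddings, labels):
    """Average an (H, W, D) embedding map within each superpixel.

    Returns a (K, D) array, where K = labels.max() + 1.
    """
    h, w, d = embeddings.shape
    flat_emb = embeddings.reshape(h * w, d)
    flat_lab = labels.reshape(h * w)
    k = int(flat_lab.max()) + 1

    # Accumulate per-superpixel sums, then divide by pixel counts.
    sums = np.zeros((k, d), dtype=np.float64)
    np.add.at(sums, flat_lab, flat_emb)
    counts = np.bincount(flat_lab, minlength=k).astype(np.float64)
    # Guard against unused label ids (their rows stay zero).
    return sums / np.maximum(counts, 1.0)[:, None]


rng = np.random.default_rng(0)
dense = rng.normal(size=(8, 8, 4))      # toy dense embedding map
labels = grid_superpixels(8, 8, cell=4)  # 4 regions instead of 64 pixels
pooled = pool_by_superpixel(dense, labels)
```

In this toy case the 64 per-pixel embeddings reduce to 4 region embeddings, so any subsequent per-element computation operates on far fewer vectors.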

updated: Fri Apr 22 2022 07:39:10 GMT+0000 (UTC)

published: Wed Nov 24 2021 12:27:30 GMT+0000 (UTC)

arXiv
