Unsupervised Multi-object Segmentation Using Attention and Soft-argmax

Bruno Sauvalle; Arnaud de La Fortelle

We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation, which uses a translation-equivariant attention mechanism to predict the coordinates of the objects present in the scene and to associate a feature vector to each object. A transformer encoder handles occlusions and redundant detections, and a convolutional autoencoder is in charge of background reconstruction. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks.
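
The soft-argmax named in the title can be illustrated with a short example. The following is a minimal sketch and not the authors' implementation: the function name `soft_argmax_2d`, the `temperature` parameter, and the use of plain Python lists as a score map are all assumptions made for illustration. It computes the softmax-weighted average position of a 2D score map, which gives a differentiable estimate of where the map peaks.

```python
import math


def soft_argmax_2d(scores, temperature=1.0):
    """Return the softmax-weighted mean (x, y) position of a 2D score map.

    scores: list of rows of floats; x indexes columns, y indexes rows.
    temperature: hypothetical sharpness parameter (an assumption of this
        sketch); smaller values bring the result closer to a hard argmax.
    """
    # Subtract the global maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in scores)
    weights = [[math.exp((s - peak) / temperature) for s in row] for row in scores]
    total = sum(sum(row) for row in weights)

    # Expected column index (x) and row index (y) under the softmax weights.
    x = sum(w * j for row in weights for j, w in enumerate(row)) / total
    y = sum(w * i for i, row in enumerate(weights) for w in row) / total
    return x, y


def make_map(height, width, peak_row, peak_col, peak_value=50.0):
    """Build a toy score map that is zero everywhere except one peak."""
    grid = [[0.0] * width for _ in range(height)]
    grid[peak_row][peak_col] = peak_value
    return grid


# Moving the peak by one pixel moves the predicted coordinates by
# (approximately) one pixel, illustrating translation equivariance.
x0, y0 = soft_argmax_2d(make_map(5, 5, 1, 2))
x1, y1 = soft_argmax_2d(make_map(5, 5, 2, 3))
```

In this toy setting the predicted position follows the peak when the score map is shifted, which is the behaviour a translation-equivariant coordinate predictor is meant to have. Near image borders or with flatter score maps the correspondence is only approximate.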

updated: Wed Aug 31 2022 13:34:14 GMT+0000 (UTC)

published: Thu May 26 2022 10:58:48 GMT+0000 (UTC)

arXiv
