Compositional Scene Modeling with Global Object-Centric Representations

Tonglin Chen; Bin Li; Zhimeng Shen; Xiangyang Xue

The appearance of the same object may vary across scene images because of differences in perspective and occlusions between objects. Humans can easily identify the same object, even when it is occluded, by completing the occluded parts based on its canonical image in memory. Achieving this ability remains a challenge for machine learning, especially in the unsupervised learning setting. Inspired by this human ability, this paper proposes a compositional scene modeling method that infers global representations of canonical images of objects without any supervision. The representation of each object is divided into an intrinsic part, which characterizes globally invariant information (i.e., the canonical representation of an object), and an extrinsic part, which characterizes scene-dependent information (e.g., position and size). To infer the intrinsic representation of each object, we employ a patch-matching strategy to align the representation of a potentially occluded object with the canonical representations of objects, and sample the most probable canonical representation based on the object's category as determined by amortized variational inference. Extensive experiments on four object-centric learning benchmarks demonstrate that the proposed method not only outperforms state-of-the-art methods in segmentation and reconstruction, but also achieves good global object identification performance.
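
The matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the prototype bank, and the visibility mask are all hypothetical, and a plain softmax over masked pixel distances stands in for the amortized variational inference the paper uses. It also assumes the extrinsic information (position and size) has already been used to crop and rescale the object to a fixed-size patch.

```python
import numpy as np

def match_canonical(patch, visible, prototypes, temperature=1.0):
    """Return category probabilities for one (possibly occluded) object patch.

    patch:      (H, W) observed object patch
    visible:    (H, W) boolean mask, True where the object is not occluded
    prototypes: (K, H, W) hypothetical bank of canonical object images
    """
    # Compare visible pixels only, so occluded regions do not affect the match.
    diff = (prototypes - patch[None]) ** 2                      # (K, H, W)
    n_visible = max(int(visible.sum()), 1)
    dist = (diff * visible[None]).sum(axis=(1, 2)) / n_visible  # (K,)
    logits = -dist / temperature
    logits = logits - logits.max()                              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def complete_patch(patch, visible, prototypes):
    """Pick the most probable prototype and fill occluded pixels from it."""
    probs = match_canonical(patch, visible, prototypes)
    k = int(np.argmax(probs))
    completed = np.where(visible, patch, prototypes[k])
    return k, completed

# Toy data: three constant-valued "canonical images" (an assumption for illustration).
prototypes = np.stack([np.full((4, 4), v) for v in (0.0, 0.5, 1.0)])
visible = np.ones((4, 4), dtype=bool)
visible[:, 2:] = False                 # right half of the object is occluded
patch = prototypes[1].copy()
patch[~visible] = 0.0                  # occluded pixels carry no object information

probs = match_canonical(patch, visible, prototypes)
k, completed = complete_patch(patch, visible, prototypes)
```

In this toy setting the selected index plays the role of the intrinsic (scene-independent) part, while the crop parameters that produced the patch would be the extrinsic part. Masking matters here: without it, the zero-filled occluded pixels would pull the match toward the all-zero prototype.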

updated: Mon Nov 21 2022 14:36:36 GMT+0000 (UTC)

published: Mon Nov 21 2022 14:36:36 GMT+0000 (UTC)

arXiv
