Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning

Jinwoo Kim; Janghyuk Choi; Ho-Jin Choi; Seon Joo Kim

Object-centric learning (OCL) aspires to a general and compositional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets, where the geometric or temporal information in the multi-image data is used to apply various data-driven inductive biases. Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do. Hence, owing to the difficulty of applying inductive biases, OCL for single-view images remains challenging, and the learning of object-centric representations is inconsistent. To this end, we introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple yet effective modules on top of Slot Attention. The new modules, Attention Refining Kernel (ARK) and Intermediate Point Predictor and Encoder (IPPE), respectively prevent slots from being distracted by background noise and indicate locations for slots to focus on, which facilitates the learning of object-centric representations. We also propose a weak semi-supervision approach for OCL, while the proposed framework can be used without any auxiliary annotation during inference. Experiments show that our proposed method enables consistent learning of object-centric representations and achieves strong performance across four datasets. Code is available at https://github.com/object-understanding/SLASH.
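To make the idea of refining slot attention maps concrete, the following is a minimal illustrative sketch and not the authors' ARK implementation. It assumes attention is normalized across slots for each pixel, as in Slot Attention, and swaps in a plain box filter as a stand-in for a refining kernel. The function names `softmax_over_slots` and `box_smooth` and the `radius` parameter are hypothetical, introduced only for this example.

```python
import math

def softmax_over_slots(logits):
    """logits[k][i] is the score of slot k for pixel i.
    Normalizes across slots for each pixel, so slots compete for pixels."""
    num_slots, num_pixels = len(logits), len(logits[0])
    attn = [[0.0] * num_pixels for _ in range(num_slots)]
    for i in range(num_pixels):
        m = max(logits[k][i] for k in range(num_slots))  # for numerical stability
        exps = [math.exp(logits[k][i] - m) for k in range(num_slots)]
        total = sum(exps)
        for k in range(num_slots):
            attn[k][i] = exps[k] / total
    return attn

def box_smooth(attn_map, height, width, radius=1):
    """Stand-in refinement (an assumption, not ARK itself): average each pixel
    with its neighbours in a (2*radius+1)^2 window, clipped at the borders.
    attn_map is a flat, row-major list of length height * width."""
    out = [0.0] * (height * width)
    for y in range(height):
        for x in range(width):
            vals = [attn_map[yy * width + xx]
                    for yy in range(max(0, y - radius), min(height, y + radius + 1))
                    for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            out[y * width + x] = sum(vals) / len(vals)
    return out

# Usage: an isolated attention spike on a 3x3 map is spread over its neighbours.
spike = [0.0] * 9
spike[4] = 1.0
smoothed = box_smooth(spike, 3, 3)
```

In this sketch, an isolated high-attention pixel is attenuated by its low-attention neighbours, while a spatially coherent region keeps most of its mass. This only illustrates the general intuition of suppressing scattered background responses. The kernel actually used by SLASH is available in the repository linked above.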

updated: Fri Mar 31 2023 07:07:29 GMT+0000 (UTC)

published: Fri Mar 31 2023 07:07:29 GMT+0000 (UTC)

arXiv
