Disentangling Patterns and Transformations from One Sequence of Images with Shape-invariant Lie Group Transformer

T. Takada; W. Shimaya; Y. Ohmura; Y. Kuniyoshi

An effective way to model the complex real world is to view it as a composition of basic components: objects and transformations. Although humans come to understand the compositionality of the real world through development, it is extremely difficult to equip robots with such a learning mechanism. In recent years, there has been significant research on autonomously learning representations of the world using deep learning; however, most studies have taken a statistical approach, which requires a large amount of training data. In contrast to such existing methods, we take a novel algebraic approach to representation learning based on a simpler and more intuitive formulation: the observed world is a combination of multiple independent patterns and transformations that are invariant to the shape of the patterns. Since the shape of a pattern can be viewed as a feature that is invariant under symmetric transformations such as translation or rotation, we can expect patterns to be extracted naturally by expressing transformations with symmetric Lie group transformers and attempting to reconstruct the scene with them. Based on this idea, we propose a model that disentangles scenes into the minimum number of basic components, patterns and Lie transformations, from only one sequence of images, by introducing learnable shape-invariant Lie group transformers as transformation components. Experiments show that, given one sequence of images in which two objects move independently, the proposed model can discover the hidden distinct objects and the multiple shape-invariant transformations that constitute the scenes.
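As an illustration only, the formulation above (a scene as a sum of independent patterns, each moved by a transformation that does not depend on the pattern's shape) can be sketched in NumPy. This is not the authors' model: it assumes fixed, hand-chosen translation generators G_x = -d/dx and G_y = -d/dy on a periodic grid, whereas the paper learns its Lie group transformers. A group element is exp(θ_x G_x + θ_y G_y), and a frame is rendered as Σ_k exp(θ_k · G) P_k. The function names translation_generators, lie_transform and render_scene are hypothetical.

```python
import numpy as np


def translation_generators(n):
    """Fourier-domain symbols of the generators -d/dx and -d/dy on an n x n
    periodic grid. Fixed translation generators are an assumption of this sketch."""
    k = 2 * np.pi * np.fft.fftfreq(n)  # angular frequency per pixel
    ky, kx = np.meshgrid(k, k, indexing="ij")  # axis 0 = y (rows), axis 1 = x (cols)
    # d/dx has Fourier symbol i*kx, so -d/dx has symbol -i*kx
    return -1j * kx, -1j * ky


def lie_transform(pattern, params, generators):
    """Apply the group element exp(sum_j params[j] * G_j) to a pattern.

    The operator is the same whatever the pattern looks like, which is the
    sense in which the transformation is shape-invariant. The translation
    generators commute and are diagonal in Fourier space, so the matrix
    exponential reduces to an elementwise exponential of the symbol."""
    symbol = sum(p * g for p, g in zip(params, generators))
    return np.real(np.fft.ifft2(np.exp(symbol) * np.fft.fft2(pattern)))


def render_scene(patterns, params_per_pattern, generators):
    """Compose a frame as the sum of independently transformed patterns."""
    return sum(
        lie_transform(p, theta, generators)
        for p, theta in zip(patterns, params_per_pattern)
    )


if __name__ == "__main__":
    n = 16
    gens = translation_generators(n)
    square = np.zeros((n, n))
    square[2:4, 2:4] = 1.0
    bar = np.zeros((n, n))
    bar[8, 5:10] = 1.0
    # Two objects moving independently: the square moves right, the bar moves up
    frames = [render_scene([square, bar], [(t, 0.0), (0.0, -t)], gens) for t in range(4)]
    print(len(frames), frames[0].shape)
```

For integer parameters this transformation reduces to a cyclic shift (np.roll), and composing two group elements equals applying the element with summed parameters. Learning would then amount to recovering the patterns P_k and per-frame parameters (and, in the paper's setting, the generators themselves) from the image sequence.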

updated: Mon Mar 21 2022 11:55:13 GMT+0000 (UTC)

published: Mon Mar 21 2022 11:55:13 GMT+0000 (UTC)

arXiv