Learning Latent Part-Whole Hierarchies for Point Clouds

Xiang Gao; Wei Hu; Renjie Liao

Strong evidence suggests that humans perceive the 3D world by parsing visual scenes and objects into part-whole hierarchies. Although deep neural networks can learn powerful multi-level representations, they cannot explicitly model part-whole hierarchies, which limits their expressiveness and interpretability when processing 3D vision data such as point clouds. To this end, we propose an encoder-decoder style latent variable model that explicitly learns part-whole hierarchies for multi-level point cloud segmentation. Specifically, the encoder takes a point cloud as input and predicts the per-point latent subpart distribution at the middle level. The decoder takes the latent variable and the feature from the encoder as input and predicts the per-point part distribution at the top level. During training, only annotated part labels at the top level are provided, which makes the whole framework weakly supervised. We explore two kinds of approximate inference algorithms, i.e., most-probable-latent and Monte Carlo methods, and three stochastic gradient estimators for learning discrete latent variables, i.e., straight-through, REINFORCE, and pathwise estimators. Experimental results on the PartNet dataset show that the proposed method achieves state-of-the-art performance in not only top-level part segmentation but also middle-level latent subpart segmentation.
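
To make the two inference schemes named in the abstract concrete, here is a minimal sketch for a single point. It is not the authors' implementation. The names `q_z`, `p_y_given_z`, and all three functions are hypothetical, and the probabilities are made-up toy numbers. In the actual model the decoder also conditions on encoder features, which this sketch replaces with a fixed lookup table as a simplifying assumption.

```python
import random


def most_probable_latent(q_z, p_y_given_z):
    """Plug in the single most likely subpart z* = argmax_z q(z | x)."""
    z_star = max(range(len(q_z)), key=lambda z: q_z[z])
    return list(p_y_given_z[z_star])


def monte_carlo(q_z, p_y_given_z, num_samples, rng):
    """Average p(y | z) over samples z ~ q(z | x)."""
    num_parts = len(p_y_given_z[0])
    total = [0.0] * num_parts
    samples = rng.choices(range(len(q_z)), weights=q_z, k=num_samples)
    for z in samples:
        for y in range(num_parts):
            total[y] += p_y_given_z[z][y]
    return [t / num_samples for t in total]


def exact_marginal(q_z, p_y_given_z):
    """Exact sum over z; feasible here only because the toy case is tiny."""
    num_parts = len(p_y_given_z[0])
    return [
        sum(q_z[z] * p_y_given_z[z][y] for z in range(len(q_z)))
        for y in range(num_parts)
    ]


# Toy values (assumed): 3 latent subparts, 2 top-level parts, one point.
q_z = [0.5, 0.3, 0.2]                 # stand-in for the encoder output q(z | x)
p_y_given_z = [[0.9, 0.1],            # stand-in for the decoder output p(y | z)
               [0.2, 0.8],
               [0.4, 0.6]]

rng = random.Random(0)                # fixed seed for reproducibility
mpl = most_probable_latent(q_z, p_y_given_z)
mc = monte_carlo(q_z, p_y_given_z, 20000, rng)
exact = exact_marginal(q_z, p_y_given_z)

print("most-probable-latent:", mpl)
print("Monte Carlo estimate:", mc)
print("exact marginal:      ", exact)
```

In this toy case the most-probable-latent prediction commits to one subpart and is therefore more peaked than the marginal, while the Monte Carlo estimate approaches the exact marginal as the number of samples grows.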

updated: Mon Nov 14 2022 03:17:33 GMT+0000 (UTC)

published: Mon Nov 14 2022 03:17:33 GMT+0000 (UTC)

arXiv
