Generative-Contrastive Learning for Self-Supervised Latent Representations of 3D Shapes from Multi-Modal Euclidean Input

Chengzhi Wu; Julius Pfrommer; Mingyuan Zhou; Jürgen Beyerer


We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches, one for voxel grids and one for multi-view images of the same underlying shape. The main idea is to combine a contrastive loss between the two resulting latent representations with an additional reconstruction loss. The reconstruction loss helps prevent the latent representations from collapsing, which is a trivial way to minimize the contrastive loss. A novel switching scheme cross-trains the two encoders with a shared decoder and also enables a stop-gradient operation on a randomly chosen branch. Further classification experiments show that the latent representations learned with our self-supervised method implicitly integrate more useful information from the additional input data, leading to better reconstruction and classification performance.
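The collapse argument in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration, since the abstract does not give the exact losses or the switching rule: the contrastive term is stood in for by a cosine distance, the reconstruction term by a mean squared error, the switch by a coin flip per step, and the names `cosine_distance`, `pick_branches`, and `combined_loss` are hypothetical. The sketch only computes loss values; in a real training loop the "frozen" branch would be the one wrapped in a stop-gradient.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; an assumed stand-in for the contrastive term."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def mse(pred, target):
    """Mean squared error; an assumed stand-in for the reconstruction term."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pick_branches(rng):
    """Assumed switching rule: a coin flip decides which latent feeds the
    shared decoder; the other branch is marked as the stop-gradient side."""
    decoded = rng.choice(["voxel", "image"])
    frozen = "image" if decoded == "voxel" else "voxel"
    return decoded, frozen


def combined_loss(z_voxel, z_image, decode, target, rng, recon_weight=1.0):
    """Contrastive term between the two latents plus a weighted
    reconstruction term computed from the randomly selected branch."""
    decoded, frozen = pick_branches(rng)
    z = z_voxel if decoded == "voxel" else z_image
    contrastive = cosine_distance(z_voxel, z_image)
    reconstruction = mse(decode(z), target)
    return contrastive + recon_weight * reconstruction, decoded, frozen


if __name__ == "__main__":
    rng = random.Random(0)
    target = [1.0, 0.0, 1.0, 0.0]   # toy flattened voxel grid
    identity_decoder = lambda z: z  # toy decoder, for illustration only

    # Collapsed latents: both encoders emit the same constant vector.
    collapsed = [1.0, 1.0, 1.0, 1.0]
    total, decoded, frozen = combined_loss(
        collapsed, collapsed, identity_decoder, target, rng
    )
    print(f"collapsed latents: total loss = {total:.3f}")
```

With collapsed latents the contrastive term is zero, but the reconstruction term stays positive, so the combined objective still penalizes the trivial solution.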

updated: Wed Jan 11 2023 18:14:24 GMT+0000 (UTC)

published: Wed Jan 11 2023 18:14:24 GMT+0000 (UTC)

arXiv

