GARNet: Global-Aware Multi-View 3D Reconstruction Network and the Cost-Performance Tradeoff

Zhenwei Zhu; Liying Yang; Xuxin Lin; Chaohao Jiang; Ning Li; Lin Yang; Yanyan Liang

Deep learning has made great progress in multi-view 3D reconstruction. At present, most mainstream solutions map views to the shape of an object by combining a 2D encoder and a 3D decoder as the basic structure, while they differ in how they aggregate features from several views. Among them, methods that use attention-based fusion perform better and more stably than the others. However, they still have an obvious shortcoming: each view predicts its merging weight largely independently, so the weights do not adapt to the global state. In this paper, we propose a global-aware attention-based fusion approach that builds the correlation between each branch and the global state to provide a comprehensive foundation for weight inference. To strengthen the network, we introduce a novel loss function that supervises the overall shape and propose a dynamic two-stage training strategy that adapts effectively to all reconstructors with attention-based fusion. Experiments on ShapeNet verify that our method outperforms existing state-of-the-art (SOTA) methods while using far fewer parameters than Pix2Vox++, an algorithm of the same type. Furthermore, we propose a view-reduction method based on maximizing diversity and discuss the cost-performance tradeoff of our model, so that it performs better when the number of input views is large and the computational budget is limited.
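
The two ideas named in the abstract, fusion weights that depend on a global summary and view reduction by maximizing diversity, can be illustrated with a small sketch. This is not the GARNet architecture: the abstract does not give the formulas, so everything here is an assumption made for illustration. The global state is taken to be the mean feature, the per-view score is a plain dot product, and diversity is approximated with greedy max-min (farthest-point) selection under Euclidean distance. The function names `global_aware_fuse` and `select_diverse_views` are hypothetical.

```python
import math


def global_aware_fuse(features):
    """Fuse per-view feature vectors using weights tied to a global summary.

    Assumption: the "global state" is the mean feature over all views and
    the score of a view is its dot product with that mean. The paper may
    use a learned module instead.
    """
    n = len(features)
    dim = len(features[0])
    # Global summary shared by every branch.
    global_feat = [sum(f[d] for f in features) / n for d in range(dim)]
    # Correlation between each view and the global summary.
    scores = [sum(a * b for a, b in zip(f, global_feat)) for f in features]
    # Softmax over views, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the per-view features.
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, fused


def select_diverse_views(features, k):
    """Pick up to k view indices by greedy max-min (farthest-point) selection.

    Assumption: diversity is measured by Euclidean distance between view
    features, and the first view is an arbitrary starting point.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    if k <= 0 or not features:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(features)):
        best_idx, best_gap = None, -1.0
        for i in range(len(features)):
            if i in chosen:
                continue
            # Distance from candidate i to its nearest already-chosen view.
            gap = min(dist(features[i], features[j]) for j in chosen)
            if gap > best_gap:
                best_idx, best_gap = i, gap
        chosen.append(best_idx)
    return chosen


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    kept = select_diverse_views(views, 2)
    weights, fused = global_aware_fuse([views[i] for i in kept])
    print(kept, weights, fused)
```

In this toy setting, reducing the views first and fusing only the kept ones mirrors the cost-performance idea: fewer views pass through the encoder, and the near-duplicate view is the one dropped.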

updated: Fri Nov 04 2022 07:45:19 GMT+0000 (UTC)

published: Fri Nov 04 2022 07:45:19 GMT+0000 (UTC)

arXiv
