PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers

Frank Yu; Mathieu Salzmann; Pascal Fua; Helge Rhodin

Local processing is an essential feature of CNNs and other neural network architectures, and it is one of the reasons they work so well on images, where relevant information is to a large extent local. However, perspective effects stemming from projection in a conventional camera vary with global position in the image. We introduce Perspective Crop Layers (PCLs), a form of perspective crop of the region of interest based on the camera geometry, and show that accounting for perspective consistently improves the accuracy of state-of-the-art 3D pose reconstruction methods. PCLs are modular neural network layers that, when inserted into existing CNN and MLP architectures, deterministically remove location-dependent perspective effects while leaving end-to-end training and the number of parameters of the underlying neural network unchanged. We demonstrate that PCLs improve 3D human pose reconstruction accuracy for CNN architectures that use cropping operations, such as spatial transformer networks (STNs), and, somewhat surprisingly, for MLPs used for 2D-to-3D keypoint lifting. Our conclusion is that it is important to use camera calibration information when available, for classical and deep-learning-based computer vision alike. PCLs offer an easy way to improve the accuracy of existing 3D reconstruction networks by making them geometry-aware.
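
The abstract does not spell out the construction, so the following is only a minimal sketch of one standard way to realize a perspective crop, not the authors' implementation. It assumes a pinhole camera with known intrinsics `K`: a virtual camera is rotated so that its optical axis passes through the crop center, pixels are warped with the homography `K_virt @ R.T @ inv(K)`, and a 3D pose predicted in the virtual frame is rotated back with `R`. All function names, variable names, and numeric values are hypothetical.

```python
import numpy as np

def look_at_rotation(K, center_px):
    """Rotation R whose columns are the virtual camera axes, expressed in
    the original camera frame, with the z axis through center_px."""
    # Back-project the crop center to a viewing ray in camera coordinates.
    ray = np.linalg.inv(K) @ np.array([center_px[0], center_px[1], 1.0])
    z = ray / np.linalg.norm(ray)
    # Complete a right-handed orthonormal basis, using the camera's
    # y axis as an "up" hint (an assumption of this sketch).
    x = np.cross(np.array([0.0, 1.0, 0.0]), z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)

def crop_homography(K, K_virt, R):
    """Homography taking original pixels to virtual-camera pixels."""
    return K_virt @ R.T @ np.linalg.inv(K)

def warp_pixel(H, px):
    """Apply a homography to one pixel and dehomogenize."""
    q = H @ np.array([px[0], px[1], 1.0])
    return q[:2] / q[2]

# Made-up intrinsics for the full image and for a 128x128 virtual crop.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
K_virt = np.array([[1000.0, 0.0, 64.0],
                   [0.0, 1000.0, 64.0],
                   [0.0, 0.0, 1.0]])
center = (1100.0, 200.0)  # crop center, far from the principal point

R = look_at_rotation(K, center)
H = crop_homography(K, K_virt, R)
center_virt = warp_pixel(H, center)  # lands on the virtual principal point

# A pose predicted in the virtual frame (one made-up joint, row vector)
# is mapped back to the original camera frame by the same rotation.
pose_virt = np.array([[0.1, -0.2, 3.0]])
pose_cam = pose_virt @ R.T
```

Because the warp and the back-rotation depend only on the calibration and the crop location, a step of this kind adds no learnable parameters, which is consistent with the abstract's claim that the parameter count of the underlying network is unchanged.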

updated: Fri Nov 27 2020 08:48:43 GMT+0000 (UTC)

published: Fri Nov 27 2020 08:48:43 GMT+0000 (UTC)

arXiv
