Single Image Depth Prediction Made Better: A Multivariate Gaussian Take

Ce Liu; Suryansh Kumar; Shuhang Gu; Radu Timofte; Luc Van Gool

Neural-network-based single image depth prediction (SIDP) is a challenging task whose goal is to predict a scene's per-pixel depth at test time. Since the problem is ill-posed by definition, the fundamental goal is to devise an approach that can reliably model scene depth from a set of training examples. In pursuit of perfect depth estimation, most existing state-of-the-art learning techniques predict a single scalar depth value per pixel. Yet it is well known that a trained model has accuracy limits and can predict imprecise depth. An SIDP approach must therefore be mindful of the expected depth variations in the model's prediction at test time. Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, so that we can predict and reason about both the per-pixel depth and its distribution. To this end, we model per-pixel scene depth using a multivariate Gaussian distribution. Moreover, in contrast to existing uncertainty modeling methods, which assume that per-pixel depths are independent, we introduce per-pixel covariance modeling that encodes each pixel's depth dependency with respect to all scene points. Unfortunately, per-pixel depth covariance modeling leads to a computationally expensive continuous loss function, which we handle efficiently using a learned low-rank approximation of the overall covariance matrix. Notably, when tested on benchmark datasets such as KITTI, NYU, and SUN-RGB-D, the SIDP model obtained by optimizing our loss function shows state-of-the-art results. The accuracy of our method (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
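To illustrate why a low-rank covariance makes a multivariate Gaussian loss tractable, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state the exact parameterization, so the sketch assumes a covariance of the form `factor @ factor.T + noise_var * I` with an N x k factor (k much smaller than the number of pixels N). The function name `low_rank_gaussian_nll` and its arguments are hypothetical. Under that assumption, the Woodbury identity and the matrix determinant lemma reduce the cost of the negative log-likelihood from O(N^3) to O(N k^2).

```python
import numpy as np


def low_rank_gaussian_nll(residual, factor, noise_var):
    """Negative log-likelihood of `residual` under N(0, Sigma), where
    Sigma = factor @ factor.T + noise_var * I (an assumed parameterization).

    residual:  (n,) vector, e.g. ground-truth depth minus predicted mean depth
    factor:    (n, k) low-rank factor of the covariance, with k << n
    noise_var: positive scalar added on the diagonal
    """
    n, k = factor.shape
    # Small k x k matrix that appears in the Woodbury identity.
    cap = np.eye(k) + (factor.T @ factor) / noise_var
    chol = np.linalg.cholesky(cap)  # lower-triangular L with cap = L @ L.T

    # Mahalanobis term r^T Sigma^{-1} r, computed without forming Sigma.
    proj = factor.T @ residual / noise_var
    sol = np.linalg.solve(chol, proj)  # sol = L^{-1} proj
    maha = residual @ residual / noise_var - sol @ sol

    # log det(Sigma) via the matrix determinant lemma.
    logdet = n * np.log(noise_var) + 2.0 * np.sum(np.log(np.diag(chol)))

    return 0.5 * (maha + logdet + n * np.log(2.0 * np.pi))


# Usage on synthetic data: compare against the dense O(n^3) computation.
rng = np.random.default_rng(0)
n, k = 200, 8
factor = rng.normal(size=(n, k))
residual = rng.normal(size=n)
noise_var = 0.5

fast = low_rank_gaussian_nll(residual, factor, noise_var)

sigma = factor @ factor.T + noise_var * np.eye(n)
_, dense_logdet = np.linalg.slogdet(sigma)
dense = 0.5 * (residual @ np.linalg.solve(sigma, residual)
               + dense_logdet + n * np.log(2.0 * np.pi))
print(np.isclose(fast, dense))
```

In a per-pixel setting, `residual` would be the flattened difference between ground-truth and predicted depth, so N is the number of pixels; the dense path would be infeasible at image resolution, while the low-rank path only ever factorizes a k x k matrix.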

updated: Fri Mar 31 2023 16:01:03 GMT+0000 (UTC)

published: Fri Mar 31 2023 16:01:03 GMT+0000 (UTC)

arXiv
