Accurate Ground-Truth Depth Image Generation via Overfit Training of Point Cloud Registration using Local Frame Sets

Jiwan Kim; Minchang Kim; Yeong-Gil Shin; Minyoung Chung

Accurate three-dimensional perception is a fundamental task in several computer vision applications. Commercial RGB-depth (RGB-D) cameras have been widely adopted as single-view depth-sensing devices owing to their efficient depth-sensing abilities. However, the depth quality of most RGB-D sensors remains insufficient owing to the noise inherent in a single-view environment. Several recent studies have therefore focused on single-view depth enhancement for RGB-D cameras. Many of them propose deep-learning-based approaches that train networks on high-quality supervised depth datasets, which makes the quality of the ground-truth (GT) depth dataset one of the most important factors for an accurate system; however, such high-quality GT datasets are difficult to obtain. In this study, we developed a novel method for high-quality GT depth generation based on an RGB-D stream dataset. First, we defined consecutive depth frames in a local spatial region as a local frame set. Then, the depth frames were aligned to a reference frame in the local frame set using an unsupervised point cloud registration scheme. The registration parameters were trained with an overfit-training scheme, and the resulting alignment was used to construct a single GT depth image for each frame set. The final GT depth dataset was constructed from several local frame sets, each of which was trained independently. The primary advantage of this approach is that a high-quality GT depth dataset can be constructed under various scanning environments using only the RGB-D stream dataset. Moreover, the dataset produced by our method can serve as a new benchmark GT dataset for accurate performance evaluations. We evaluated our GT dataset on previously benchmarked GT depth datasets and demonstrated that our method is superior to state-of-the-art depth enhancement frameworks.
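
The data flow described in the abstract (group frames into local frame sets, then build one GT depth image per set) can be illustrated with a minimal sketch. It is not the authors' implementation: the registration network and its overfit training are omitted entirely, and the frames are assumed to be already aligned to the reference view. The fixed-size chunking in `split_into_local_frame_sets`, the per-pixel median in `fuse_aligned_depths`, and the use of `0.0` as the invalid-depth marker are all assumptions made for illustration; the abstract does not state how frame sets are delimited or how the aligned frames are merged.

```python
from statistics import median


def split_into_local_frame_sets(frames, set_size):
    """Group consecutive frames into non-overlapping local frame sets.

    Assumption: a fixed set size stands in for "consecutive frames in a
    local spatial region"; the source does not specify the grouping rule.
    """
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    return [frames[i:i + set_size] for i in range(0, len(frames), set_size)]


def fuse_aligned_depths(aligned_frames, invalid=0.0):
    """Fuse depth frames that are already aligned to the reference view.

    Each frame is a list of rows of depth values. Pixels equal to
    `invalid` are treated as missing and ignored. Assumption: a per-pixel
    median is used as the fusion rule purely for illustration.
    """
    height = len(aligned_frames[0])
    width = len(aligned_frames[0][0])
    fused = [[invalid] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            samples = [f[r][c] for f in aligned_frames if f[r][c] != invalid]
            if samples:  # leave the pixel invalid if no frame observed it
                fused[r][c] = median(samples)
    return fused
```

In this sketch each local frame set would be fused independently, mirroring the abstract's statement that every frame set is processed on its own. The median is chosen only because it tolerates occasional outlier depth readings; any robust per-pixel statistic could take its place.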

updated: Thu Jul 14 2022 15:50:44 GMT+0000 (UTC)

published: Thu Jul 14 2022 15:50:44 GMT+0000 (UTC)

arXiv
