The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement

Ilya Chugunov; Yuxuan Zhang; Zhihao Xia; Cecilia Zhang; Jiawen Chen; Felix Heide

Modern smartphones can continuously stream multi-megapixel RGB images at 60 Hz, synchronized with high-quality 3D pose information and low-resolution LiDAR-driven depth estimates. During a snapshot photograph, the natural unsteadiness of the photographer's hands offers millimeter-scale variation in camera pose, which we can capture along with RGB and depth in a circular buffer. In this work we explore how, from a bundle of these measurements acquired during viewfinding, we can combine dense micro-baseline parallax cues with kilopixel LiDAR depth to distill a high-fidelity depth map. We take a test-time optimization approach and train a coordinate MLP to output photometrically and geometrically consistent depth estimates at the continuous coordinates along the path traced by the photographer's natural hand shake. The proposed method brings high-resolution depth estimates to 'point-and-shoot' tabletop photography and requires no additional hardware, artificial hand motion, or user interaction beyond the press of a button.
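To give a rough sense of how much parallax a millimeter-scale hand shake produces, the sketch below applies the standard pinhole-stereo relation (disparity = focal length × baseline / depth) to a sideways camera translation. The focal length, baseline, and depths are illustrative assumptions, not values from the paper, and `disparity_px` is a hypothetical helper rather than part of the authors' method.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Pixel disparity of a point at depth_m for a sideways camera shift of baseline_m.

    Uses the pinhole-stereo relation d = f * b / z, which assumes a pure
    translation parallel to the image plane and no rotation.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


if __name__ == "__main__":
    focal_px = 3000.0    # assumed focal length in pixels (illustrative only)
    baseline_m = 0.005   # assumed 5 mm of hand-shake translation (illustrative only)
    for depth_m in (0.25, 0.5, 1.0):  # assumed tabletop-range depths in meters
        d = disparity_px(focal_px, baseline_m, depth_m)
        print(f"depth {depth_m:.2f} m -> disparity {d:.1f} px")
```

Under these assumed numbers the disparity is inversely proportional to depth, so nearby tabletop objects shift by many pixels while distant ones barely move, which is consistent with the abstract's focus on close-range scenes.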

updated: Fri Nov 26 2021 20:24:07 GMT+0000 (UTC)

published: Fri Nov 26 2021 20:24:07 GMT+0000 (UTC)

arXiv
