Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained Image Collections

Yifan Yang; Shuhai Zhang; Zixiong Huang; Yubing Zhang; Mingkui Tan

Neural Radiance Fields (NeRF) is a revolutionary approach that renders scenes by sampling a single ray per pixel, and it has demonstrated impressive capabilities in novel-view synthesis from static scene images. However, in practice, we usually need to recover NeRF from unconstrained image collections, which poses two challenges: 1) the images often vary in appearance because of different capture times and camera settings; 2) the images may contain transient objects such as humans and cars, leading to occlusion and ghosting artifacts. Conventional approaches seek to address these challenges by locally utilizing a single ray to synthesize the color of a pixel. In contrast, humans typically perceive appearance and objects by globally utilizing information across multiple pixels. To mimic the human perception process, in this paper we propose Cross-Ray NeRF (CR-NeRF), which leverages interactive information across multiple rays to synthesize occlusion-free novel views with the same appearances as the images. Specifically, to model varying appearances, we first propose to represent multiple rays with a novel cross-ray feature and then recover the appearance by fusing global statistics, i.e., feature covariance of the rays and the image appearance. Moreover, to avoid occlusion introduced by transient objects, we propose a transient objects handler and introduce a grid sampling strategy for masking out the transient objects. We theoretically find that leveraging correlation across multiple rays promotes capturing more global information. Finally, extensive experimental results on large real-world datasets verify the effectiveness of CR-NeRF.
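
The two ingredients named in the abstract, sampling rays on a grid and computing a feature covariance across rays, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `grid_sample_pixels` and `feature_covariance`, the cell-center sampling rule, and the biased (divide-by-N) covariance are all assumptions made for illustration, and the per-ray features are toy values rather than learned ones.

```python
def grid_sample_pixels(height, width, grid_size):
    """Pick one pixel at the center of each cell of a grid_size x grid_size grid.

    Hypothetical helper: sampling on a regular grid (rather than at random)
    keeps the sampled rays spread over the whole image.
    """
    rows = [int((i + 0.5) * height / grid_size) for i in range(grid_size)]
    cols = [int((j + 0.5) * width / grid_size) for j in range(grid_size)]
    return [(r, c) for r in rows for c in cols]


def feature_covariance(features):
    """Return the C x C covariance of N per-ray feature vectors of length C.

    Assumption: biased estimate (divides by N). This is a global statistic
    computed across rays, not per ray.
    """
    n = len(features)
    c = len(features[0])
    # Per-channel mean over all rays.
    mean = [sum(f[k] for f in features) / n for k in range(c)]
    # Center each ray's feature vector.
    centered = [[f[k] - mean[k] for k in range(c)] for f in features]
    # Average the outer products of the centered vectors.
    return [
        [sum(row[a] * row[b] for row in centered) / n for b in range(c)]
        for a in range(c)
    ]


pixels = grid_sample_pixels(4, 4, 2)
cov = feature_covariance([[1.0, 2.0], [3.0, 6.0]])
```

Here `pixels` holds four coordinates spread over a 4 x 4 image, and `cov` is a 2 x 2 matrix describing how the two feature channels co-vary across the sampled rays. In the paper the cross-ray features come from a learned model; the toy vectors above only show the shape of the computation.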

updated: Tue Aug 15 2023 06:29:24 GMT+0000 (UTC)

published: Sun Jul 16 2023 16:29:40 GMT+0000 (UTC)

arXiv
