Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

Fangfu Liu; Chubin Zhang; Yu Zheng; Yueqi Duan

In this paper, we aim to learn a semantic radiance field from multiple scenes that is accurate, efficient and generalizable. While most existing NeRFs target neural scene rendering, image synthesis and multi-view reconstruction, a few attempts, such as Semantic-NeRF, explore learning high-level semantic understanding with the NeRF structure. However, Semantic-NeRF learns color and semantic labels simultaneously from a single ray with multiple heads, and a single ray fails to provide rich semantic information. As a result, Semantic-NeRF relies on positional encoding and must train one specific model for each scene. To address this, we propose Semantic Ray (S-Ray) to fully exploit semantic information along the ray direction from its multi-view reprojections. Because directly performing dense attention over multi-view reprojected rays would incur heavy computational cost, we design a Cross-Reprojection Attention module with consecutive intra-view radial and cross-view sparse attentions, which decomposes contextual information along reprojected rays and across multiple views and then collects dense connections by stacking the modules. Experiments show that S-Ray is able to learn from multiple scenes and presents strong generalization ability to adapt to unseen scenes.

updated: Thu Mar 23 2023 03:33:20 GMT+0000 (UTC)

published: Thu Mar 23 2023 03:33:20 GMT+0000 (UTC)

arXiv