RePAST: Relative Pose Attention Scene Representation Transformer

Aleksandr Safin; Daniel Duckworth; Mehdi S. M. Sajjadi

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
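The invariance claim can be illustrated with a small sketch. The relative pose between cameras i and j, T_i^{-1} T_j, does not change when every camera is moved by the same global rigid transform G, since (G T_i)^{-1} (G T_j) = T_i^{-1} T_j. The toy attention below is not the RePAST architecture. It assumes, purely for illustration, that the relative pose enters each attention logit as an additive linear bias. The names `relative_pose_attention`, `pose_weight`, and `pose` are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rigid_inverse(t):
    """Invert a 4x4 rigid transform [R | p]: the inverse is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose(yaw, tx, ty, tz):
    """Camera-to-world transform: rotation about z by yaw, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def relative_pose(cam_i, cam_j):
    """Pose of camera j expressed in the frame of camera i."""
    return matmul(rigid_inverse(cam_i), cam_j)

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def relative_pose_attention(tokens, cams, pose_weight):
    """Toy single-head self-attention with a pairwise relative-pose bias.

    tokens: one feature vector per token (used as query, key, and value).
    cams: one 4x4 camera-to-world matrix per token.
    pose_weight: 12 weights (an assumption of this sketch) applied to the
        flattened top three rows of the relative pose.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        logits = []
        for j, k in enumerate(tokens):
            rel = relative_pose(cams[i], cams[j])
            flat = [rel[r][c] for r in range(3) for c in range(4)]
            bias = sum(w * f for w, f in zip(pose_weight, flat))
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            logits.append(content + bias)
        attn = softmax(logits)
        out.append([sum(attn[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out

# Three tokens, each tied to its own camera.
tokens = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.8]]
cams = [pose(0.0, 0.0, 0.0, 0.0),
        pose(0.7, 1.0, 0.0, 0.5),
        pose(-1.2, -2.0, 1.0, 0.0)]
weights = [0.1 * (n - 5) for n in range(12)]

# Re-express every camera in a different global frame.
world_shift = pose(2.1, 3.0, -4.0, 1.5)
shifted = [matmul(world_shift, c) for c in cams]

a = relative_pose_attention(tokens, cams, weights)
b = relative_pose_attention(tokens, shifted, weights)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Because only pairwise relative poses reach the logits, `max_diff` stays at floating-point noise for any choice of `world_shift`. A model that instead consumed absolute poses with respect to a fixed reference camera would not have this property.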

updated: Mon Apr 10 2023 13:11:13 GMT+0000 (UTC)

published: Mon Apr 03 2023 13:13:12 GMT+0000 (UTC)

arXiv
