IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes

Rui Zhu; Zhengqin Li; Janarbek Matai; Fatih Porikli; Manmohan Chandraker

Indoor scenes exhibit significant appearance variations due to myriad interactions between arbitrarily diverse object shapes, spatially-varying materials, and complex lighting. Shadows, highlights, and inter-reflections caused by visible and invisible light sources require reasoning about long-range interactions for inverse rendering, which seeks to recover the components of image formation, namely, shape, material, and lighting. In this work, our intuition is that the long-range attention learned by transformer architectures is ideally suited to solve longstanding challenges in single-image inverse rendering. We demonstrate this with IRISformer, a specific instantiation of a dense vision transformer that excels at both the single-task and multi-task reasoning required for inverse rendering. Specifically, we propose a transformer architecture to simultaneously estimate depth, normals, spatially-varying albedo, roughness, and lighting from a single image of an indoor scene. Our extensive evaluations on benchmark datasets demonstrate state-of-the-art results on each of the above tasks, enabling applications like object insertion and material editing in a single unconstrained real image, with greater photorealism than prior work. Code and data are publicly released at https://github.com/ViLab-UCSD/IRISformer.

updated: Thu Jun 16 2022 19:50:55 GMT+0000 (UTC)

published: Thu Jun 16 2022 19:50:55 GMT+0000 (UTC)

arXiv
