DeS3: Attention-driven Self and Soft Shadow Removal using ViT Similarity and Color Convergence

Yeying Jin; Wenhan Yang; Wei Ye; Yuan Yuan; Robby T. Tan

Removing soft and self shadows that lack clear boundaries from a single image remains challenging. Self shadows are shadows cast on the object itself. Most existing methods rely on binary shadow masks and do not account for the ambiguous boundaries of soft and self shadows. In this paper, we present DeS3, a method that removes hard, soft, and self shadows based on self-tuned ViT feature similarity and color convergence. Our novel ViT similarity loss uses features extracted from a pre-trained Vision Transformer and helps guide the reverse diffusion process towards recovering scene structures. We also introduce a color convergence loss that constrains surface colors during the reverse inference process to avoid color shifts. DeS3 can differentiate shadow regions from the underlying objects, as well as from the object casting the shadow. This capability enables DeS3 to better recover the structures of objects even when they are partially occluded by shadows. Unlike existing methods that rely on constraints during the training phase, we incorporate the ViT similarity and color convergence losses during the sampling stage. This enables DeS3 to effectively integrate its strong modeling capabilities with input-specific knowledge in a self-tuned manner. Our method outperforms state-of-the-art methods on the SRD, AISTD, LRSS, USR, and UIUC datasets, removing hard, soft, and self shadows robustly. Specifically, on the SRD dataset, our method improves on the SOTA method by 20% in whole-image RMSE.
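
The abstract names two guidance terms but does not give their formulas. As a rough illustration only, the sketch below shows one plausible shape for each: a feature similarity term (one minus cosine similarity between paired feature vectors) and a color term (squared difference between normalized mean colors). Every function name, the choice of cosine similarity, and the chromaticity-based color term are assumptions made for this example, not the paper's definitions; the feature vectors stand in for what a pre-trained ViT would output.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon avoids division by zero


def feature_similarity_loss(feats_a, feats_b):
    """Mean of (1 - cosine similarity) over paired token features.

    feats_a, feats_b: lists of per-token feature vectors. They are
    hypothetical stand-ins for features from a pre-trained ViT.
    """
    pairs = list(zip(feats_a, feats_b))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def mean_chromaticity(pixels):
    """Per-channel mean of RGB pixels, normalized so the channels sum to 1."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    total = sum(means) + 1e-12
    return [m / total for m in means]


def color_shift_penalty(output_pixels, reference_pixels):
    """Squared difference between the mean chromaticities of two images.

    Assumed illustrative form: a uniform brightness change gives a value
    near zero, while a change in color balance is penalized.
    """
    out_c = mean_chromaticity(output_pixels)
    ref_c = mean_chromaticity(reference_pixels)
    return sum((a - b) ** 2 for a, b in zip(out_c, ref_c))
```

In a sampling-time scheme like the one the abstract describes, terms of this kind would be evaluated on the intermediate estimate at each reverse step and used to steer it. How DeS3 actually weights and applies them is not stated here.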

updated: Tue Mar 28 2023 14:55:56 GMT+0000 (UTC)

published: Tue Nov 15 2022 12:15:29 GMT+0000 (UTC)

arXiv
