DeS3: Adaptive Attention-driven Self and Soft Shadow Removal using ViT Similarity

Yeying Jin; Wenhan Yang; Wei Ye; Yuan Yuan; Robby T. Tan

Removing soft and self shadows that lack clear boundaries from a single image remains challenging. Self shadows are shadows that an object casts on itself. Most existing methods rely on binary shadow masks and do not account for the ambiguous boundaries of soft and self shadows. In this paper, we present DeS3, a method that removes hard, soft, and self shadows based on adaptive attention and ViT similarity. Our novel ViT similarity loss uses features extracted from a pre-trained Vision Transformer and helps guide the reverse sampling towards recovering scene structures. Our adaptive attention can differentiate shadow regions from the underlying objects, as well as from the object casting the shadow. This capability enables DeS3 to better recover the structures of objects even when they are partially occluded by shadows. Unlike existing methods that rely on constraints during the training phase, we incorporate the ViT similarity during the sampling stage. Our method outperforms state-of-the-art methods on the SRD, AISTD, LRSS, USR, and UIUC datasets, removing hard, soft, and self shadows robustly. Specifically, our method outperforms the SOTA method by 16% in whole-image RMSE on the LRSS dataset.
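The two quantities the abstract leans on, a feature-similarity loss and whole-image RMSE, can be illustrated with a minimal sketch. The abstract does not give the exact form of the ViT similarity loss, so the cosine-based loss below is an assumption chosen for illustration, not the paper's formulation. The feature vectors stand in for features that would come from a pre-trained Vision Transformer, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def feature_similarity_loss(feat_pred, feat_ref):
    """Assumed loss form: 0 when the features align, growing as they diverge."""
    return 1.0 - cosine_similarity(feat_pred, feat_ref)


def rmse(pred, target):
    """Root-mean-square error over all pixels of two flattened images."""
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared_error / len(pred))


def relative_improvement(rmse_baseline, rmse_ours):
    """Fractional RMSE reduction relative to a baseline method."""
    return (rmse_baseline - rmse_ours) / rmse_baseline
```

Under this reading, a 16% improvement means the whole-image RMSE is 16% lower than the baseline's. For example, a hypothetical baseline RMSE of 5.0 reduced to 4.2 gives `relative_improvement(5.0, 4.2)` of about 0.16. These numbers are illustrative only and are not taken from the paper.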

updated: Fri Aug 25 2023 18:07:06 GMT+0000 (UTC)

published: Tue Nov 15 2022 12:15:29 GMT+0000 (UTC)

arXiv
