S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Xiaotian Chen; Yuwang Wang; Xuejin Chen; Wenjun Zeng

Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes. We are the first to explore the learning of a depth-specific structural representation, which captures the features essential for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic to Real DepthNet) generalizes well to unseen real-world data directly, even though it is trained only on synthetic data. S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components, b) a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization, and c) a depth prediction (DP) module, which predicts depth from the depth-specific representation. Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance under the semi-supervised setting. The code and trained models are available at https://github.com/microsoft/S2R-DepthNet.
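
The data flow through the three modules described above can be illustrated with a toy sketch. This is not the released implementation: the real modules are learned networks, while every function below is a hypothetical hand-written stand-in, and combining the structure map with the attention map by element-wise multiplication is an assumption made here for illustration. The sketch only shows the ordering STE, then DSA, then DP, and how discarding "style" (here, global brightness and contrast) makes the output independent of it.

```python
import math
from typing import List

# Toy stand-in for an image tensor: a 2-D grid of intensities.
Image = List[List[float]]


def extract_structure(image: Image) -> Image:
    """Hypothetical STE stand-in: drop global brightness/contrast ("style")
    by standardizing the pixels. The real STE is a learned encoder/decoder."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # avoid division by zero on flat images
    return [[(p - mean) / std for p in row] for row in image]


def depth_attention(structure: Image) -> Image:
    """Hypothetical DSA stand-in: a per-pixel weight in (0, 1) from a sigmoid.
    The real DSA learns which structures matter for depth."""
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in structure]


def predict_depth(representation: Image) -> Image:
    """Hypothetical DP stand-in: map each value to a positive 'depth'."""
    return [[math.exp(r) for r in row] for row in representation]


def s2r_depthnet_forward(image: Image) -> Image:
    """Chain the three stages: STE -> DSA -> DP."""
    structure = extract_structure(image)
    attention = depth_attention(structure)
    # Assumption: the depth-specific representation is the structure map
    # re-weighted element-wise by the attention map.
    depth_specific = [
        [s * a for s, a in zip(s_row, a_row)]
        for s_row, a_row in zip(structure, attention)
    ]
    return predict_depth(depth_specific)


if __name__ == "__main__":
    img = [[0.1, 0.5], [0.9, 0.3]]
    # Same scene with a different "style" (contrast x2, brightness +3).
    restyled = [[2.0 * p + 3.0 for p in row] for row in img]
    print(s2r_depthnet_forward(img))
    print(s2r_depthnet_forward(restyled))
```

In this toy version the two printed grids agree up to floating-point error, because the stand-in STE step removes the brightness and contrast change before the later stages see the image. The actual model aims for the analogous property across the synthetic and real domains, but achieves it through training rather than a fixed normalization.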

updated: Tue Jun 15 2021 07:24:40 GMT+0000 (UTC)

published: Fri Apr 02 2021 03:55:41 GMT+0000 (UTC)

arXiv

