Self-Supervised Learning of Domain Invariant Features for Depth Estimation

Hiroyasu Akada; Shariq Farooq Bhat; Ibraheem Alhashim; Peter Wonka

We tackle the problem of unsupervised synthetic-to-realistic domain adaptation for single image depth estimation. An essential building block of single image depth estimation is an encoder-decoder task network that takes an RGB image as input and produces a depth map as output. In this paper, we propose a novel training strategy that forces the task network to learn domain invariant representations in a self-supervised manner. Specifically, we use an image-to-image translation network to extend self-supervised learning from traditional representation learning, which works on images from a single domain, to domain invariant representation learning, which works on images from two different domains. First, we use our bidirectional image-to-image translation network to transfer domain-specific styles between the synthetic and real domains. This style transfer operation gives us pairs of images with the same content in the two domains. Second, we jointly train our task network and a Siamese network on these same-content image pairs so that the task network learns domain invariant features. Finally, we fine-tune the task network using labeled synthetic and unlabeled real-world data. Our training strategy improves generalization in the real-world domain. We carry out an extensive evaluation on two popular depth estimation datasets, KITTI and Make3D. The results demonstrate that our proposed method outperforms the state of the art both qualitatively and quantitatively. The source code and model weights will be made available.
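The idea behind the Siamese step (features of an image and of its style-translated counterpart should agree) can be illustrated with a minimal sketch. It assumes a negative cosine similarity between the two feature vectors, in the spirit of SimSiam-style Siamese training; the abstract does not state the exact loss, and `invariance_loss`, `centered_encoder`, `identity_encoder`, and `toy_translate` are hypothetical stand-ins for the task network's encoder and the translation network, not the authors' code. Images are plain lists of pixel intensities so the example runs without any deep learning library.

```python
import math
from typing import Callable, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def invariance_loss(encoder: Callable[[Vector], Vector],
                    translate: Callable[[Vector], Vector],
                    image: Vector) -> float:
    """Negative cosine similarity between the features of an image and of its
    style-translated version (same content, other domain).
    Reaches its minimum of -1.0 when both feature vectors point the same way.
    Assumption: a SimSiam-like setup would also add a predictor head and a
    stop-gradient on one branch; this sketch only computes the loss value."""
    z_a = encoder(image)
    z_b = encoder(translate(image))
    return -cosine_similarity(z_a, z_b)


# Hypothetical "translation network": changes contrast and brightness only,
# standing in for a synthetic-to-real style transfer that keeps the content.
def toy_translate(image: Vector) -> Vector:
    return [0.8 * p + 0.1 for p in image]


# Hypothetical encoder that subtracts the mean intensity; combined with the
# scale-free cosine similarity it ignores the toy style change above.
def centered_encoder(image: Vector) -> Vector:
    mean = sum(image) / len(image)
    return [p - mean for p in image]


# Hypothetical encoder that passes pixels through unchanged (not invariant).
def identity_encoder(image: Vector) -> Vector:
    return list(image)


if __name__ == "__main__":
    img = [0.0, 0.5, 1.0]
    print("invariant encoder loss:", invariance_loss(centered_encoder, toy_translate, img))
    print("identity encoder loss: ", invariance_loss(identity_encoder, toy_translate, img))
```

Running the script shows the mean-centered encoder reaching a loss of about -1.0, while the identity encoder stays above it because its features still carry the style difference. Minimizing such a loss over many translated pairs is what pushes a real encoder toward domain invariant features.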

updated: Tue Jun 08 2021 09:02:07 GMT+0000 (UTC)

published: Fri Jun 04 2021 16:45:48 GMT+0000 (UTC)

arXiv