ADAADepth: Adapting Data Augmentation and Attention for Self-Supervised Monocular Depth Estimation

Vinay Kaushik; Kartik Jindgar; Brejesh Lall

Self-supervised learning of depth is a widely studied research topic because it removes the need for ground-truth annotations when predicting depth. Depth is learnt as an intermediate solution to the task of view synthesis, utilising warped photometric consistency. Although this gives good results when trained on stereo data, the predicted depth is still sensitive to noise, illumination changes and specular reflections. Occlusion can also be tackled better by learning depth from a single camera. We propose ADAA, which utilises depth augmentation as depth supervision for learning accurate and robust depth. We propose a relational self-attention module that learns rich contextual features and further improves the depth results. We also optimise the auto-masking strategy across all losses by enforcing L1 regularisation over the mask. Our novel progressive training strategy first learns depth at a lower resolution and then progresses to the original resolution with only a small amount of additional training. We utilise a ResNet18 encoder to learn features for the prediction of both depth and pose. We evaluate our predicted depth on the standard KITTI driving dataset and achieve state-of-the-art results for monocular depth estimation whilst having a significantly lower number of trainable parameters in our deep learning framework. We also evaluate our model on the Make3D dataset, showing better generalisation than other methods.
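
To make the idea of a masked photometric loss with L1 regularisation over the mask more concrete, the following is a minimal sketch under stated assumptions and not the authors' implementation. The abstract does not give the exact form of the loss, so the function name `masked_photometric_loss`, the penalty on `1 - mask`, and the weight `reg_weight` are all hypothetical choices made for illustration. Images are represented as flat lists of intensities to keep the example dependency-free.

```python
def masked_photometric_loss(target, warped, mask, reg_weight=0.1):
    """Mask-weighted mean L1 photometric error plus an L1 penalty on (1 - mask).

    target, warped: flat lists of pixel intensities (target view and the
        source view warped into the target frame).
    mask: flat list of per-pixel weights in [0, 1].
    reg_weight: hypothetical regularisation weight (an assumption, not a
        value taken from the paper).
    """
    if not target or not (len(target) == len(warped) == len(mask)):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(target)
    # Photometric term: per-pixel absolute difference, weighted by the mask.
    photometric = sum(m * abs(t - w) for t, w, m in zip(target, warped, mask)) / n
    # Assumed regulariser: L1 distance of the mask from all-ones, so that
    # masking every pixel (mask = 0) is not a free way to reach zero loss.
    regulariser = sum(abs(1.0 - m) for m in mask) / n
    return photometric + reg_weight * regulariser


if __name__ == "__main__":
    target = [0.0, 0.0, 0.0, 0.0]
    warped = [0.0, 0.0, 0.0, 1.0]  # last pixel violates photometric consistency
    print(masked_photometric_loss(target, warped, [1.0, 1.0, 1.0, 1.0]))
    print(masked_photometric_loss(target, warped, [1.0, 1.0, 1.0, 0.0]))
```

In this sketch, masking out the inconsistent pixel lowers the total loss, but the regulariser charges a cost for every masked pixel, so the mask is only worth lowering where the photometric error is large.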

updated: Mon Mar 01 2021 09:06:55 GMT+0000 (UTC)

published: Mon Mar 01 2021 09:06:55 GMT+0000 (UTC)

arXiv
