Towards Continual, Online, Self-Supervised Depth

Muhammad Umar Karim Khan

Although depth extraction with passive sensors has improved remarkably with deep learning, these approaches may fail to obtain correct depth when exposed to environments not observed during training. Online adaptation, where the neural network trains while deployed, combined with self-supervised learning provides a convenient solution, as the network can learn from the scene where it is deployed without external supervision. However, online adaptation causes a neural network to forget the past. Past training is therefore wasted, and the network cannot provide good results when it observes past scenes again. This work deals with practical online adaptation where the input is online and temporally correlated, and training is completely self-supervised. Regularization and replay-based methods without task boundaries are proposed to avoid catastrophic forgetting while adapting to online data. Effort has been made to make the proposed approach suitable for practical use. We apply our method to both structure-from-motion and stereo depth estimation. We evaluate our method on diverse public datasets that include outdoor, indoor and synthetic scenes. Qualitative and quantitative results with both structure-from-motion and stereo show less forgetting and better adaptation than recent methods. Furthermore, the proposed method incurs negligible overhead compared to fine-tuning for online adaptation, proving to be an adequate choice in terms of plasticity, stability and applicability. The proposed approach is more in line with the artificial general intelligence paradigm, as the neural network learns continually with no supervision. Source code is available at https://github.com/umarKarim/cou_sfm and https://github.com/umarKarim/cou_stereo.
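
The abstract refers to replay-based methods that work without task boundaries. As a rough illustration of that general idea only, the sketch below keeps a fixed-size memory of past frames using reservoir sampling and mixes replayed frames into each online batch. This is a generic technique chosen for illustration, not the paper's algorithm (see the linked repositories for that); the names `ReservoirReplayBuffer` and `mixed_batch` are hypothetical.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-size memory of past samples, filled by reservoir sampling.

    Hypothetical helper for illustration. Every sample seen so far has an
    equal chance of being in the buffer, so no task boundaries are needed.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of samples offered to the buffer
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        # Never ask for more items than the buffer currently holds.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def mixed_batch(new_frames, buffer, replay_size):
    """Return the new frames followed by up to replay_size replayed frames.

    Replayed frames are drawn before the new frames are stored, so a batch
    never replays the frames it already contains.
    """
    replayed = buffer.sample(replay_size)
    for frame in new_frames:
        buffer.add(frame)
    return list(new_frames) + replayed
```

In an online loop, each incoming group of frames would be passed through `mixed_batch` and the resulting batch used for one self-supervised training step, so that older scenes keep contributing to the loss while the network adapts to the current one.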

updated: Sun Jun 19 2022 11:22:56 GMT+0000 (UTC)

published: Sun Feb 28 2021 01:18:49 GMT+0000 (UTC)

arXiv
