Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer

Jiaming Liu; Qizhe Zhang; Jianing Li; Ming Lu; Tiejun Huang; Shanghang Zhang

The neuromorphic spike camera generates data streams with high temporal resolution in a bio-inspired way, which gives it great potential in real-world applications such as autonomous driving. In contrast to RGB streams, spike streams are inherently robust to motion blur, which leads to more accurate depth estimation for high-velocity objects. However, training a spike depth estimation network in a supervised manner is almost impossible, since obtaining paired depth labels for temporally dense spike streams is extremely laborious and challenging. In this paper, instead of building a spike stream dataset with full depth labels, we transfer knowledge from open-source RGB datasets (e.g., KITTI) and estimate spike depth in an unsupervised manner. The key challenges of this problem lie in the modality gap between the RGB and spike modalities, and in the domain gap between the labeled source RGB domain and the unlabeled target spike domain. To overcome these challenges, we introduce a cross-modality cross-domain (BiCross) framework for unsupervised spike depth estimation. Our method narrows the enormous gap between source RGB and target spike by introducing an intermediate simulated source spike domain. Specifically, for the cross-modality phase, we propose a novel Coarse-to-Fine Knowledge Distillation (CFKD), which transfers image-level and pixel-level knowledge from source RGB to source spike. This design leverages the abundant semantic information of the RGB modality and the dense temporal information of the spike modality. For the cross-domain phase, we introduce the Uncertainty Guided Mean-Teacher (UGMT), which generates reliable pseudo labels using uncertainty estimation and alleviates the shift between the source spike and target spike domains. In addition, we propose a Global-Level Feature Alignment (GLFA) method to align features between the two domains and generate more reliable pseudo labels.
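The cross-domain phase described above combines a mean-teacher with uncertainty-filtered pseudo labels. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation. The abstract does not say how uncertainty is estimated, so the per-pixel variance across several teacher predictions, the `max_variance` threshold, the `decay` value, and all function names here are assumptions made for illustration.

```python
from statistics import mean, pvariance


def ema_update(teacher, student, decay=0.99):
    """Mean-teacher step: move each teacher weight toward the student weight.

    `teacher` and `student` are flat lists of floats standing in for
    network parameters (an assumption made to keep the sketch small).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def masked_pseudo_labels(teacher_passes, max_variance=0.05):
    """Build pseudo depth labels from several teacher predictions.

    `teacher_passes` is a list of predictions, each a flat list of per-pixel
    depths. A pixel is kept only if the variance across passes is at most
    `max_variance`; rejected pixels are marked with None.
    """
    labels = []
    for per_pixel in zip(*teacher_passes):
        if pvariance(per_pixel) <= max_variance:
            labels.append(mean(per_pixel))
        else:
            labels.append(None)  # too uncertain to supervise the student
    return labels


def masked_l1(student_pred, pseudo):
    """Mean absolute error over the pixels that have a pseudo label."""
    pairs = [(p, y) for p, y in zip(student_pred, pseudo) if y is not None]
    if not pairs:
        return 0.0
    return sum(abs(p - y) for p, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Two hypothetical teacher passes over a two-pixel target spike sample.
    passes = [[10.0, 5.0], [10.0, 7.0]]
    pseudo = masked_pseudo_labels(passes)
    print(pseudo)
    print(masked_l1([9.0, 100.0], pseudo))
    print(ema_update([1.0, 2.0], [0.0, 4.0], decay=0.5))
```

In this sketch the second pixel is discarded because the two teacher passes disagree, so the student's large error on that pixel does not contribute to the loss. In a real training loop the student would be updated by gradient descent on the masked loss and the teacher by `ema_update` after each step.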

updated: Fri Aug 26 2022 09:35:20 GMT+0000 (UTC)

published: Fri Aug 26 2022 09:35:20 GMT+0000 (UTC)

arXiv
