Self-Supervised Intensity-Event Stereo Matching

Jinjin Gu; Jinan Zhou; Ringo Sai Wo Chu; Yan Chen; Jiawei Zhang; Xuanye Cheng; Song Zhang; Jimmy S. Ren

Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy, a high dynamic range, and low power consumption. Despite these advantages, event cameras cannot be directly applied to computational imaging tasks because they cannot capture high-quality intensity and events simultaneously. This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors. We establish this connection through a multi-modal stereo matching task. We first convert events to a reconstructed image and extend existing stereo networks to this multi-modal setting. We propose a self-supervised method to train the multi-modal stereo network without using ground truth disparity data. A structure loss calculated on image gradients enables self-supervised learning on such multi-modal data. Exploiting the internal stereo constraint between views with different modalities, we introduce general stereo loss functions, including a disparity cross-consistency loss and an internal disparity loss, leading to improved performance and robustness compared to existing approaches. Experiments on both synthetic and real datasets demonstrate the effectiveness of the proposed method, especially the proposed general stereo loss functions. Finally, we shed light on employing the aligned events and intensity images in downstream tasks, e.g., video interpolation.
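
The abstract does not define the structure loss, so the following is only a minimal sketch of the general idea of comparing two views in the gradient domain after warping one of them by a disparity map. Everything here is an assumption made for illustration, not the authors' formulation: the function names (`image_gradients`, `warp_by_disparity`, `gradient_structure_loss`), the forward-difference gradients, the linear interpolation along the horizontal axis, and the L1 penalty are all hypothetical choices.

```python
import numpy as np


def image_gradients(img):
    """Forward-difference gradients along x and y (zero at the last column/row)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy


def warp_by_disparity(right, disparity):
    """Resample the right view into the left view: left(x, y) ~ right(x - d, y).

    Uses linear interpolation along x; sample positions are clamped to the image.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def gradient_structure_loss(left, right, disparity):
    """Mean L1 distance between gradients of the left view and the warped right view.

    Assumption: comparing gradients instead of raw intensities makes the loss
    insensitive to a constant brightness offset between the two modalities.
    """
    warped = warp_by_disparity(right, disparity)
    gx_l, gy_l = image_gradients(left)
    gx_w, gy_w = image_gradients(warped)
    return float(np.mean(np.abs(gx_l - gx_w) + np.abs(gy_l - gy_w)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    right = rng.random((16, 32))
    # Synthetic left view: right view shifted by 3 pixels plus a brightness offset.
    left = np.roll(right, 3, axis=1) + 0.3
    good = gradient_structure_loss(left, right, np.full((16, 32), 3.0))
    bad = gradient_structure_loss(left, right, np.zeros((16, 32)))
    print(good < bad)
```

In this toy setup the loss is lower at the true shift than at zero disparity even though the two views differ by a brightness offset, which is the property a gradient-based loss is meant to provide. A real intensity-event pair would differ in more than a constant offset, so this should be read as an illustration only.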

updated: Tue Nov 01 2022 14:52:25 GMT+0000 (UTC)

published: Tue Nov 01 2022 14:52:25 GMT+0000 (UTC)

arXiv
