Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion

Haisong Liu; Tao Lu; Yihui Xu; Jia Liu; Limin Wang

In this paper, we study the problem of jointly estimating optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches face a dilemma: they either fail to fully exploit the characteristics of each modality or fail to maximize the inter-modality complementarity. To address this problem, we propose a novel end-to-end framework consisting of 2D and 3D branches with multiple bidirectional fusion connections between them at specific layers. Unlike previous work, we apply a point-based 3D branch to extract the LiDAR features, as it preserves the geometric structure of point clouds. To fuse dense image features and sparse point features, we propose a learnable operator named the bidirectional camera-LiDAR fusion module (Bi-CLFM). We instantiate two types of the bidirectional fusion pipeline: one based on the pyramidal coarse-to-fine architecture (dubbed CamLiPWC), and the other based on recurrent all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC and CamLiRAFT surpass all existing methods and reduce the 3D end-point error by up to 47.9% relative to the best published result. Our best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI Scene Flow benchmark, ranking 1st among all submissions with far fewer parameters. In addition, our methods show strong generalization performance and the ability to handle non-rigid motion. Code is available at https://github.com/MCG-NJU/CamLiFlow.
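The abstract reports results as a 3D end-point error and a relative reduction of that error. As a rough illustration only, the sketch below computes the end-point error in its commonly used form (the mean Euclidean distance between predicted and ground-truth 3D flow vectors) and a relative reduction between two error values. The function names `epe3d` and `relative_reduction` and all numbers are hypothetical and are not taken from the CamLiFlow code base; the paper's exact evaluation protocol (for example, which points are masked out) may differ.

```python
import math


def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow vectors.

    Both arguments are equal-length sequences of (dx, dy, dz) tuples.
    This is an illustrative helper, not the authors' evaluation code.
    """
    if len(pred_flow) != len(gt_flow) or len(pred_flow) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for pred, gt in zip(pred_flow, gt_flow):
        total += math.dist(pred, gt)  # per-point end-point error
    return total / len(pred_flow)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Made-up flow vectors for two points (assumption: arbitrary units).
pred = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = epe3d(pred, gt)

# Made-up baseline error, used only to show the arithmetic.
reduction = relative_reduction(2.0, error)
print(f"EPE3D = {error:.3f}, relative reduction = {reduction:.1%}")
```

A figure such as "up to 47.9%" in the abstract is a relative reduction of this kind, measured against the best previously published error.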

updated: Mon Apr 08 2024 09:02:40 GMT+0000 (UTC)

published: Tue Mar 21 2023 16:54:01 GMT+0000 (UTC)

arXiv
