Fully Differentiable and Interpretable Model for VIO with 4 Trainable Parameters

Zexi Chen; Haozhe Du; Yiyi Liao; Yue Wang; Rong Xiong

Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem with filtering or optimization. Although fully interpretable, they rely on manual intervention and empirical parameter tuning. Learning-based approaches, on the other hand, allow end-to-end training but require a large amount of training data to learn millions of parameters, and their heavy, non-interpretable models hinder generalization. In this paper, we propose a fully differentiable, interpretable, and lightweight monocular VIO model that contains only 4 trainable parameters. Specifically, we first adopt an Unscented Kalman Filter (UKF) as a differentiable layer to predict the pitch and roll, where the noise covariance matrices are learned in order to filter out the noise in the raw IMU data. Second, the refined pitch and roll are used to obtain a gravity-aligned bird's-eye-view (BEV) image of each frame via differentiable camera projection. Finally, a differentiable pose estimator estimates the remaining 4-DoF pose between the BEV frames. Our method allows the covariance matrices to be learned end-to-end, supervised by the pose estimation loss, and outperforms empirically tuned baselines. Experimental results on synthetic and real-world datasets demonstrate that our simple approach is competitive with state-of-the-art methods and generalizes well to unseen scenes.
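The role of pitch and roll in gravity alignment can be illustrated with a short sketch. This is not the paper's UKF or its camera projection; it only shows the underlying geometry under assumed conventions (z axis up, roll about x, pitch about y, rotation R = Ry(pitch) · Rx(roll)). The function names are hypothetical and chosen for illustration.

```python
import math

def up_in_body(roll, pitch):
    """Unit 'up' direction expressed in the body frame.

    Assumes the body-to-level rotation is R = Ry(pitch) @ Rx(roll),
    so the body-frame up vector is R^T applied to (0, 0, 1).
    """
    return (-math.sin(pitch),
            math.sin(roll) * math.cos(pitch),
            math.cos(roll) * math.cos(pitch))

def roll_pitch_from_up(u):
    """Recover roll and pitch from a body-frame up vector.

    Valid for pitch strictly between -pi/2 and pi/2.
    """
    ux, uy, uz = u
    roll = math.atan2(uy, uz)
    pitch = math.atan2(-ux, math.hypot(uy, uz))
    return roll, pitch

def gravity_align_rotation(roll, pitch):
    """R = Ry(pitch) @ Rx(roll) as a 3x3 nested list (yaw left untouched)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [[cp, sp * sr, sp * cr],
            [0.0, cr, -sr],
            [-sp, cp * sr, cp * cr]]

def matvec(m, v):
    """Multiply a 3x3 nested-list matrix by a length-3 vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

# Hypothetical attitude, used only to exercise the functions.
u_body = up_in_body(0.1, -0.2)
est_roll, est_pitch = roll_pitch_from_up(u_body)
aligned_up = matvec(gravity_align_rotation(est_roll, est_pitch), u_body)
```

Once roll and pitch are removed in this way, only yaw and translation remain to be estimated, which matches the 4 remaining degrees of freedom mentioned in the abstract.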

updated: Sat Sep 25 2021 06:54:09 GMT+0000 (UTC)

published: Sat Sep 25 2021 06:54:09 GMT+0000 (UTC)

arXiv
