Informative Data Selection with Uncertainty for Multi-modal Object Detection

Xinyu Zhang; Zhiwei Li; Zhenhong Zou; Xin Gao; Yijin Xiong; Dafeng Jin; Jun Li; Huaping Liu

Noise has always been a non-negligible problem in object detection: it creates confusion in model reasoning and thereby reduces the informativeness of the data. It can lead to inaccurate recognition due to shifts in the observed patterns, which calls for robust generalization of the models. To implement a general vision model, we need to develop deep learning models that can adaptively select valid information from multi-modal data. There are two main reasons for this: multi-modal learning can overcome the inherent defects of single-modal data, and adaptive information selection can reduce chaos in multi-modal data. To tackle this problem, we propose a universal uncertainty-aware multi-modal fusion model. It adopts a multi-pipeline, loosely coupled architecture to combine the features and results from point clouds and images. To quantify the correlation in multi-modal information, we model the uncertainty, as the inverse of data information, in different modalities and embed it in the bounding box generation. In this way, our model reduces the randomness in fusion and generates reliable output. Moreover, we conducted a complete investigation on the KITTI 2D object detection dataset and its derived dirty data. Our fusion model is shown to resist severe noise interference such as Gaussian noise, motion blur, and frost, with only slight degradation. The experimental results demonstrate the benefits of our adaptive fusion. Our analysis of the robustness of multi-modal fusion will provide further insights for future research.
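
To make the idea of treating uncertainty as the inverse of data information more concrete, the sketch below fuses two bounding-box estimates by inverse-variance weighting. This is only an illustration under stated assumptions, not the authors' model: the function name `fuse_boxes`, the `(x1, y1, x2, y2)` box format, and the per-coordinate variances are all hypothetical, and the abstract does not specify how the uncertainty is actually embedded in box generation.

```python
import numpy as np

def fuse_boxes(box_img, var_img, box_pts, var_pts, eps=1e-9):
    """Inverse-variance fusion of two box estimates (hypothetical helper).

    box_*: (x1, y1, x2, y2) predicted by the image / point-cloud branch.
    var_*: per-coordinate predictive variance (the uncertainty).
    Assumption: lower variance means more informative data, so each
    branch is weighted by 1 / variance.
    """
    box_img = np.asarray(box_img, dtype=float)
    box_pts = np.asarray(box_pts, dtype=float)
    w_img = 1.0 / (np.asarray(var_img, dtype=float) + eps)
    w_pts = 1.0 / (np.asarray(var_pts, dtype=float) + eps)

    # Weighted average: the less uncertain branch dominates each coordinate.
    fused = (w_img * box_img + w_pts * box_pts) / (w_img + w_pts)
    # Variance of the fused estimate under an independence assumption.
    fused_var = 1.0 / (w_img + w_pts)
    return fused, fused_var

# Example: the image branch is noisier (variance 4) than the point-cloud branch (variance 1).
fused, fused_var = fuse_boxes(
    [100, 100, 200, 200], [4, 4, 4, 4],
    [110, 110, 210, 210], [1, 1, 1, 1],
)
print(fused, fused_var)
```

With these inputs the fused box lies closer to the point-cloud estimate, which is the behavior one would expect when the image is degraded by noise such as blur or frost.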

updated: Sun Apr 23 2023 16:36:13 GMT+0000 (UTC)

published: Sun Apr 23 2023 16:36:13 GMT+0000 (UTC)

arXiv
