FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration

Zhijian Huang; Sihao Lin; Guiyu Liu; Mukun Luo; Chaoqiang Ye; Hang Xu; Xiaojun Chang; Xiaodan Liang

Multi-modality fusion and multi-task learning are becoming common in 3D autonomous driving scenarios, motivated by the need for robust prediction within a limited computation budget. However, naively extending existing frameworks to multi-modality multi-task learning remains ineffective and can even be harmful, owing to the notorious problems of modality bias and task conflict. Previous works manually coordinate the learning framework using empirical knowledge, which may lead to sub-optimal results. To mitigate this issue, we propose a novel yet simple multi-level gradient calibration learning framework that operates across tasks and modalities during optimization. Specifically, the gradients produced by the task heads and used to update the shared backbone are calibrated at the backbone's last layer to alleviate task conflict. Before the calibrated gradients are propagated further to the modality branches of the backbone, their magnitudes are calibrated again to the same level, ensuring that the downstream tasks pay balanced attention to the different modalities. Experiments on the large-scale nuScenes benchmark demonstrate the effectiveness of the proposed method, e.g., an absolute 14.4% mIoU improvement on map segmentation and a 1.4% mAP improvement on 3D detection, advancing the application of 3D autonomous driving in the domain of multi-modality fusion and multi-task learning. We also discuss the links between modalities and tasks.
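
The two calibration levels described in the abstract can be illustrated with a toy sketch. The abstract does not give the exact calibration rule, so this example assumes plain norm equalization at both levels: per-task gradients are rescaled to their mean L2 norm and summed, and the result is then split per modality and each slice rescaled to a common norm. The function names, the `slices` layout, and the toy gradient values are all hypothetical and are not taken from the paper.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def rescale(vec, target_norm, eps=1e-12):
    """Scale vec to target_norm, keeping its direction (zero vectors are left as-is)."""
    n = l2_norm(vec)
    if n < eps:
        return list(vec)
    return [x * target_norm / n for x in vec]


def calibrate_tasks(task_grads):
    """Task level (assumed rule): equalize per-task gradient norms, then sum.

    task_grads holds one flat gradient per task, all taken with respect to
    the same shared last-layer parameters.
    """
    norms = [l2_norm(g) for g in task_grads]
    target = sum(norms) / len(norms)
    scaled = [rescale(g, target) for g in task_grads]
    return [sum(column) for column in zip(*scaled)]


def calibrate_modalities(grad, slices):
    """Modality level (assumed rule): rescale each modality's slice to the mean norm.

    slices maps a modality name to the (start, stop) range of grad that flows
    back into that modality's branch.
    """
    parts = {name: grad[a:b] for name, (a, b) in slices.items()}
    norms = [l2_norm(p) for p in parts.values()]
    target = sum(norms) / len(norms)
    return {name: rescale(p, target) for name, p in parts.items()}


# Toy values: the detection gradient is much larger than the segmentation one.
det_grad = [1.0, 0.0, 4.0, 0.0]
seg_grad = [0.0, 0.5, 0.0, 0.5]

shared = calibrate_tasks([det_grad, seg_grad])
per_modality = calibrate_modalities(shared, {"camera": (0, 2), "lidar": (2, 4)})
```

After both steps, the camera and LiDAR slices carry gradients of equal magnitude while each keeps its own direction. This mirrors the "same level" wording of the abstract, but not necessarily the calibration rule the authors actually use.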

updated: Mon Jul 31 2023 12:50:15 GMT+0000 (UTC)

published: Mon Jul 31 2023 12:50:15 GMT+0000 (UTC)

arXiv
