HFT: Lifting Perspective Representations via Hybrid Feature Transformation

Jiayu Zou; Junrui Xiao; Zheng Zhu; Junjie Huang; Guan Huang; Dalong Du; Xingang Wang

Autonomous driving requires accurate and detailed Bird's Eye View (BEV) semantic segmentation for decision making, which is one of the most challenging tasks in high-level scene perception. Feature transformation from the frontal view to BEV is the pivotal technology for BEV semantic segmentation. Existing works can be roughly classified into two categories: Camera model-Based Feature Transformation (CBFT) and Camera model-Free Feature Transformation (CFFT). In this paper, we empirically analyze the vital differences between CBFT and CFFT. The former transforms features based on the flat-world assumption, which may distort regions lying above the ground plane. The latter is limited in segmentation performance by the absence of geometric priors and by time-consuming computation. To reap the benefits and avoid the drawbacks of CBFT and CFFT, we propose a novel framework with a Hybrid Feature Transformation module (HFT). Specifically, we decouple the feature maps produced by HFT to estimate the layout of outdoor scenes in BEV. Furthermore, we design a mutual learning scheme that augments the hybrid transformation by applying feature mimicking. Notably, extensive experiments demonstrate that, with negligible extra overhead, HFT achieves a relative improvement of 13.3% on the Argoverse dataset and 16.8% on the KITTI 3D Object dataset compared to the best-performing existing method. The code is available at https://github.com/JiayuZou2020/HFT.
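
The distortion attributed to the flat-world assumption can be illustrated with a minimal sketch. This is not the HFT implementation or any code from the authors' repository; it is a plain pinhole-camera example under assumed conditions: a hypothetical intrinsic matrix `K`, a camera mounted 1.5 m above the ground with its optical axis parallel to the road, and camera coordinates with x right, y down, z forward. The helper names `project` and `flat_world_backproject` are invented for illustration.

```python
import numpy as np

# Hypothetical intrinsics and mounting height (assumptions, not from the paper).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
CAM_HEIGHT = 1.5  # metres above the ground; the ground plane is y = CAM_HEIGHT


def project(K, point_cam):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]


def flat_world_backproject(K, pixel, cam_height):
    """Intersect the pixel's viewing ray with the ground plane y = cam_height."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    if ray[1] <= 0:
        raise ValueError("pixel is at or above the horizon; the ray never reaches the ground")
    return ray * (cam_height / ray[1])


# A point on the road, 10 m ahead and 2 m to the right.
ground_point = np.array([2.0, CAM_HEIGHT, 10.0])
# A point at the same BEV location but 1 m above the road (e.g. on a vehicle body).
raised_point = np.array([2.0, CAM_HEIGHT - 1.0, 10.0])

ground_est = flat_world_backproject(K, project(K, ground_point), CAM_HEIGHT)
raised_est = flat_world_backproject(K, project(K, raised_point), CAM_HEIGHT)

print("ground point recovered at:", ground_est)  # matches the true location
print("raised point placed at:   ", raised_est)  # pushed far beyond its true depth
```

In this setup the road point is recovered exactly, while the raised point, whose true depth is 10 m, lands at 30 m depth and 6 m lateral offset in BEV. That stretching along the viewing ray is the kind of distortion the abstract describes for regions above the ground plane.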

updated: Mon Apr 11 2022 13:09:54 GMT+0000 (UTC)

published: Mon Apr 11 2022 13:09:54 GMT+0000 (UTC)

arXiv
