From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration

Zekun Qian; Ruize Han; Wei Feng; Feifan Wang; Song Wang

We tackle a new problem: multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration. The problem is very challenging because the only input is several RGB images of a multi-person scene taken from different first-person views (FPVs), with neither a BEV image nor calibration of the FPVs, while the output is a unified plane giving the localization and orientation of both the subjects and the cameras in a BEV. We propose an end-to-end framework for this problem, whose main idea can be divided into the following parts: i) creating a view-transform subject detection module that transforms each FPV into a virtual BEV containing the localization and orientation of each pedestrian, ii) deriving a geometric-transformation-based method to estimate camera localization and view direction, i.e., the camera registration in a unified BEV, and iii) using spatial and appearance information to aggregate the subjects into the unified BEV. We collect a new large-scale synthetic dataset with rich annotations for evaluation. The experimental results show the remarkable effectiveness of our proposed method.
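As a rough illustration of what part ii) could involve, the sketch below aligns two virtual BEVs with a 2D rigid transform estimated from matched subject positions, then reads off the second camera's position and view direction in the first camera's frame. This is not the paper's derivation, which the abstract does not spell out. It is a standard least-squares (Procrustes-style) alignment under stated assumptions: subjects are already matched across views, each camera sits at the origin of its own virtual BEV facing the +y axis, and the function names `estimate_rigid_2d` and `camera_pose_in_reference` are hypothetical.

```python
import math


def estimate_rigid_2d(src, dst):
    """Least-squares rotation angle and translation that map src points onto dst.

    src, dst: equal-length lists of matched (x, y) subject positions,
    e.g. the same pedestrians seen in two virtual BEVs (assumed matched).
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n

    # Accumulate dot and cross products of the centered pairs.
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # Rotation angle that best aligns the centered src points to dst.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation that maps the rotated src centroid onto the dst centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)


def camera_pose_in_reference(theta, translation):
    """Pose of the second camera expressed in the reference BEV.

    Assumes the camera is at the origin of its own BEV and looks along +y,
    so its position is the translation and its view direction is the
    rotated +y axis.
    """
    direction = (-math.sin(theta), math.cos(theta))
    return translation, direction
```

Calling `estimate_rigid_2d(points_in_view_b, points_in_view_a)` and passing the result to `camera_pose_in_reference` places camera B in camera A's BEV. With noisy detections or uncertain matches, a robust variant (for example, RANSAC over the pairs) would likely be needed.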

updated: Sun Apr 28 2024 05:23:35 GMT+0000 (UTC)

published: Mon Dec 19 2022 08:31:08 GMT+0000 (UTC)

arXiv
