Large Scale Joint Semantic Re-Localisation and Scene Understanding via Globally Unique Instance Coordinate Regression

Ignas Budvytis; Marvin Teichmann; Tomas Vojir; Roberto Cipolla

In this work we present a novel approach to joint semantic localisation and scene understanding. Our work is motivated by the need for localisation algorithms that not only predict the 6-DoF camera pose but also simultaneously recognise surrounding objects and estimate 3D geometry. Such capabilities are crucial for computer-vision-guided systems that interact with their environment, such as autonomous driving, augmented reality and robotics. In particular, we propose a two-step procedure. In the first step, we train a convolutional neural network to jointly predict per-pixel globally unique instance labels and the corresponding local coordinates for each instance of a static object (e.g. a building). In the second step, we obtain scene coordinates by combining the object center coordinates with the local coordinates, and use them to estimate the 6-DoF camera pose. We evaluate our approach on real-world (CamVid-360) and artificial (SceneCity) autonomous driving datasets. On all datasets, we obtain smaller mean distance and angular errors than state-of-the-art 6-DoF pose estimation algorithms based on direct pose regression and on pose estimation from scene coordinates. Our contributions include: (i) a novel formulation of scene coordinate regression as two separate tasks, object instance recognition and local coordinate regression; and a demonstration that the proposed solution can predict accurate 3D geometry of static objects and estimate the 6-DoF camera pose (ii) on maps several orders of magnitude larger than those previously attempted by scene coordinate regression methods, and (iii) on lightweight, approximate 3D maps built from 3D primitives such as building-aligned cuboids.
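The abstract does not specify how the center and local coordinates are combined or which pose solver is used, so the following is only a minimal sketch of the second step under stated assumptions. It assumes (a) local coordinates are offsets from the instance center expressed in world-aligned axes, so combining them is a simple addition; (b) a pinhole camera with focal length `f` and principal point `(cx, cy)`; and (c) RANSAC-style scoring, where a candidate pose `(R, t)` is rated by how many scene coordinates reproject close to their pixels. A full pipeline would typically generate the pose hypotheses with a PnP solver inside RANSAC; that part is omitted here. All function names, the reprojection threshold, and the convention that unknown labels (e.g. 0 for non-static pixels) have no center are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def scene_coordinates(
    instance_labels: Sequence[int],
    local_coords: Sequence[Vec3],
    centers: Dict[int, Vec3],
) -> List[Optional[Vec3]]:
    """Per-pixel scene coordinate = instance center + local offset (assumed).

    Pixels whose label has no known center (hypothetically label 0 for
    background or dynamic objects) yield None and are ignored later.
    """
    out: List[Optional[Vec3]] = []
    for label, (lx, ly, lz) in zip(instance_labels, local_coords):
        c = centers.get(label)
        out.append(None if c is None else (c[0] + lx, c[1] + ly, c[2] + lz))
    return out


def project(point: Vec3, R: Mat3, t: Vec3, f: float, cx: float, cy: float
            ) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world point under world-to-camera pose (R, t)."""
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None  # point lies behind the camera
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def count_inliers(
    pixels: Sequence[Tuple[float, float]],
    scene_xyz: Sequence[Optional[Vec3]],
    R: Mat3, t: Vec3, f: float, cx: float, cy: float,
    threshold: float = 5.0,
) -> int:
    """Score a pose hypothesis by the number of low-reprojection-error pixels."""
    n = 0
    for (u, v), X in zip(pixels, scene_xyz):
        if X is None:
            continue  # no static-object scene coordinate for this pixel
        uv = project(X, R, t, f, cx, cy)
        if uv is not None and math.hypot(uv[0] - u, uv[1] - v) < threshold:
            n += 1
    return n


if __name__ == "__main__":
    identity: Mat3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    centers = {1: (0.0, 0.0, 10.0), 2: (5.0, 0.0, 20.0)}
    labels = [1, 1, 2, 0]
    local = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
    pixels = [(10.0, 0.0), (0.0, 10.0), (40.0, 0.0), (0.0, 0.0)]
    xyz = scene_coordinates(labels, local, centers)
    print(xyz)
    print(count_inliers(pixels, xyz, identity, (0.0, 0.0, 0.0), 100.0, 0.0, 0.0))
```

In the usage example, the third pixel reprojects 15 pixels away from its observation and the fourth has no static-object label, so only the first two count as inliers for the identity pose. Keeping the composition step separate from scoring mirrors the two-task split described above: the network supplies labels and offsets, while the map only needs to store one center per instance.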

updated: Mon Sep 23 2019 09:26:27 GMT+0000 (UTC)

published: Mon Sep 23 2019 09:26:27 GMT+0000 (UTC)

arXiv
