Intersection Prediction from Single 360° Image via Deep Detection of Possible Direction of Travel

Naoki Sugimoto; Satoshi Ikehata; Kiyoharu Aizawa

Movie-Map, an interactive first-person-view map that engages the user in a simulated walking experience, comprises short 360° video segments that are separated at traffic intersections and seamlessly connected according to the viewer's direction of travel. However, in wide urban-scale areas with numerous intersecting roads, manually segmenting the videos at intersections requires significant human effort. Automatic identification of intersections from 360° videos is therefore an important problem for scaling up Movie-Map. In this paper, we propose a novel method that identifies intersections from individual frames of 360° videos. Instead of formulating intersection identification as a standard binary classification task with a 360° image as input, we identify an intersection based on the number of possible directions of travel (PDoT), which a neural network detects in perspective images projected in eight directions from a single 360° image; this allows the method to handle various types of intersections. We constructed a large-scale 360° Image Intersection Identification (iii360) dataset for training and evaluation, with 360° videos collected from various areas such as a school campus, downtown, a suburb, and Chinatown, and demonstrate that our PDoT-based method achieves 88% accuracy, which is significantly better than that achieved by a naive method based on direct binary classification. The source code and a partial dataset will be shared with the community after the paper is published.
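
The pipeline described in the abstract (project one 360° frame into eight perspective views, detect PDoT in each view, then decide from the total count) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the 90° field of view, the nearest-neighbour sampling, the `detector` callable that stands in for the neural network, and the decision threshold of three directions are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def perspective_view(equi, yaw_deg, fov_deg=90.0, out_size=64):
    """Sample a pinhole view from an equirectangular image (nearest neighbour).

    fov_deg and out_size are illustrative defaults, not values from the paper.
    """
    h, w = equi.shape[:2]
    # Focal length in pixels for the requested horizontal field of view.
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2.0)
    # Pixel grid centred on the optical axis (y grows downward).
    coords = np.arange(out_size) - (out_size - 1) / 2.0
    x, y = np.meshgrid(coords, coords)
    z = np.full_like(x, f)
    # Ray direction -> longitude/latitude, rotated by yaw about the vertical axis.
    lon = np.arctan2(x, z) + np.radians(yaw_deg)
    lat = np.arctan2(-y, np.hypot(x, z))
    # Longitude/latitude -> source pixel indices (longitude wraps around).
    u = ((lon / (2.0 * np.pi) + 0.5) % 1.0) * w
    v = (0.5 - lat / np.pi) * h
    ui = np.clip(u.astype(int), 0, w - 1)
    vi = np.clip(v.astype(int), 0, h - 1)
    return equi[vi, ui]


def count_pdot(equi, detector, n_views=8):
    """Sum the PDoT detections over n_views evenly spaced yaw angles.

    `detector` is a hypothetical callable (view -> number of PDoT found in
    that view); in the paper this role is played by a neural network.
    """
    yaws = [i * 360.0 / n_views for i in range(n_views)]
    return sum(int(detector(perspective_view(equi, yaw))) for yaw in yaws)


def is_intersection(pdot_count, min_directions=3):
    """Assumed rule: a plain road offers two directions, so three or more
    possible directions of travel are treated as an intersection."""
    return pdot_count >= min_directions
```

In use, `count_pdot(frame, detector)` would be called on each video frame, and `is_intersection` applied to the result. Counting directions instead of classifying the whole panorama directly is the design choice the abstract emphasizes; the threshold used here is only one plausible way to turn that count into a decision.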

updated: Sun Apr 10 2022 08:53:14 GMT+0000 (UTC)

published: Sun Apr 10 2022 08:53:14 GMT+0000 (UTC)

arXiv
