Scene Retrieval for Contextual Visual Mapping

William H. B. Smith; Michael Milford; Klaus D. McDonald-Maier; Shoaib Ehsan

Visual navigation localizes a query place image against a reference database of place images, also known as a 'visual map'. Localization accuracy requirements for specific areas of the visual map, 'scene classes', vary according to the context of the environment and task. State-of-the-art visual mapping is unable to reflect these requirements by explicitly targeting scene classes for inclusion in the map. Four different scene classes, including pedestrian crossings and stations, are identified in each of the Nordland and St. Lucia datasets. Instead of re-training separate scene classifiers, which struggle with these overlapping scene classes, we make our first contribution: defining the problem of 'scene retrieval'. Scene retrieval extends image retrieval to the classification of scenes defined at test time by associating a single query image with reference images of scene classes. Our second contribution is a triplet-trained convolutional neural network (CNN) to address this problem, which increases scene classification accuracy by up to 7% against state-of-the-art networks pre-trained for scene recognition. Our third contribution is an algorithm, 'DMC', that combines our scene classification with distance and memorability for visual mapping. Our analysis shows that DMC includes 64% more images of our chosen scene classes in a visual map than distance interval mapping alone. Finally, the state-of-the-art visual place descriptors AMOS-Net, Hybrid-Net and NetVLAD are used to show that DMC improves scene class localization accuracy by a mean of 3% and localization accuracy of the remaining map images by a mean of 10% across both datasets.
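As a rough illustration of the scene retrieval idea, the sketch below labels a query descriptor with the scene class of its most similar reference descriptor, so that classes can be defined at test time simply by supplying reference images. It also includes a standard triplet margin loss of the kind commonly used to train such embeddings. The abstract does not specify the similarity measure, the margin, or the descriptor format, so cosine similarity, the margin of 0.2, the three-dimensional toy vectors, and the function names `retrieve_scene` and `triplet_loss` are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_scene(query, references):
    """Return (label, score) of the reference descriptor most similar to query.

    references maps a scene-class label to a list of descriptor vectors.
    Classes are defined only by these references, so new classes can be
    added at test time without re-training (hypothetical interface).
    """
    best_label, best_score = None, -1.0
    for label, descriptors in references.items():
        for descriptor in descriptors:
            score = cosine_similarity(query, descriptor)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on Euclidean distances.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive. The margin value is an assumption.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy descriptors standing in for CNN embeddings (illustrative values only).
references = {
    "station": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "pedestrian_crossing": [[0.1, 0.9, 0.2]],
}
query = [0.85, 0.15, 0.05]
label, score = retrieve_scene(query, references)
print(label, round(score, 3))
```

In a real pipeline the toy vectors would be replaced by embeddings from the trained network, and a similarity threshold could be used to reject queries that match no scene class.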

updated: Thu Feb 25 2021 08:23:19 GMT+0000 (UTC)

published: Thu Feb 25 2021 08:23:19 GMT+0000 (UTC)

arXiv
