Visual Object Recognition in Indoor Environments Using Topologically Persistent Features

Ekta U. Samani; Xingjian Yang; Ashis G. Banerjee

Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots. In this letter, we propose the use of topologically persistent features, which rely on the objects' shape information, to address this challenge. In particular, we extract two kinds of features, namely, sparse persistence image (PI) and amplitude, by applying persistent homology to multi-directional height function-based filtrations of the cubical complexes representing the object segmentation maps. The features are then used to train a fully connected network for recognition. For performance evaluation, in addition to a widely used shape dataset and a benchmark indoor scenes dataset, we collect a new dataset, comprising scene images from two different environments, namely, a living room and a mock warehouse. The scenes are captured using varying camera poses under different illumination conditions and include up to five different objects from a given set of fourteen objects. On the benchmark indoor scenes dataset, sparse PI features show better recognition performance in unseen environments than the features learned using the widely used ResNetV2-56 and EfficientNet-B4 models. Further, they provide slightly higher recall and accuracy values than Faster R-CNN, an end-to-end object detection method, and its state-of-the-art variant, Domain Adaptive Faster R-CNN. The performance of our methods also remains relatively unchanged from the training environment (living room) to the unseen environment (mock warehouse) in the new dataset. In contrast, the performance of the object detection methods drops substantially. We also implement the proposed method on a real-world robot to demonstrate its usefulness.
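As a rough illustration of the height-function filtration idea described above, the following minimal sketch computes 0-dimensional persistence pairs for a binary segmentation mask swept along a chosen direction. It is not the authors' implementation: the function names `height_filtration` and `h0_persistence` are hypothetical, pixels are treated as vertices with 4-neighbor adjacency rather than as cells of a full cubical complex, and only connected components (H0) are tracked.

```python
import math


def height_filtration(mask, direction):
    """Map each foreground pixel (row, col) to its height along `direction`.

    `direction` is an (x, y) vector; x grows with the column index and
    y grows with the row index. Background pixels are left out entirely.
    """
    dx, dy = direction
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm
    values = {}
    for r, row in enumerate(mask):
        for c, on in enumerate(row):
            if on:
                values[(r, c)] = c * dx + r * dy
    return values


def h0_persistence(values):
    """Return sorted (birth, death) pairs of connected components.

    Pixels enter in order of increasing height (sublevel-set filtration).
    When two components meet, the younger one dies (elder rule).
    Zero-persistence pairs are dropped (assumption made for brevity).
    """
    parent, birth, pairs = {}, {}, []

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for p in sorted(values, key=lambda q: (values[q], q)):
        parent[p] = p
        birth[p] = values[p]
        r, c = p
        for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if n not in parent:
                continue
            a, b = find(p), find(n)
            if a == b:
                continue
            old, young = (a, b) if birth[a] < birth[b] else (b, a)
            if birth[young] < values[p]:
                pairs.append((birth[young], values[p]))
            parent[young] = old
    # Components that never merge persist forever.
    for root in {find(p) for p in parent}:
        pairs.append((birth[root], math.inf))
    return sorted(pairs)


# A small "U"-shaped mask swept in two opposite vertical directions.
u_mask = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
down = h0_persistence(height_filtration(u_mask, (0, 1)))
up = h0_persistence(height_filtration(u_mask, (0, -1)))
```

Sweeping the "U" from top to bottom first sees two separate arms that merge at the base, giving one finite pair in addition to the component that never dies. Sweeping from bottom to top sees a single component throughout. This difference between directions is the intuition for why filtrations along several directions can capture more of an object's shape than a single one.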

updated: Thu May 20 2021 17:24:47 GMT+0000 (UTC)

published: Wed Oct 07 2020 06:04:17 GMT+0000 (UTC)

arXiv
