Vision-Based Object Recognition in Indoor Environments Using Topologically Persistent Features

Ekta U. Samani; Xingjian Yang; Ashis G. Banerjee

Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots. In this letter, we propose the use of topologically persistent features, which rely on the shape information of the objects, to address this challenge. In particular, we extract two kinds of features, namely, sparse persistence image (PI) and amplitude, by applying persistent homology to multi-directional height function-based filtrations of the cubical complexes representing the object segmentation maps. The features are then used to train a fully connected network for recognition. For performance evaluation, in addition to a widely used shape dataset, we collect a new dataset comprising scene images from two different environments, namely, a living room and a mock warehouse. The scenes in both environments include up to five different objects that are chosen from a given set of fourteen objects. The objects have varying poses and arrangements, and are imaged under different illumination conditions and camera poses. The recognition performance of our methods, which are trained using the living room images, remains relatively unaffected on the unseen warehouse images. In contrast, the performance of the widely used Faster R-CNN and its state-of-the-art variant, Domain Adaptive Faster R-CNN, drops significantly. Moreover, the use of sparse PI features yields slightly higher overall recall and accuracy in the unseen warehouse environment. We also implement the proposed method on a real-world robot to demonstrate its usefulness.
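
To make the filtration step in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It sweeps a binary segmentation mask along a chosen direction and records when connected components appear and merge, which gives the 0-dimensional persistence diagram of a height-function filtration. Several choices here are assumptions made for illustration: the function names `height_persistence_h0` and `multi_directional_diagrams` are hypothetical, object pixels are joined with 8-connectivity (pixels that share an edge or corner), only 0-dimensional homology is computed with a union-find structure, and the persistence image and amplitude features themselves are not computed.

```python
import math
import numpy as np

def height_persistence_h0(mask, direction):
    """0-dimensional sublevel-set persistence of a height function on a mask.

    mask: 2D array, nonzero = object pixel.
    direction: (dx, dy) in image coordinates (x = column, y = row).
    Returns a sorted list of (birth, death) pairs.
    """
    mask = np.asarray(mask).astype(bool)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    rows, cols = np.nonzero(mask)
    # Height of each object pixel: projection of its (x, y) position onto d.
    # Rounding avoids spurious pairs caused by floating-point noise.
    heights = np.round(cols * d[0] + rows * d[1], 9)
    order = np.argsort(heights, kind="stable")

    parent = {}  # union-find forest over the pixels swept so far
    birth = {}   # birth value, meaningful at component roots

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for i in order:
        p = (int(rows[i]), int(cols[i]))
        h = float(heights[i])
        parent[p] = p
        birth[p] = h
        # Assumed 8-connectivity: edge or corner neighbours are joined.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (p[0] + dr, p[1] + dc)
                if q == p or q not in parent:
                    continue
                a, b = find(p), find(q)
                if a == b:
                    continue
                # Elder rule: the component born later dies at height h.
                if birth[a] > birth[b]:
                    a, b = b, a
                if h > birth[b]:  # drop zero-persistence pairs
                    pairs.append((birth[b], h))
                parent[b] = a
    # Components that never merge are essential classes with infinite death.
    roots = {find(p) for p in parent}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)

def multi_directional_diagrams(mask, n_directions=8):
    """One diagram per evenly spaced sweep direction (hypothetical helper)."""
    angles = [2 * math.pi * k / n_directions for k in range(n_directions)]
    return [height_persistence_h0(mask, (math.cos(t), math.sin(t)))
            for t in angles]

# A U-shaped mask: swept top-down, its two arms start as separate components
# and merge at the bottom row; swept bottom-up, it is a single component.
u_shape = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
top_down = height_persistence_h0(u_shape, (0, 1))
bottom_up = height_persistence_h0(u_shape, (0, -1))
```

The U-shaped example shows why sweeping in several directions is informative: the same shape produces a finite birth-death pair in one direction and none in the opposite one, so the set of diagrams across directions captures more of the shape than any single sweep.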

updated: Mon Mar 08 2021 19:21:48 GMT+0000 (UTC)

published: Wed Oct 07 2020 06:04:17 GMT+0000 (UTC)

arXiv
