Boosting Deep Open World Recognition by Clustering

Dario Fontanel; Fabio Cermelli; Massimiliano Mancini; Samuel Rota Bulò; Elisa Ricci; Barbara Caputo

Convolutional neural networks have brought significant advances to robot vision, but their ability is often limited to closed-world scenarios, where the set of semantic concepts to be recognized is fixed by the available training set. Since it is practically impossible to capture every semantic concept present in the real world in a single training set, we need to break the closed-world assumption and equip robots with the capability to act in an open world. To provide this ability, a robot vision system should be able to (i) identify whether an instance does not belong to the set of known categories (i.e., open set recognition), and (ii) extend its knowledge to learn new classes over time (i.e., incremental learning). In this work, we show how to boost the performance of deep open world recognition algorithms by means of a new loss formulation that enforces a global-to-local clustering of class-specific features. In particular, the first loss term, global clustering, forces the network to map samples closer to the centroid of the class they belong to, while the second, local clustering, shapes the representation space so that samples of the same class move closer together while neighbours belonging to other classes are pushed away. Moreover, we propose a strategy to learn class-specific rejection thresholds instead of heuristically estimating a single global threshold, as in previous works. Experiments on the RGB-D Object and Core50 datasets show the effectiveness of our approach.
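
The abstract does not give the loss in closed form, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a nearest-class-centroid classifier over feature vectors, a global clustering term written as softmax cross-entropy over negative squared distances to the centroids, and per-class rejection thresholds that are fixed by hand here rather than learned. All function names, the toy features, and the threshold values are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_centroids(features, labels):
    """Mean feature vector of each class (assumed centroid definition)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def global_clustering_loss(feature, label, centroids):
    """Assumed form: cross-entropy over negative squared centroid distances.

    The loss is small when the sample is much closer to its own class
    centroid than to any other centroid.
    """
    logits = {y: -sq_dist(feature, c) for y, c in centroids.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[label]


def predict(feature, centroids, thresholds):
    """Return the nearest class, or None if the sample is rejected as unknown.

    Each class has its own rejection threshold on the squared distance,
    instead of one global threshold shared by all classes.
    """
    best = min(centroids, key=lambda y: sq_dist(feature, centroids[y]))
    if sq_dist(feature, centroids[best]) > thresholds[best]:
        return None
    return best


# Toy 2-D features standing in for network embeddings (hypothetical data).
features = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["mug", "mug", "ball", "ball"]
centroids = class_centroids(features, labels)

# Hand-picked thresholds for illustration; the paper proposes learning them.
thresholds = {"mug": 4.0, "ball": 4.0}

known = predict([0.5, 1.0], centroids, thresholds)
unknown = predict([5.0, 6.0], centroids, thresholds)
loss_near = global_clustering_loss([0.0, 1.0], "mug", centroids)
loss_far = global_clustering_loss([5.0, 6.0], "mug", centroids)
```

In this sketch, learning a new class incrementally amounts to adding one centroid and one threshold to the dictionaries. The local clustering term, which acts on neighbouring samples rather than on centroids, is not shown.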

updated: Mon Nov 30 2020 09:35:37 GMT+0000 (UTC)

published: Mon Apr 20 2020 12:07:39 GMT+0000 (UTC)

arXiv
