Visual Recognition by Request

Chufeng Tang; Lingxi Xie; Xiaopeng Zhang; Xiaolin Hu; Qi Tian

In this paper, we present a novel protocol of annotation and evaluation for visual recognition. Unlike traditional settings, the protocol does not require the labeler/algorithm to annotate/recognize all targets (objects, parts, etc.) at once. Instead, it raises a number of recognition instructions, and the algorithm recognizes targets by request. This mechanism brings two properties that reduce the burden of annotation: (i) variable granularity: different scenarios can have different levels of annotation, and in particular object parts can be labeled only in large and clear instances; (ii) open-domain: new concepts can be added to the database at minimal cost. To deal with the proposed setting, we maintain a knowledge base and design a query-based visual recognition framework that constructs queries on the fly based on the requests. We evaluate the recognition system on two mixed-annotated datasets, CPP and ADE20K, and demonstrate its promising ability to learn from partially labeled data and to adapt to new concepts with only text labels.
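To make the "recognition by request" idea concrete, the following is a minimal sketch of how a knowledge base might turn a request into a set of queries. It is not the authors' implementation: the class `KnowledgeBase`, its methods `add_concept` and `build_queries`, and the concept names are all hypothetical, and the sketch only models the bookkeeping (which text labels are queried for a requested target), not any visual model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical concept hierarchy: each concept maps to its known sub-concepts."""

    children: dict[str, list[str]] = field(default_factory=dict)

    def add_concept(self, parent: str, name: str) -> None:
        # A new concept is registered with nothing but its text label
        # (illustrating the open-domain property).
        siblings = self.children.setdefault(parent, [])
        if name not in siblings:
            siblings.append(name)
        self.children.setdefault(name, [])

    def build_queries(self, target: str) -> list[str]:
        # Queries are built on the fly from whatever the knowledge base
        # currently lists under the requested target.
        return list(self.children.get(target, []))


kb = KnowledgeBase()
kb.add_concept("scene", "person")
kb.add_concept("scene", "car")
kb.add_concept("person", "head")
kb.add_concept("person", "arm")

scene_queries = kb.build_queries("scene")    # object-level request
person_queries = kb.build_queries("person")  # part-level request on one object
car_queries_before = kb.build_queries("car")  # no parts known yet

kb.add_concept("car", "wheel")  # add a new concept by text label only
car_queries_after = kb.build_queries("car")
```

In this sketch, a request for the parts of `car` initially yields no queries, which mirrors variable granularity: parts are only queried where the knowledge base (and the annotation) provides them. Registering `wheel` afterwards changes the queries without touching anything else.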

updated: Thu Jul 28 2022 16:55:11 GMT+0000 (UTC)

published: Thu Jul 28 2022 16:55:11 GMT+0000 (UTC)

arXiv
