Fine-grained Few-shot Recognition by Deep Object Parsing

Ruizhao Zhu; Pengkai Zhu; Samarth Mishra; Venkatesh Saligrama



We propose a new method for fine-grained few-shot recognition via deep object parsing. In our framework, an object is made up of K distinct parts, and for each part we learn a dictionary of templates that is shared across all instances and categories. An object is parsed by estimating the locations of these K parts and a set of active templates that can reconstruct the part features. We recognize a test instance by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. Our method is trainable end-to-end, learning the part templates on top of a convolutional backbone. To combat visual distortions such as changes in orientation, pose, and size, we learn templates at multiple scales, and at test time we parse and match instances across these scales. We show that our method is competitive with the state of the art and, by virtue of parsing, also enjoys interpretability.
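The abstract describes two stages: parsing (one location plus a few active templates per part) and matching (compare template codes and relative part geometry against the support instances, across scales). The pure-Python sketch below illustrates that pipeline on toy data under clearly hypothetical assumptions. It is not the authors' model: the paper learns templates end-to-end on a convolutional backbone and reconstructs part features from the active templates. Everything here is a stand-in chosen for illustration. That includes the greedy location search, the top-`top_s` template selection, the L1 code distance, the pairwise-offset geometry term, the weight `geo_weight`, and every function name (`parse_object`, `parse_distance`, `instance_distance`, `classify`).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
FeatureMap = List[List[Vector]]                   # H x W grid of D-dim features (assumed format)
Part = Tuple[Tuple[int, int], Dict[int, float]]   # (location, {template index: coefficient})
Parse = List[Part]                                # one entry per part k = 0..K-1


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def parse_object(fmap: FeatureMap, dictionaries: List[List[Vector]], top_s: int = 2) -> Parse:
    """Hypothetical greedy parser: one location and a few active templates per part."""
    parse: Parse = []
    for templates in dictionaries:  # dictionaries[k] holds the template dictionary of part k
        best_score, best_loc, best_feat = -math.inf, (0, 0), fmap[0][0]
        for i, row in enumerate(fmap):
            for j, feat in enumerate(row):
                # Score a location by its strongest response to any template of this part.
                score = max(dot(feat, t) for t in templates)
                if score > best_score:  # strict '>' keeps the first cell on ties
                    best_score, best_loc, best_feat = score, (i, j), feat
        # Keep the top_s non-negative responses as "active templates" with their weights.
        responses = [max(0.0, dot(best_feat, t)) for t in templates]
        ranked = sorted(range(len(templates)), key=lambda m: -responses[m])[:top_s]
        parse.append((best_loc, {m: responses[m] for m in ranked}))
    return parse


def template_distance(a: Dict[int, float], b: Dict[int, float]) -> float:
    # L1 distance between sparse template codes; missing entries count as 0.
    return sum(abs(a.get(m, 0.0) - b.get(m, 0.0)) for m in set(a) | set(b))


def geometry_distance(locs_a: List[Tuple[int, int]], locs_b: List[Tuple[int, int]]) -> float:
    # Relative geometry: compare part-to-part offsets, so a global shift costs nothing.
    total = 0.0
    for k in range(len(locs_a)):
        for j in range(k + 1, len(locs_a)):
            dya, dxa = locs_a[k][0] - locs_a[j][0], locs_a[k][1] - locs_a[j][1]
            dyb, dxb = locs_b[k][0] - locs_b[j][0], locs_b[k][1] - locs_b[j][1]
            total += math.hypot(dya - dyb, dxa - dxb)
    return total


def parse_distance(pa: Parse, pb: Parse, geo_weight: float = 0.5) -> float:
    # Appearance term (per-part template codes) plus weighted geometry term.
    appearance = sum(template_distance(a[1], b[1]) for a, b in zip(pa, pb))
    geometry = geometry_distance([p[0] for p in pa], [p[0] for p in pb])
    return appearance + geo_weight * geometry


def instance_distance(query_scales: List[Parse], support_scales: List[Parse]) -> float:
    # Multi-scale matching: best pairing over the parses obtained at each scale.
    return min(parse_distance(q, s) for q in query_scales for s in support_scales)


def classify(query_scales: List[Parse], support: Dict[str, List[List[Parse]]]) -> str:
    # Nearest neighbour over the few-shot support set: class of the closest instance.
    return min(support, key=lambda c: min(instance_distance(query_scales, s) for s in support[c]))


if __name__ == "__main__":
    dictionaries = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]  # K=2 parts, 2 templates each
    bird = parse_object([[[2.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [2.0, 2.0]]], dictionaries)
    car = parse_object([[[0.0, 0.0], [0.0, 3.0]], [[3.0, -3.0], [0.0, 0.0]]], dictionaries)
    query = parse_object([[[2.5, 0.0], [0.0, 0.0]], [[0.0, 0.0], [2.0, 2.0]]], dictionaries)
    print(classify([query], {"bird": [[bird]], "car": [[car]]}))  # → bird
```

In this toy run the query lands on the same part locations as the "bird" support instance and differs only slightly in one template coefficient, so it is matched to "bird". Measuring geometry through pairwise offsets rather than absolute positions is one simple way to make the comparison "relative", as the abstract puts it; the actual method may encode geometry differently.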

updated: Thu Oct 13 2022 15:12:37 GMT+0000 (UTC)

published: Thu Jul 14 2022 17:59:05 GMT+0000 (UTC)

arXiv
