Fine-grained Few-shot Recognition by Deep Object Parsing

Pengkai Zhu; Ruizhao Zhu; Samarth Mishra; Venkatesh Saligrama

In our framework, an object is made up of K distinct parts or units, and we parse a test instance by inferring the K parts. Each part occupies a distinct location in the feature space, and the instance features at this location manifest as an active subset of part templates shared across all instances. We recognize a test instance by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. We propose an end-to-end training method to learn part templates on top of a convolutional backbone. To combat visual distortions such as orientation, pose, and size, we learn multi-scale templates, and at test time we parse and match instances across these scales. We show that our method is competitive with the state of the art and, by virtue of parsing, enjoys interpretability as well.
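The parse-and-compare idea described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function names (`parse_instance`, `parse_distance`, `classify`), the per-part linear filters, the top-`n_active` selection of templates, and the `geometry_weight` parameter are all assumptions made for illustration. The sketch also uses a single scale and does not force the K locations to be distinct.

```python
import numpy as np

def parse_instance(feature_map, part_filters, templates, n_active=2):
    """Parse one instance into K part locations and sparse template codes.

    feature_map:  (C, H, W) backbone features (assumed given).
    part_filters: (K, C) hypothetical per-part detectors.
    templates:    (K, T, C) hypothetical part templates shared by all instances.
    """
    C, H, W = feature_map.shape
    K, T, _ = templates.shape
    flat = feature_map.reshape(C, H * W)
    locations = np.zeros((K, 2))
    codes = np.zeros((K, T))
    for k in range(K):
        # Place part k at the location with the strongest filter response.
        heat = part_filters[k] @ flat
        idx = int(np.argmax(heat))
        locations[k] = divmod(idx, W)  # (row, col)
        # Keep only the n_active templates that respond most strongly.
        scores = templates[k] @ flat[:, idx]
        active = np.argsort(-np.abs(scores))[:n_active]
        codes[k, active] = scores[active]
    return locations, codes

def parse_distance(parse_a, parse_b, geometry_weight=0.1):
    """Compare active templates and the relative geometry of part locations."""
    loc_a, code_a = parse_a
    loc_b, code_b = parse_b
    # Relative geometry: part offsets from each instance's own centroid.
    rel_a = loc_a - loc_a.mean(axis=0)
    rel_b = loc_b - loc_b.mean(axis=0)
    template_term = np.sum((code_a - code_b) ** 2)
    geometry_term = np.sum((rel_a - rel_b) ** 2)
    return template_term + geometry_weight * geometry_term

def classify(query_parse, support_parses):
    """Return the label whose closest few-shot parse best matches the query.

    support_parses: dict mapping label -> list of parses of support instances.
    """
    return min(
        support_parses,
        key=lambda label: min(
            parse_distance(query_parse, s) for s in support_parses[label]
        ),
    )
```

Under these assumptions, a query is assigned to the class of the nearest support parse, and the recovered locations and active templates can be inspected directly, which is the sense in which parsing lends interpretability.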

updated: Thu Jul 14 2022 17:59:05 GMT+0000 (UTC)

published: Thu Jul 14 2022 17:59:05 GMT+0000 (UTC)

arXiv
