TransZero: Attribute-guided Transformer for Zero-Shot Learning

Shiming Chen; Ziming Hong; Yang Liu; Guo-Sen Xie; Baigui Sun; Hao Li; Qinmu Peng; Ke Lu; Xinge You

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability of visual features and the localization of discriminative attributes are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero uses a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and to improve the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder that, guided by semantic attribute information, localizes the image regions most relevant to each attribute in a given image. The locality-augmented visual features and semantic vectors are then used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves a new state of the art on three ZSL benchmarks. The code is available at: https://github.com/shiming-chen/TransZero.
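The attribute-guided localization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It is a toy, pure-Python example of the general idea, under these assumptions: each attribute has a semantic vector in the same space as the region features, attention is plain scaled dot-product attention, and a class is scored by summing its attribute annotations weighted by per-attribute evidence. The function names `attribute_attention` and `class_scores` and all numbers are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attribute_attention(regions, attr_vecs):
    """Toy attribute-guided attention (an assumption, not the paper's decoder).

    regions:   list of R region features, each of dimension D
    attr_vecs: list of A attribute vectors, each of dimension D
    Returns (attn, feats): attn[a] holds weights over regions for attribute a,
    feats[a] is the attention-weighted region feature for attribute a.
    """
    scale = math.sqrt(len(regions[0]))
    attn, feats = [], []
    for q in attr_vecs:
        w = softmax([dot(q, r) / scale for r in regions])
        attn.append(w)
        feats.append([
            sum(w[i] * regions[i][d] for i in range(len(regions)))
            for d in range(len(q))
        ])
    return attn, feats


def class_scores(feats, attr_vecs, class_attr):
    # Per-attribute evidence: how well the localized feature matches the attribute.
    attr_score = [dot(f, q) for f, q in zip(feats, attr_vecs)]
    # A class is described only by its attribute annotations, so classes
    # without training images can be scored in the same way.
    return [dot(attr_score, c) for c in class_attr]


# Hypothetical toy data: 2 regions, 2 attributes, 2 classes.
regions = [[2.0, 0.0], [0.0, 1.0]]
attr_vecs = [[1.0, 0.0], [0.0, 1.0]]
class_attr = [[1.0, 0.0], [0.0, 1.0]]

attn, feats = attribute_attention(regions, attr_vecs)
scores = class_scores(feats, attr_vecs, class_attr)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup each attribute attends most strongly to the region aligned with its vector, and the class whose attribute has the strongest localized evidence gets the highest score. The model described in the abstract additionally refines the region features with an encoder before this step, which the sketch omits.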

updated: Fri Dec 03 2021 02:39:59 GMT+0000 (UTC)

published: Fri Dec 03 2021 02:39:59 GMT+0000 (UTC)

arXiv
