Progressive Semantic-Visual Mutual Adaption for Generalized Zero-Shot Learning

Man Liu; Feng Li; Chunjie Zhang; Yunchao Wei; Huihui Bai; Yao Zhao

Generalized Zero-Shot Learning (GZSL) identifies unseen categories using knowledge transferred from the seen domain, relying on the intrinsic interactions between visual and semantic information. Prior works mainly localize regions corresponding to shared attributes. When different visual appearances correspond to the same attribute, the shared attributes inevitably introduce semantic ambiguity, hampering the exploration of accurate semantic-visual interactions. In this paper, we deploy the dual semantic-visual transformer module (DSVTM) to progressively model the correspondences between attribute prototypes and visual features, constituting a progressive semantic-visual mutual adaption (PSVMA) network for semantic disambiguation and improved knowledge transferability. Specifically, DSVTM devises an instance-motivated semantic encoder that learns instance-centric prototypes adapted to different images, so that an unmatched semantic-visual pair can be recast into a matched one. Then, a semantic-motivated instance decoder strengthens accurate cross-domain interactions between the matched pair for semantic-related instance adaption, encouraging the generation of unambiguous visual representations. Moreover, to mitigate the bias towards seen classes in GZSL, a debiasing loss is proposed to pursue response consistency between seen and unseen predictions. PSVMA consistently outperforms other state-of-the-art methods. Code will be available at: https://github.com/ManLiuCoder/PSVMA.
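
The abstract gives no equations, so the following is only a rough NumPy sketch of one way the described components could fit together, not the authors' implementation. It assumes single-head cross-attention with a residual connection in both directions (prototypes attending to image patches, then patches attending to the adapted prototypes), and it assumes the debiasing loss matches the mean and variance of seen-class and unseen-class scores. All function names, shapes, and the exact loss form are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention plus a residual.

    Assumption: no learned projections, to keep the sketch short.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return queries + weights @ keys_values


def mutual_adaption_step(prototypes, visual):
    """One hypothetical semantic-visual mutual adaption step.

    prototypes: (num_attributes, dim) shared attribute prototypes
    visual:     (num_patches, dim) patch features of one image
    """
    # Encoder direction: prototypes attend to this image's patches,
    # giving instance-centric prototypes.
    adapted_prototypes = cross_attention(prototypes, visual)
    # Decoder direction: patches attend to the adapted prototypes.
    adapted_visual = cross_attention(visual, adapted_prototypes)
    return adapted_prototypes, adapted_visual


def debiasing_loss(scores, seen_mask):
    """Assumed form: squared gaps between the mean and variance of
    seen-class scores and unseen-class scores.

    scores:    (batch, num_classes) class prediction scores
    seen_mask: (num_classes,) boolean, True for seen classes
    """
    seen = scores[:, seen_mask]
    unseen = scores[:, ~seen_mask]
    mean_gap = (seen.mean() - unseen.mean()) ** 2
    var_gap = (seen.var() - unseen.var()) ** 2
    return mean_gap + var_gap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prototypes = rng.normal(size=(5, 8))  # 5 attributes, dim 8
    visual = rng.normal(size=(10, 8))     # 10 patches, dim 8
    # "Progressive" is modeled here as simply repeating the step.
    for _ in range(2):
        prototypes, visual = mutual_adaption_step(prototypes, visual)
    print(prototypes.shape, visual.shape)
```

Stacking the step in a loop is one plausible reading of "progressively"; the released code at the URL above is the authoritative reference for the actual architecture and loss.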

updated: Mon Mar 27 2023 15:21:43 GMT+0000 (UTC)

published: Mon Mar 27 2023 15:21:43 GMT+0000 (UTC)

arXiv
