Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning

Chaoqun Wang; Xuejin Chen; Shaobo Min; Xiaoyan Sun; Houqiang Li

Generalized Zero-Shot Learning (GZSL) aims to recognize new categories by learning transferable image representations. Existing methods show that, by aligning image representations with their corresponding semantic labels, the semantically aligned representations can be transferred to unseen categories. However, because supervision comes only from seen category labels, the learned semantic knowledge is highly task-specific, which biases image representations toward seen categories. In this paper, we propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge via semantic alignment and instance discrimination. First, DCEN leverages task labels to cluster representations of the same semantic category through cross-modal contrastive learning and by exploring semantic-visual complementarity. Second, in addition to task-specific knowledge, DCEN introduces task-independent knowledge by attracting representations of different views of the same image and repelling representations of different images. Compared with high-level seen category supervision, this instance discrimination supervision encourages DCEN to capture low-level visual knowledge, which is less biased toward seen categories and therefore alleviates the representation bias. Consequently, the task-specific and task-independent knowledge jointly yield transferable representations in DCEN, which achieves an average improvement of 4.1% on four public benchmarks.
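The two supervision signals described in the abstract can both be written as contrastive objectives. The following is a minimal sketch, not the authors' implementation: it assumes a standard InfoNCE-style loss, and the function names, the temperature value, the weighting factor `weight`, and the toy embeddings are all hypothetical choices made for illustration. Semantic alignment treats the embedding of an image's own class as the positive; instance discrimination treats the second view of the same image as the positive.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, candidates, positive_index, temperature=0.1):
    """Average InfoNCE loss (an assumed stand-in for the paper's losses).

    For anchor i, candidates[positive_index[i]] is the positive and every
    other candidate is a negative.
    """
    anchors = [l2_normalize(a) for a in anchors]
    candidates = [l2_normalize(c) for c in candidates]
    total = 0.0
    for anchor, pos in zip(anchors, positive_index):
        logits = [dot(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[pos]
    return total / len(anchors)


def dual_contrastive_loss(view_a, view_b, class_embeddings, labels, weight=1.0):
    """Hypothetical combination of the two terms named in the abstract."""
    # Task-specific term: align each image with the embedding of its class.
    semantic = info_nce(view_a, class_embeddings, labels)
    # Task-independent term: match each image to its own second view.
    instance = info_nce(view_a, view_b, list(range(len(view_a))))
    return semantic + weight * instance


# Toy 2-D embeddings (made-up values): two images, two views each, two classes.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
class_embeddings = [[1.0, 0.2], [0.2, 1.0]]
labels = [0, 1]

loss = dual_contrastive_loss(view_a, view_b, class_embeddings, labels)
```

In this sketch the instance term never looks at `labels`, which is the sense in which that supervision is independent of the seen-category task.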

updated: Mon Apr 05 2021 10:05:48 GMT+0000 (UTC)

published: Mon Apr 05 2021 10:05:48 GMT+0000 (UTC)

arXiv
