An Integral Projection-based Semantic Autoencoder for Zero-Shot Learning

William Heyden; Habib Ullah; M. Salman Siddiqui; Fadi Al Machot

Zero-Shot Learning (ZSL) classification categorizes or predicts classes (labels) that are not included in the training set (unseen classes). Recent works have proposed different semantic autoencoder (SAE) models in which the encoder embeds a visual feature vector space into the semantic space and the decoder reconstructs the original visual feature space. The objective is to learn the embedding from a source data distribution so that it can be applied effectively to a different but related target data distribution. Such embedding-based methods are prone to the domain shift problem and are vulnerable to biases. We propose an integral projection-based semantic autoencoder (IP-SAE) in which an encoder projects a visual feature space concatenated with the semantic space into a latent representation space, and we force the decoder to reconstruct the visual-semantic data space. Due to this constraint, the visual-semantic projection function preserves the discriminative information contained in the original visual feature space. The enriched projection forces a more precise reconstruction of the visual feature space that is invariant to the domain manifold. Consequently, the learned projection function is less domain-specific, which alleviates the domain shift problem. Our proposed IP-SAE model consolidates a symmetric transformation function for embedding and projection and thus provides transparency for interpreting generative applications in ZSL. Therefore, in addition to outperforming state-of-the-art methods on four benchmark datasets, our analytical approach allows us to investigate distinct characteristics of generative methods in the unique context of zero-shot inference.
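The abstract contrasts IP-SAE with earlier SAE models. In those models the encoder maps visual features into the semantic space and the decoder reconstructs the visual features. A well-known baseline of that kind is the closed-form linear SAE with tied weights from Kodirov et al. (CVPR 2017). The sketch below shows that baseline only. IP-SAE's own objective over the concatenated visual-semantic space is not given in this listing and is not reproduced. The function names, the `lam` value and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def train_linear_sae(X: np.ndarray, S: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Baseline linear SAE with tied weights (not IP-SAE itself).

    X: visual features, shape (d, N); S: semantic vectors, shape (k, N).
    Minimises ||X - W.T @ S||_F^2 + lam * ||W @ X - S||_F^2 over W of shape (k, d).
    Setting the gradient to zero gives the Sylvester equation
        (S S^T) W + W (lam X X^T) = (1 + lam) S X^T,
    which scipy solves in the form A @ W + W @ B = Q.
    """
    A = S @ S.T                      # (k, k)
    B = lam * (X @ X.T)              # (d, d)
    Q = (1.0 + lam) * (S @ X.T)      # (k, d)
    return solve_sylvester(A, B, Q)


def predict_unseen(W: np.ndarray, X_test: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Encode test features into semantic space and pick the closest unseen prototype.

    X_test: (d, M); prototypes: (k, C), one column per unseen class.
    Returns M class indices in [0, C), chosen by cosine similarity.
    """
    S_hat = W @ X_test                                            # encoder: visual -> semantic
    S_hat = S_hat / np.linalg.norm(S_hat, axis=0, keepdims=True)
    P = prototypes / np.linalg.norm(prototypes, axis=0, keepdims=True)
    return np.argmax(P.T @ S_hat, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 20, 5
    # Synthetic data (assumption): visual features are an exact linear image of semantics.
    G, _ = np.linalg.qr(rng.normal(size=(d, k)))   # (d, k) with orthonormal columns
    S_train = rng.normal(size=(k, 200))            # semantics of seen-class samples
    X_train = G @ S_train
    W = train_linear_sae(X_train, S_train, lam=0.5)

    unseen = rng.normal(size=(k, 3))               # three unseen-class prototypes
    labels = np.array([0, 1, 2, 2, 1])
    X_test = G @ unseen[:, labels]
    print(predict_unseen(W, X_test, unseen))       # should print [0 1 2 2 1]
```

Tying the weights (the decoder is `W.T`) makes the encoder and decoder a symmetric pair. The synthetic features are an exact linear map of the semantics, so `W` recovers that map. The demo therefore shows the mechanics of projection and nearest-prototype inference. It is not a measure of ZSL accuracy on real benchmarks.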

updated: Fri Aug 11 2023 10:17:04 GMT+0000 (UTC)

published: Mon Jun 26 2023 12:06:20 GMT+0000 (UTC)

arXiv