S-JEA: Stacked Joint Embedding Architectures for Self-Supervised Visual Representation Learning

Alžběta Manová; Aiden Durrant; Georgios Leontidis


The recent emergence of Self-Supervised Learning (SSL) as a fundamental paradigm for learning image representations has demonstrated, and continues to demonstrate, high empirical success across a variety of tasks. However, most SSL approaches fail to learn embeddings that capture hierarchical semantic concepts that are separable and interpretable. In this work, we aim to learn highly separable, hierarchical semantic representations by stacking Joint Embedding Architectures (JEAs), where each higher-level JEA takes the representations of the lower-level JEA as input. This yields a representation space in which the higher-level JEAs exhibit distinct sub-categories of semantic concepts (e.g., the model and colour of vehicles). We empirically show that representations from stacked JEAs perform on a similar level to traditional JEAs with comparable parameter counts, and we visualise the representation spaces to validate the semantic hierarchies.
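The following is a minimal forward-pass sketch in plain Python that makes the stacking idea concrete. It is not the authors' implementation. Everything except the stacking itself is an assumption chosen for illustration: the layer sizes, the single linear encoder and projector per level, the ReLU, and the mean-squared invariance loss. All names (`JEALevel`, `stacked_forward`, and so on) are hypothetical. The one detail taken from the abstract is that each higher level receives the lower level's representations as input, not raw pixels. A practical JEA would also need a term that prevents collapse, such as VICReg-style variance and covariance regularisation, and a training loop. Both are omitted here.

```python
import math
import random


def linear(weights, x):
    """Dense layer without bias: weights is a list of rows, x is a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def init(out_dim, in_dim, rng):
    """Random Gaussian weight matrix scaled by 1/sqrt(in_dim)."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


class JEALevel:
    """One hypothetical JEA level: encoder -> representation, projector -> embedding."""

    def __init__(self, in_dim, rep_dim, emb_dim, rng):
        self.encoder = init(rep_dim, in_dim, rng)
        self.projector = init(emb_dim, rep_dim, rng)

    def forward(self, x):
        rep = relu(linear(self.encoder, x))
        emb = linear(self.projector, rep)
        return rep, emb


def invariance_loss(za, zb):
    """Mean squared distance between the two views' embeddings (assumed objective)."""
    return sum((a - b) ** 2 for a, b in zip(za, zb)) / len(za)


def stacked_forward(levels, view_a, view_b):
    """Run both views through every level; return top representations and per-level losses."""
    losses = []
    xa, xb = view_a, view_b
    for level in levels:
        # Both views share the same weights within a level (siamese branches).
        rep_a, emb_a = level.forward(xa)
        rep_b, emb_b = level.forward(xb)
        losses.append(invariance_loss(emb_a, emb_b))
        # The next level consumes this level's representations, not its embeddings.
        xa, xb = rep_a, rep_b
    return xa, xb, losses


# Usage: a two-level stack on a toy 8-dimensional "image".
rng = random.Random(0)
levels = [JEALevel(8, 16, 4, rng), JEALevel(16, 6, 4, rng)]
view_a = [rng.random() for _ in range(8)]
view_b = [v + rng.gauss(0.0, 0.05) for v in view_a]  # hypothetical augmented view
top_a, top_b, losses = stacked_forward(levels, view_a, view_b)
print(len(top_a), len(losses))  # 6 2
```

As a sanity check, passing the same vector as both views gives zero loss at every level, because the two branches share weights.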

updated: Mon Nov 04 2024 19:04:37 GMT+0000 (UTC)

published: Fri May 19 2023 14:25:27 GMT+0000 (UTC)

arXiv
