Prototype-based Embedding Network for Scene Graph Generation

Chaofan Zheng; Xinyu Lyu; Lianli Gao; Bo Dai; Jingkuan Song

Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, because the many possible subject-object combinations differ widely in visual appearance, there is large intra-class variation within each predicate category, e.g., "man-eating-pizza, giraffe-eating-leaf", and severe inter-class similarity between different classes, e.g., "man-holding-plate, man-eating-pizza", in the model's latent space. These challenges prevent current SGG methods from acquiring robust features for reliable relation prediction. In this paper, we claim that the predicate's category-inherent semantics can serve as class-wise prototypes in the semantic space to alleviate these challenges. To this end, we propose the Prototype-based Embedding Network (PE-Net), which models entities/predicates with prototype-aligned compact and distinctive representations and thereby establishes matching between entity pairs and predicates in a common embedding space for relation recognition. Moreover, Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve the ambiguous entity-predicate matching caused by the predicate's semantic overlap. Extensive experiments demonstrate that our method gains superior relation recognition capability on SGG, achieving new state-of-the-art performance on both the Visual Genome and Open Images datasets.
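The idea of matching entity pairs to class-wise predicate prototypes in a common embedding space can be illustrated with a toy sketch. This is not the PE-Net implementation: the abstract does not specify the matching function, so cosine similarity is assumed here, and the prototype vectors, pair embeddings, and the function names `cosine` and `match_predicate` are all hypothetical values chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_predicate(pair_embedding, prototypes):
    """Pick the predicate whose prototype is most similar to the pair embedding.

    Assumption: similarity is cosine; the abstract does not name the metric.
    """
    scores = {name: cosine(pair_embedding, proto)
              for name, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical class-wise prototypes, one vector per predicate category.
PROTOTYPES = {
    "eating": [1.0, 0.0, 0.2],
    "holding": [0.0, 1.0, 0.2],
}

# Hypothetical subject-object pair embeddings in the same space.
man_pizza = [0.9, 0.1, 0.3]
giraffe_leaf = [0.8, 0.2, 0.1]
man_plate = [0.1, 0.9, 0.3]

for name, emb in [("man-pizza", man_pizza),
                  ("giraffe-leaf", giraffe_leaf),
                  ("man-plate", man_plate)]:
    predicate, _ = match_predicate(emb, PROTOTYPES)
    print(name, "->", predicate)
```

In this toy setting, the two visually different "eating" pairs land near the same prototype, while the "holding" pair lands near a different one. Compact, prototype-aligned representations are meant to make that separation easier.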

updated: Mon Mar 13 2023 13:30:59 GMT+0000 (UTC)

published: Mon Mar 13 2023 13:30:59 GMT+0000 (UTC)

arXiv
