Triplet-Aware Scene Graph Embeddings

Brigit Schroeder; Subarna Tripathi; Hanlin Tang

Scene graphs have become an important form of structured knowledge for tasks such as image generation, visual relation detection, visual question answering, and image retrieval. While visualizing and interpreting word embeddings is well understood, scene graph embeddings have not been fully explored. In this work, we train scene graph embeddings in a layout generation task with different forms of supervision, specifically introducing triplet supervision and data augmentation. After adding triplet supervision and data augmentation, we see a significant performance increase in both metrics that measure the quality of layout prediction: mean intersection-over-union (mIoU) (52.3% vs. 49.2%) and relation score (61.7% vs. 54.1%). To understand how these different methods affect the scene graph representation, we apply several new visualization and evaluation methods to explore the evolution of the scene graph embedding. We find that triplet supervision significantly improves the embedding separability, which is highly correlated with the performance of the layout prediction model.
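
As a rough illustration of the first metric, the sketch below computes mean intersection-over-union between predicted and ground-truth layout boxes. It is a minimal sketch and not the authors' evaluation code: the box format `(x0, y0, x1, y1)`, the one-to-one pairing of predicted and ground-truth boxes, and the names `box_iou` and `mean_iou` are all assumptions made for the example. The relation score is not sketched because the abstract does not define it.

```python
from typing import Sequence, Tuple

# Assumed box format: (x0, y0, x1, y1) with x0 <= x1 and y0 <= y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def mean_iou(pred: Sequence[Box], truth: Sequence[Box]) -> float:
    """Average IoU over boxes paired by index (hypothetical pairing rule)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    if not pred:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(pred, truth)) / len(pred)
```

For example, `mean_iou([(0, 0, 2, 2)], [(1, 1, 3, 3)])` compares two 2x2 boxes that overlap in a 1x1 region, giving an IoU of 1/7.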

updated: Thu Sep 19 2019 23:20:49 GMT+0000 (UTC)

published: Thu Sep 19 2019 23:20:49 GMT+0000 (UTC)

arXiv
