Target-Tailored Source-Transformation for Scene Graph Generation

Wentong Liao; Cuiling Lan; Wenjun Zeng; Michael Ying Yang; Bodo Rosenhahn

Scene graph generation aims to provide a semantic and structural description of an image, denoting the objects (as nodes) and their relationships (as edges). The best-performing works to date exploit the context surrounding objects or relations, e.g., by passing information among objects. In these approaches, transforming the representation of source objects is a critical step in extracting information for use by target objects. In this work, we argue that a source object should give each target object what that target needs, providing different information to different objects rather than contributing common information to all targets. To achieve this goal, we propose a Target-Tailored Source-Transformation (TTST) method to efficiently propagate information among object proposals and relations. In particular, for a source object proposal that contributes information to other target objects, we transform the source object feature to the target object feature domain by taking both the source and the target into account simultaneously. We further explore more powerful representations by integrating language priors with the visual context in the transformation for scene graph generation. In this way, the target object is able to extract target-specific information from the source object and source relation and refine its representation accordingly. Our framework is validated on the Visual Genome benchmark and demonstrates state-of-the-art performance for scene graph generation. The experimental results show that object detection and visual relationship detection are mutually improved by our method.
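The core idea, that the message a source object sends depends on the target receiving it, can be illustrated with a minimal sketch. This is not the TTST architecture: the abstract does not give the form of the transformation, so the additive linear-plus-ReLU message, the mean aggregation, the residual update, and every name below (`tailored_message`, `refine`, `w_src`, `w_tgt`) are assumptions made for illustration only. Relations and the language prior are left out.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (rows = output dims) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(a, 0.0) for a in v]


def tailored_message(x_src: Vector, x_tgt: Vector,
                     w_src: Matrix, w_tgt: Matrix) -> Vector:
    """Hypothetical message from a source to one specific target.

    The message is a function of BOTH features, so the same source
    produces different messages for different targets.
    """
    s = linear(w_src, x_src)
    t = linear(w_tgt, x_tgt)
    return relu([a + b for a, b in zip(s, t)])


def refine(features: List[Vector], w_src: Matrix, w_tgt: Matrix) -> List[Vector]:
    """Refine each object feature with the mean of its tailored messages.

    Assumes square weight matrices so the residual addition is well defined.
    """
    refined = []
    for i, x_tgt in enumerate(features):
        msgs = [tailored_message(x_src, x_tgt, w_src, w_tgt)
                for j, x_src in enumerate(features) if j != i]
        if not msgs:  # a single object receives no messages
            refined.append(list(x_tgt))
            continue
        mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        refined.append([a + b for a, b in zip(x_tgt, mean)])
    return refined


# Toy usage: three object proposals with 2-d features and identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(tailored_message(objects[2], objects[0], identity, identity))
print(tailored_message(objects[2], objects[1], identity, identity))
print(refine(objects, identity, identity))
```

In a target-agnostic scheme the two printed messages from the third object would be identical; here they differ because the target feature enters the transformation.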

updated: Wed May 27 2020 14:33:40 GMT+0000 (UTC)

published: Wed Apr 03 2019 16:59:49 GMT+0000 (UTC)

arXiv
