Structured Sparse R-CNN for Direct Scene Graph Generation

Yao Teng; Limin Wang

Scene graph generation (SGG) aims to detect object pairs and their relations in an image. Existing SGG approaches often use multi-stage pipelines that decompose the task into object detection, relation graph construction, and dense or dense-to-sparse relation prediction. Instead, treating SGG as a direct set prediction problem, this paper presents a simple, sparse, and unified framework termed Structured Sparse R-CNN. The key to our method is a set of learnable triplet queries and a structured triplet detector, which can be jointly optimized on the training set in an end-to-end manner. Specifically, the triplet queries encode the general prior for object pairs and their relations, and provide an initial guess of the scene graph for subsequent refinement. The triplet detector uses a cascaded architecture with customized dynamic heads to progressively refine the detected scene graphs. In addition, to ease the training difficulty of our method, we propose a relaxed and enhanced training strategy based on knowledge distillation from a Siamese Sparse R-CNN. We perform experiments on the Visual Genome and Open Images V4/V6 datasets, and the results demonstrate that our method achieves state-of-the-art performance. We also perform in-depth ablation studies to provide insights into the structured modeling in our triplet detector design and into our training strategies. The code and models are available at https://github.com/MCG-NJU/Structured-Sparse-RCNN.
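To make the "set of triplets" view concrete, the sketch below shows one plausible way to represent predicted subject-predicate-object triplets and collapse them into a scene graph. It is an illustration only and is not taken from the authors' repository: the `Entity`, `Triplet`, and `to_scene_graph` names, the `(x1, y1, x2, y2)` box format, the example labels, and the `min_score` threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Entity:
    """A detected object: class label plus bounding box (hypothetical type)."""
    label: str
    box: Box


@dataclass(frozen=True)
class Triplet:
    """One element of the predicted set: subject, predicate, object, confidence."""
    subject: Entity
    predicate: str
    obj: Entity
    score: float


def to_scene_graph(triplets: List[Triplet], min_score: float = 0.5) -> Dict[str, list]:
    """Collapse a flat set of triplets into nodes and labelled, directed edges.

    Triplets scoring below min_score (an assumed threshold) are dropped.
    Identical entities (same label and box) are merged into one node.
    """
    nodes: List[Entity] = []
    index: Dict[Entity, int] = {}
    edges: List[Tuple[int, str, int]] = []
    for t in triplets:
        if t.score < min_score:
            continue
        for e in (t.subject, t.obj):
            if e not in index:
                index[e] = len(nodes)
                nodes.append(e)
        edges.append((index[t.subject], t.predicate, index[t.obj]))
    return {"nodes": nodes, "edges": edges}


# Toy predictions, invented for illustration.
person = Entity("person", (10.0, 20.0, 110.0, 220.0))
horse = Entity("horse", (60.0, 90.0, 300.0, 260.0))
hat = Entity("hat", (40.0, 5.0, 80.0, 30.0))
predictions = [
    Triplet(person, "riding", horse, 0.9),
    Triplet(person, "wearing", hat, 0.8),
    Triplet(horse, "wearing", hat, 0.1),  # low confidence, filtered out
]
graph = to_scene_graph(predictions)
```

Here each element of the output set already carries both entities and their relation, so no separate object-detection or graph-construction stage is needed to assemble the graph; how the actual model scores and refines its triplets is described in the paper, not in this sketch.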

updated: Mon Mar 28 2022 15:37:07 GMT+0000 (UTC)

published: Mon Jun 21 2021 02:24:20 GMT+0000 (UTC)

arXiv
