Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation

Gengcong Yang; Jingyi Zhang; Yong Zhang; Baoyuan Wu; Yujiu Yang

To generate "accurate" scene graphs, almost all existing methods predict pairwise relationships deterministically. However, we argue that visual relationships are often semantically ambiguous. Specifically, inspired by linguistic knowledge, we classify the ambiguity into three types: Synonymy Ambiguity, Hyponymy Ambiguity, and Multi-view Ambiguity. This ambiguity naturally leads to an implicit multi-label problem, motivating the need for diverse predictions. In this work, we propose a novel plug-and-play Probabilistic Uncertainty Modeling (PUM) module. It models each union region as a Gaussian distribution whose variance measures the uncertainty of the corresponding visual content. Compared with conventional deterministic methods, such uncertainty modeling introduces stochasticity into the feature representation, which naturally enables diverse predictions. As a byproduct, PUM also covers more fine-grained relationships and thus alleviates the bias towards frequent relationships. Extensive experiments on the large-scale Visual Genome benchmark show that combining PUM with the newly proposed ResCAGCN achieves state-of-the-art performance, especially under the mean recall metric. Furthermore, we demonstrate the universal effectiveness of PUM by plugging it into several existing models, and we provide an insightful analysis of its ability to generate diverse yet plausible visual relationships.
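The core idea of modeling a union region as a Gaussian and sampling from it can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear mean and log-variance heads, the linear classifier, the parameter names (`w_mu`, `w_logvar`, `w_cls`), and all dimensions are assumptions chosen only for illustration. The abstract specifies only that each union region is a Gaussian whose variance measures uncertainty.

```python
import numpy as np


def make_pum_params(feat_dim, embed_dim, num_predicates, seed=0):
    """Randomly initialise hypothetical weights (assumed, not from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        # Maps a union-region feature to the Gaussian mean.
        "w_mu": rng.normal(0.0, 1.0 / np.sqrt(feat_dim), (feat_dim, embed_dim)),
        # Maps a union-region feature to the log-variance (kept in log space
        # so that the variance is always positive after exponentiation).
        "w_logvar": rng.normal(0.0, 1.0 / np.sqrt(feat_dim), (feat_dim, embed_dim)),
        # Linear predicate classifier applied to a sampled embedding.
        "w_cls": rng.normal(0.0, 1.0 / np.sqrt(embed_dim), (embed_dim, num_predicates)),
    }


def gaussian_embedding(union_feat, params):
    """Return (mean, variance) of the diagonal Gaussian for one union region."""
    mu = union_feat @ params["w_mu"]
    var = np.exp(union_feat @ params["w_logvar"])
    return mu, var


def sample_predicates(union_feat, params, num_samples, rng):
    """Draw embeddings z = mu + sigma * eps and classify each sample.

    Different draws of eps can yield different predicate labels, which is
    how a stochastic representation permits diverse predictions.
    """
    mu, var = gaussian_embedding(union_feat, params)
    eps = rng.standard_normal((num_samples, mu.shape[-1]))
    z = mu + np.sqrt(var) * eps
    logits = z @ params["w_cls"]
    return logits.argmax(axis=-1)


def uncertainty_score(var):
    """Summarise the per-dimension variance as one scalar (an assumed choice)."""
    return float(np.mean(var))


if __name__ == "__main__":
    params = make_pum_params(feat_dim=8, embed_dim=4, num_predicates=5)
    feat = np.ones(8)
    _, var = gaussian_embedding(feat, params)
    preds = sample_predicates(feat, params, num_samples=10, rng=np.random.default_rng(1))
    print("sampled predicate ids:", preds)
    print("uncertainty:", uncertainty_score(var))
```

When the variance collapses towards zero, every sample equals the mean and the sketch reduces to a deterministic predictor. Larger variance spreads the samples and can produce several distinct predicate labels for the same region.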

updated: Tue Mar 09 2021 07:36:09 GMT+0000 (UTC)

published: Tue Mar 09 2021 07:36:09 GMT+0000 (UTC)

arXiv
