Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning

Wenqing Wang; Yawei Luo; Zhiqing Chen; Tao Jiang; Lei Chen; Yi Yang; Jun Xiao

Current video-based scene graph generation (VidSGG) methods have been found to perform poorly when predicting under-represented predicates, owing to the inherently biased distribution of the training data. In this paper, we take a closer look at the predicates and observe that most visual relations (e.g. sit_above) involve both an actional pattern (sit) and a spatial pattern (above), while the distribution bias is much less severe at the pattern level. Based on this insight, we propose a decoupled label learning (DLL) paradigm that addresses the intractable visual relation prediction from a pattern-level perspective. Specifically, DLL decouples the predicate labels and adopts separate classifiers to learn the actional and spatial patterns, respectively. The patterns are then combined and mapped back to the predicate. Moreover, we propose a knowledge-level label decoupling method that transfers non-target knowledge from head predicates to tail predicates sharing the same pattern, in order to calibrate the distribution of tail classes. We validate the effectiveness of DLL on the commonly used VidSGG benchmark VidVRD. Extensive experiments demonstrate that DLL offers a remarkably simple yet highly effective solution to the long-tailed problem and achieves state-of-the-art VidSGG performance.
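The label-decoupling idea can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Everything it introduces is hypothetical: the action vocabulary (`ACTIONS`), the `none` placeholder pattern, the helper names (`decouple`, `pattern_counts`, `imbalance`, `combine`), the toy label counts, and the product rule used to map pattern probabilities back to predicates. The abstract does not specify how the patterns are combined or how the knowledge-level label decoupling is computed, so the latter is not sketched.

```python
from __future__ import annotations

from collections import Counter

import numpy as np

# Hypothetical actional vocabulary, used only for this sketch.
ACTIONS = {"sit", "stand", "walk", "run"}
NONE = "none"  # placeholder when a predicate lacks one of the two patterns


def decouple(predicate: str) -> tuple[str, str]:
    """Split a predicate like 'sit_above' into (actional, spatial) patterns."""
    head, _, rest = predicate.partition("_")
    if head in ACTIONS:
        return head, rest or NONE
    # No actional prefix: treat the whole predicate as a spatial pattern.
    return NONE, predicate


def pattern_counts(predicate_counts: dict[str, int]) -> tuple[Counter, Counter]:
    """Aggregate predicate-level label counts into actional and spatial counts."""
    actional, spatial = Counter(), Counter()
    for predicate, n in predicate_counts.items():
        a, s = decouple(predicate)
        actional[a] += n
        spatial[s] += n
    return actional, spatial


def imbalance(counts: dict[str, int]) -> float:
    """Ratio of the most frequent class count to the least frequent one."""
    return max(counts.values()) / min(counts.values())


def combine(predicates, actional_probs, spatial_probs, actional_vocab, spatial_vocab):
    """Map pattern probabilities back to predicate scores.

    Assumption: the two classifiers are treated as independent, so a
    predicate's score is p(actional) * p(spatial), renormalised over the
    given predicate vocabulary.
    """
    a_index = {a: i for i, a in enumerate(actional_vocab)}
    s_index = {s: i for i, s in enumerate(spatial_vocab)}
    scores = np.array([
        actional_probs[a_index[a]] * spatial_probs[s_index[s]]
        for a, s in map(decouple, predicates)
    ])
    return scores / scores.sum()


# Toy usage with invented counts (not VidVRD statistics).
toy_counts = {"sit_above": 400, "stand_behind": 300, "sit_behind": 10, "stand_above": 5}
actional, spatial = pattern_counts(toy_counts)
print(imbalance(toy_counts))  # 80.0
print(round(imbalance(actional), 2), round(imbalance(spatial), 2))  # 1.34 1.31

predicates = list(toy_counts)
probs = combine(predicates, np.array([0.7, 0.3]), np.array([0.2, 0.8]),
                ["sit", "stand"], ["above", "behind"])
print(predicates[int(np.argmax(probs))])  # sit_behind
```

With these invented counts, the rare predicates `sit_behind` and `stand_above` are built from patterns that also occur in frequent predicates, so the pattern-level ratios (about 1.3) are far smaller than the predicate-level ratio (80). This mirrors, on toy data only, the observation that bias is less severe at the pattern level. The product rule also lets a rare predicate win when both of its patterns are predicted confidently; other combination rules are equally plausible.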

updated: Thu Mar 23 2023 12:08:10 GMT+0000 (UTC)

published: Thu Mar 23 2023 12:08:10 GMT+0000 (UTC)

arXiv
