Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Shengyu Feng; Subarna Tripathi; Hesham Mostafa; Marcel Nassar; Somdeb Majumdar

Dynamic scene graph generation from a video is challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions. We hypothesize that capturing long-term temporal dependencies is the key to effective generation of dynamic scene graphs. We propose to learn the long-term dependencies in a video by capturing the object-level consistency and inter-object relationship dynamics over object-level long-term tracklets using transformers. Experimental results demonstrate that our Dynamic Scene Graph Detection Transformer (DSG-DETR) outperforms state-of-the-art methods by a significant margin on the benchmark dataset Action Genome. Our ablation studies validate the effectiveness of each component of the proposed approach. The source code is available at https://github.com/Shengyu-Feng/DSG-DETR.

updated: 2022-10-19 16:58:46 UTC

published: 2021-12-18 03:02:11 UTC

arXiv
