Video Self-Stitching Graph Network for Temporal Action Localization

Chen Zhao; Ali Thabet; Bernard Ghanem

Temporal action localization (TAL) in videos is a challenging task, especially because of the large variation in the temporal scales of actions. Short actions usually make up the majority of the data, yet all current methods perform worst on them. In this paper, we confront the challenge of short actions and propose a multi-level cross-scale solution dubbed video self-stitching graph network (VSGN). VSGN has two key components: video self-stitching (VSS) and cross-scale graph pyramid network (xGPN). In VSS, we focus on a short period of a video and magnify it along the temporal dimension to obtain a larger scale. We stitch the original clip and its magnified counterpart into one input sequence to take advantage of the complementary properties of both scales. The xGPN component further exploits cross-scale correlations through a pyramid of cross-scale graph networks, each containing a hybrid module that aggregates features both across scales and within the same scale. VSGN not only enhances the feature representations but also generates more positive anchors for short actions and more short training samples. Experiments demonstrate that VSGN clearly improves the localization performance of short actions and achieves state-of-the-art overall performance on THUMOS-14 and ActivityNet-v1.3.
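The self-stitching idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes snippet-level features stored as a (T, C) NumPy array, and the function names `magnify` and `self_stitch`, the use of linear interpolation for magnification, the zero-padding gap between the two clips, and the original-then-magnified ordering are all hypothetical choices made for the example.

```python
import numpy as np


def magnify(clip: np.ndarray, scale: int) -> np.ndarray:
    """Upsample a (T, C) feature clip to (T * scale, C) along time.

    Assumption: linear interpolation per channel stands in for whatever
    temporal upscaling the paper actually uses.
    """
    t_in, channels = clip.shape
    t_out = t_in * scale
    # Positions of the output frames expressed in input-frame coordinates.
    src = np.linspace(0, t_in - 1, t_out)
    frames = np.arange(t_in)
    cols = [np.interp(src, frames, clip[:, c]) for c in range(channels)]
    return np.stack(cols, axis=1).astype(clip.dtype)


def self_stitch(features: np.ndarray, start: int, end: int,
                scale: int = 2, gap: int = 2) -> np.ndarray:
    """Stitch a short clip and its magnified copy into one input sequence.

    Hypothetical layout: [original clip | zero gap | magnified clip].
    """
    clip = features[start:end]
    magnified = magnify(clip, scale)
    # Zero frames separate the two scales so they do not blend at the seam.
    pad = np.zeros((gap, features.shape[1]), dtype=features.dtype)
    return np.concatenate([clip, pad, magnified], axis=0)


# Usage: a 10-snippet video with 2-dim features; stitch snippets 2..5.
features = np.arange(20, dtype=float).reshape(10, 2)
stitched = self_stitch(features, start=2, end=6, scale=2, gap=2)
print(stitched.shape)  # (4 + 2 + 8, 2) -> (14, 2)
```

Because an action spanning k snippets in the original clip spans roughly `scale * k` snippets in the magnified copy, more fixed-size anchors overlap it, which is one intuition for why stitching can yield more positive anchors for short actions.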

updated: Tue Mar 30 2021 05:08:55 GMT+0000 (UTC)

published: Mon Nov 30 2020 07:44:52 GMT+0000 (UTC)

arXiv
