PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization

Xin Qin; Hanbin Zhao; Guangchen Lin; Hao Zeng; Songcen Xu; Xi Li

Temporal action localization is an important and challenging task that aims to locate the temporal regions in real-world untrimmed videos where actions occur and to recognize their classes. It is widely acknowledged that video context is a critical cue for video understanding, and exploiting context has become an important strategy for boosting localization performance. However, previous state-of-the-art methods focus mainly on semantic context, which captures feature similarity among frames or proposals, and neglect positional context, which is vital for temporal localization. In this paper, we propose a temporal-position-sensitive context modeling approach that incorporates both positional and semantic information for more precise action localization. Specifically, we first augment feature representations with directed temporal positional encoding, and then conduct attention-based information propagation at both the frame level and the proposal level. Consequently, the generated feature representations gain substantial discriminative capability by encoding position-aware context information, which benefits boundary detection and proposal evaluation. We achieve state-of-the-art performance on two challenging datasets, THUMOS-14 and ActivityNet-1.3, demonstrating the effectiveness and generalization ability of our method.
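
The two steps named in the abstract (positional augmentation followed by attention-based propagation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the directed encoding is built, so the sketch assumes a standard sinusoidal encoding of the signed frame offset (negative for earlier frames, positive for later ones) added to the keys of a plain dot-product self-attention. The function names `sinusoidal_encoding`, `softmax`, and `position_aware_attention` are hypothetical.

```python
import math


def sinusoidal_encoding(position, dim):
    """Sinusoidal encoding of a signed position (assumed form, not from the paper).

    Even indices hold sines and odd indices hold cosines, so flipping the sign
    of `position` flips the sine components and leaves the cosines unchanged.
    That asymmetry is what lets the encoding distinguish past from future.
    """
    enc = []
    for i in range(dim):
        freq = 1.0 / (10000 ** (2 * (i // 2) / dim))
        angle = position * freq
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_aware_attention(features):
    """Propagate information among T frame features of dimension D.

    For each query frame t, every other frame s contributes a key equal to its
    feature plus the encoding of the signed offset (s - t). The same
    position-augmented vectors are reused as values to keep the sketch short.
    """
    num_frames = len(features)
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs = []
    for t in range(num_frames):
        query = features[t]
        keys = []
        scores = []
        for s in range(num_frames):
            offset_enc = sinusoidal_encoding(s - t, dim)
            key = [f + p for f, p in zip(features[s], offset_enc)]
            keys.append(key)
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        weights = softmax(scores)
        outputs.append(
            [sum(w * keys[s][d] for s, w in enumerate(weights)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    # Three toy frame features of dimension 4.
    frames = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    for vec in position_aware_attention(frames):
        print([round(v, 3) for v in vec])
```

The same propagation could be applied to proposal features in place of frame features; how the paper defines positions and offsets for proposals is not stated in the abstract, so that part is left out of the sketch.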

updated: Tue Mar 09 2021 07:34:01 GMT+0000 (UTC)

published: Tue Mar 09 2021 07:34:01 GMT+0000 (UTC)

arXiv
