Enriching Local and Global Contexts for Temporal Action Localization

Zixin Zhu; Wei Tang; Le Wang; Nanning Zheng; Gang Hua

Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated, followed by action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three sub-networks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models to be the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3% at tIoU@0.5) and ActivityNet v1.3 (56.01% at tIoU@0.5) datasets, where it outperforms recent state-of-the-art methods. Code is available at https://github.com/buxiangzhiren/ContextLoc.
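
The results above are reported at tIoU@0.5, i.e., a temporal intersection-over-union threshold of 0.5 between a predicted segment and a ground-truth segment. The following is a minimal sketch of that overlap measure, assuming each segment is a (start, end) pair in seconds. The function names are illustrative and are not taken from the ContextLoc repository.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    # Length of the overlap; zero when the segments are disjoint.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred, gt, threshold=0.5):
    """True if the predicted segment overlaps the ground truth enough.

    The 0.5 default mirrors the tIoU@0.5 setting quoted in the abstract.
    """
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    print(temporal_iou((0.0, 4.0), (1.0, 5.0)))
    print(matches_at_threshold((2.0, 6.0), (4.0, 8.0)))
```

In this sketch, a prediction of (0.0, 4.0) against a ground truth of (1.0, 5.0) overlaps for 3 seconds out of a 5-second union and so would pass the 0.5 threshold, while (2.0, 6.0) against (4.0, 8.0) overlaps for 2 seconds out of 6 and would not.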

updated: Sat Aug 07 2021 06:27:18 GMT+0000 (UTC)

published: Tue Jul 27 2021 17:25:40 GMT+0000 (UTC)

arXiv
