CTRN: Class-Temporal Relational Network for Action Detection

Rui Dai; Srijan Das; Francois Bremond

Action detection is an essential and challenging task, especially on densely labelled datasets of untrimmed videos. These datasets pose many real-world challenges, such as composite actions, co-occurring actions, and high temporal variation in instance duration. To handle these challenges, we propose to explore both the class and temporal relations of detected actions. In this work, we introduce an end-to-end network, the Class-Temporal Relational Network (CTRN). It contains three key components: (1) the Representation Transform Module filters class-specific features from the mixed representations to build graph-structured data; (2) the Class-Temporal Module models the class and temporal relations sequentially; (3) the G-classifier leverages privileged knowledge of snippet-wise co-occurring action pairs to further improve the detection of co-occurring actions. We evaluate CTRN on three challenging densely labelled datasets and achieve state-of-the-art performance, reflecting the effectiveness and robustness of our method.
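To make the phrase "snippet-wise co-occurring action pairs" concrete, the following minimal sketch counts how often two classes are active in the same snippet of a densely labelled video. It is not the paper's G-classifier and makes no claim about how CTRN uses these pairs. The function name `cooccurring_pairs` and the multi-hot list layout of `labels` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(labels):
    """Count how often each pair of classes is active in the same snippet.

    `labels` is a hypothetical layout: a list of per-snippet multi-hot lists,
    where labels[t][c] == 1 means class c is active in snippet t.
    """
    counts = Counter()
    for snippet in labels:
        # Indices of the classes that are active in this snippet.
        active = [c for c, on in enumerate(snippet) if on]
        # Each unordered pair of simultaneously active classes co-occurs once.
        counts.update(combinations(active, 2))
    return counts


# Toy example: 4 snippets, 3 action classes.
labels = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
pairs = cooccurring_pairs(labels)
print(pairs)
```

In this toy input, classes 0 and 1 overlap in two snippets, classes 0 and 2 in two, and classes 1 and 2 in one. A snippet with a single active class contributes no pair.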

updated: Tue Oct 26 2021 08:15:47 GMT+0000 (UTC)

published: Tue Oct 26 2021 08:15:47 GMT+0000 (UTC)

arXiv
