Temporal Relational Modeling with Self-Supervision for Action Segmentation

Dong Wang; Di Hu; Xingjian Li; Dejing Dou

Temporal relational modeling in video is essential for understanding human actions, in tasks such as action recognition and action segmentation. Although Graph Convolution Networks (GCNs) have shown promising advantages in relation reasoning on many tasks, applying them effectively to long video sequences remains a challenge. The main reason is that the large number of nodes (i.e., video frames) makes it hard for GCNs to capture and model temporal relations in videos. To tackle this problem, we introduce an effective GCN module, the Dilated Temporal Graph Reasoning Module (DTGRM), designed to model temporal relations and dependencies between video frames at various time spans. In particular, we capture and model temporal relations by constructing multi-level dilated temporal graphs whose nodes represent frames from different moments in the video. Moreover, to enhance the temporal reasoning ability of the proposed model, we propose an auxiliary self-supervised task that encourages the dilated temporal graph reasoning module to find and correct wrong temporal relations in videos. Our DTGRM model outperforms state-of-the-art action segmentation models on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and Breakfast. The code is available at https://github.com/redwang/DTGRM.
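
The idea of a multi-level dilated temporal graph can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes that, at dilation d, each frame t is connected to frames t - d, t, and t + d, and that one reasoning step is a plain neighbor average with no learned weights. The function names, the dilation schedule (1, 2, 4), and the averaging rule are all hypothetical choices made for illustration.

```python
def dilated_temporal_adjacency(num_frames, dilation):
    """Row-normalized adjacency linking frame t to t - dilation, t, t + dilation.

    Assumption for illustration: the abstract does not specify the exact
    edge definition, so this neighborhood is a hypothetical choice.
    """
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    adj = [[0.0] * num_frames for _ in range(num_frames)]
    for t in range(num_frames):
        # Keep only neighbors that fall inside the video.
        neighbors = [n for n in (t - dilation, t, t + dilation)
                     if 0 <= n < num_frames]
        weight = 1.0 / len(neighbors)
        for n in neighbors:
            adj[t][n] = weight
    return adj


def propagate(adj, features):
    """One parameter-free propagation step: each frame averages its neighbors."""
    num_frames = len(features)
    dim = len(features[0])
    return [
        [sum(adj[t][n] * features[n][d] for n in range(num_frames))
         for d in range(dim)]
        for t in range(num_frames)
    ]


def multi_level_reasoning(features, dilations=(1, 2, 4)):
    """Apply propagation over graphs with growing dilation (hypothetical schedule)."""
    out = features
    for dilation in dilations:
        adj = dilated_temporal_adjacency(len(out), dilation)
        out = propagate(adj, out)
    return out


if __name__ == "__main__":
    # Eight frames with 2-dimensional toy features.
    frames = [[float(t), float(t % 2)] for t in range(8)]
    for row in multi_level_reasoning(frames):
        print(row)
```

Stacking levels with growing dilation lets information travel between distant frames while each graph stays sparse, which is the motivation the abstract gives for avoiding a single dense graph over all frames.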

updated: Mon Dec 14 2020 13:41:28 GMT+0000 (UTC)

published: Mon Dec 14 2020 13:41:28 GMT+0000 (UTC)

arXiv
