Action Segmentation with Mixed Temporal Domain Adaptation

Min-Hung Chen; Baopu Li; Yingze Bao; Ghassan AlRegib



The main progress in action segmentation comes from densely annotated data for fully supervised learning. Since manual annotation of frame-level actions is time-consuming and challenging, we propose to exploit auxiliary unlabeled videos, which are much easier to obtain, by formulating this problem as a domain adaptation (DA) problem. Although various DA techniques have been proposed in recent years, most of them have been developed only for the spatial direction. Therefore, we propose Mixed Temporal Domain Adaptation (MTDA) to jointly align frame- and video-level embedded feature spaces across domains, and further integrate it with a domain attention mechanism that focuses on aligning the frame-level features with higher domain discrepancy, leading to more effective domain adaptation. Finally, we evaluate our proposed methods on three challenging datasets (GTEA, 50Salads, and Breakfast) and validate that MTDA outperforms the current state-of-the-art methods on all three datasets by large margins (e.g., a 6.4% gain on F1@50 and a 6.8% gain on the edit score for GTEA).
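The abstract does not spell out the alignment mechanics, so the Python sketch below is only a minimal, forward-only illustration under stated assumptions. It assumes that frame-level and video-level domain discriminators each output P(source). It also takes domain attention to be one minus the normalized entropy of the frame-level domain prediction, which grows as the domain discrepancy grows, and it builds the video-level feature as an attention-weighted temporal average. All function names, the residual (1 + w) weighting, and the loss weights `lam_frame` and `lam_video` are hypothetical and are not claimed to match the authors' implementation.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # clamp probabilities away from 0 and 1 to keep log() finite


def binary_entropy(p: float) -> float:
    """Entropy of a Bernoulli(p) domain prediction, normalized to [0, 1]."""
    p = min(max(p, EPS), 1.0 - EPS)
    h = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return h / math.log(2.0)  # p = 0.5 (maximal confusion) maps to 1.0


def domain_attention_weights(domain_probs: Sequence[float]) -> List[float]:
    """Assumed frame-level domain attention: w_j = 1 - H(p_j).

    Frames whose domain the discriminator predicts confidently (larger
    domain discrepancy) get larger weights; frames it cannot tell apart
    (p_j = 0.5) get weight 0.
    """
    return [1.0 - binary_entropy(p) for p in domain_probs]


def attentive_temporal_pooling(frames: Sequence[Sequence[float]],
                               weights: Sequence[float]) -> List[float]:
    """Video-level feature: mean of frame features scaled by (1 + w_j).

    The residual "1 +" keeps frames with zero attention in the average.
    """
    dim = len(frames[0])
    pooled = [0.0] * dim
    for feat, w in zip(frames, weights):
        for k in range(dim):
            pooled[k] += (1.0 + w) * feat[k]
    return [v / len(frames) for v in pooled]


def domain_bce(prob_source: float, is_source: bool) -> float:
    """Binary cross-entropy of a domain discriminator's P(source) output."""
    p = min(max(prob_source, EPS), 1.0 - EPS)
    return -math.log(p) if is_source else -math.log(1.0 - p)


def mixed_domain_loss(frame_probs: Sequence[float], video_prob: float,
                      is_source: bool, lam_frame: float = 1.0,
                      lam_video: float = 1.0) -> float:
    """Hypothetical mixed objective: frame-level plus video-level domain loss.

    In adversarial training the feature extractor would receive the reversed
    gradient of this loss (e.g. via a gradient reversal layer); this
    forward-only sketch does not model that step.
    """
    frame_loss = sum(domain_bce(p, is_source) for p in frame_probs) / len(frame_probs)
    video_loss = domain_bce(video_prob, is_source)
    return lam_frame * frame_loss + lam_video * video_loss
```

For instance, frame-level discriminator outputs of 0.9 and 0.1 both receive a weight of about 0.53, while an output of 0.5 receives 0. Under this assumed weighting, frames the discriminator cannot separate contribute least to the attention-weighted video-level feature, which is the "focus on higher domain discrepancy" behavior the abstract describes.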

updated: Fri Apr 16 2021 00:46:15 GMT+0000 (UTC)

published: Thu Apr 15 2021 13:48:14 GMT+0000 (UTC)

arXiv
