Robust Action Segmentation from Timestamp Supervision

Yaser Souri; Yazan Abu Farha; Emad Bahrami; Gianpiero Francesca; Juergen Gall

Action segmentation is the task of predicting an action label for each frame of an untrimmed video. Because frame-level annotations for fully supervised training are expensive to obtain, various approaches train action segmentation models with weaker forms of supervision, e.g., action transcripts, action sets, or, more recently, timestamps. Timestamp supervision is a promising form of weak supervision: obtaining one timestamp per action is less expensive than annotating all frames, yet it provides more information than other forms of weak supervision. However, previous works assume that every action instance is annotated with a timestamp, which is restrictive because it presumes that annotators never miss an action. In this work, we relax this assumption and take missing annotations for some action instances into account. We show that our approach is more robust to missing annotations than other approaches and various baselines.
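To make the annotation setting concrete, the following is a minimal sketch of how dense frame-wise labels reduce to timestamp annotations, and how missed annotations might be simulated. It is an illustration only, not the method of the paper: the function names, the toy label sequence, the uniform choice of the annotated frame, and the independent per-instance miss probability are all assumptions introduced here.

```python
import random
from itertools import groupby


def segments(frame_labels):
    """Split per-frame labels into (label, start, end) runs; end is exclusive."""
    out, start = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        out.append((label, start, start + length))
        start += length
    return out


def timestamp_annotations(frame_labels, rng):
    """Keep one (frame_index, label) pair per action instance.

    Assumption: the annotated frame is drawn uniformly from inside the segment.
    """
    return [(rng.randrange(s, e), label) for label, s, e in segments(frame_labels)]


def drop_annotations(timestamps, miss_prob, rng):
    """Simulate an annotator who misses each instance with probability miss_prob.

    Assumption: misses are independent across action instances.
    """
    return [t for t in timestamps if rng.random() >= miss_prob]


# Hypothetical toy video: 14 frames, 4 action instances ("pour" occurs twice).
frames = ["pour"] * 4 + ["stir"] * 3 + ["pour"] * 2 + ["drink"] * 5
rng = random.Random(0)
full = timestamp_annotations(frames, rng)                 # every instance annotated
partial = drop_annotations(full, miss_prob=0.3, rng=rng)  # some instances may be missing
```

Here `full` corresponds to the setting assumed by previous works, where every action instance carries a timestamp, while `partial` corresponds to the relaxed setting in which some instances have no annotation at all.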

updated: Wed Oct 12 2022 18:01:14 GMT+0000 (UTC)

published: Wed Oct 12 2022 18:01:14 GMT+0000 (UTC)

arXiv
