On Evaluating Weakly Supervised Action Segmentation Methods

Yaser Souri; Alexander Richard; Luca Minciullo; Juergen Gall

Action segmentation is the task of temporally segmenting every frame of an untrimmed video. Weakly supervised approaches to action segmentation, especially those that learn from transcripts, have been of considerable interest to the computer vision community. In this work, we focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches that are often overlooked: the performance variance over multiple training runs and the impact of the choice of feature extractor for this task. To tackle the first problem, we train each method on the Breakfast dataset 5 times and report the mean and standard deviation of the results. Our experiments show that the standard deviation over these repetitions is between 1 and 2.5% and significantly affects the comparison between different approaches. Furthermore, our investigation of feature extraction shows that, for the studied weakly supervised action segmentation methods, higher-level I3D features perform worse than classical IDT features.
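The reporting protocol described in the abstract (several training runs per method, summarized by mean and standard deviation) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, the scores are made-up placeholders rather than results from the paper, the use of the sample standard deviation is an assumption, and the overlap check is only a rough heuristic for showing why run-to-run variance matters when ranking methods.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def within_run_noise(scores_a, scores_b):
    """Rough heuristic (an assumption, not from the paper): treat two
    methods as indistinguishable when the gap between their means is no
    larger than the sum of their standard deviations."""
    mean_a, std_a = summarize_runs(scores_a)
    mean_b, std_b = summarize_runs(scores_b)
    return abs(mean_a - mean_b) <= std_a + std_b


# Placeholder accuracies (%) for five training runs of two hypothetical
# methods. These numbers are illustrative only.
method_a = [48.0, 50.0, 52.0, 49.0, 51.0]
method_b = [50.0, 52.0, 51.0, 53.0, 49.0]

for name, scores in (("method_a", method_a), ("method_b", method_b)):
    mean, std = summarize_runs(scores)
    print(f"{name}: {mean:.1f} +/- {std:.1f}")

print("gap within run-to-run noise:", within_run_noise(method_a, method_b))
```

With placeholder values like these, a one-point gap between the means is smaller than the spread across runs, which is the kind of situation in which a single-run comparison could be misleading.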

updated: Thu Oct 21 2021 17:16:34 GMT+0000 (UTC)

published: Tue May 19 2020 20:30:31 GMT+0000 (UTC)

arXiv
