Learning Representational Invariances for Data-Efficient Action Recognition

Yuliang Zou; Jinwoo Choi; Qitong Wang; Jia-Bin Huang

Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational invariances into the model (e.g., invariance to photometric variations) and helps improve accuracy. Compared to image data, the appearance variations in videos are far more complex due to the additional temporal dimension. Yet, data augmentation methods for videos remain under-explored. This paper investigates various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations. We show that, when integrated with existing semi-supervised learning frameworks, our data augmentation strategy leads to promising performance on the Kinetics-100/400, Mini-Something-v2, UCF-101, and HMDB-51 datasets in the low-label regime. We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
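To make the idea of constraining predictions to be invariant to augmentations concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy grayscale clip, the `toy_predict` stand-in for a model, and the squared-error consistency loss are all assumptions chosen for illustration. It covers one photometric, one geometric, and one temporal augmentation; actor/scene augmentation is not shown.

```python
import random

# A clip is a list of frames; a frame is a list of rows of grayscale pixels in [0, 1].

def brightness_shift(clip, delta):
    """Photometric: add one offset to every pixel, clamped to [0, 1]."""
    return [[[min(1.0, max(0.0, p + delta)) for p in row] for row in frame]
            for frame in clip]

def horizontal_flip(clip):
    """Geometric: mirror every frame left to right."""
    return [[row[::-1] for row in frame] for frame in clip]

def temporal_subsample(clip, num_frames, rng):
    """Temporal: keep a random subset of frames, preserving their order."""
    idx = sorted(rng.sample(range(len(clip)), num_frames))
    return [clip[i] for i in idx]

def toy_predict(clip):
    """Hypothetical stand-in for a model: two 'class scores' from mean brightness."""
    pixels = [p for frame in clip for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [mean, 1.0 - mean]

def consistency_loss(probs_a, probs_b):
    """Mean squared difference between the predictions for two views."""
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)

rng = random.Random(0)
clip = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(8)]

# Two differently augmented views of the same unlabeled clip.
view_a = horizontal_flip(clip)
view_b = temporal_subsample(brightness_shift(clip, 0.1), 4, rng)

# Penalizing this value during training pushes the model toward invariance.
loss = consistency_loss(toy_predict(view_a), toy_predict(view_b))
```

In a semi-supervised setup, a term like `loss` would typically be computed on unlabeled clips and added to the supervised loss on the labeled ones; the exact form of the consistency objective depends on the framework being used.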

updated: Mon Feb 14 2022 17:23:46 GMT+0000 (UTC)

published: Tue Mar 30 2021 17:59:49 GMT+0000 (UTC)

arXiv
