WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition

Marius Bock; Hilde Kuehne; Kristof Van Laerhoven; Michael Moeller

Though research has shown the complementarity of camera- and inertial-based data, datasets that offer both modalities remain scarce. In this paper, we introduce WEAR, an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). The dataset comprises data from 18 participants performing a total of 18 different workout activities, with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 10 different outdoor locations. Unlike previous egocentric datasets, WEAR provides a challenging prediction scenario characterized by deliberately introduced activity variations as well as an overall small information overlap across modalities. The provided benchmark results reveal that single-modality architectures each have different strengths and weaknesses in their prediction performance. Further, in light of the recent success of transformer-based temporal action localization models, we demonstrate their versatility by applying them in a plain fashion using vision, inertial, and combined (vision + inertial) features as input. The results demonstrate both that vision-based transformers are applicable to inertial data and that the two modalities can be fused by simple concatenation, with the combined approach (vision + inertial features) producing the highest mean average precision and a close-to-best F1-score. The dataset and code to reproduce the experiments are publicly available at: https://mariusbock.github.io/wear/
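The combined (vision + inertial) input mentioned above amounts to early fusion by concatenation along the feature axis. The following is a minimal sketch, assuming both modalities have already been encoded into per-time-step feature vectors aligned to the same temporal grid. The function name `concat_features` and the feature sizes are hypothetical and are not taken from the WEAR code release.

```python
from typing import List, Sequence


def concat_features(vision: Sequence[Sequence[float]],
                    inertial: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical early-fusion helper: for each time step, append the
    inertial feature vector to the vision feature vector."""
    if len(vision) != len(inertial):
        # Both modalities must be aligned to the same number of time steps.
        raise ValueError("both modalities must cover the same number of time steps")
    return [list(v) + list(i) for v, i in zip(vision, inertial)]


# Toy example with hypothetical sizes: 3 time steps,
# 4-dim vision features and 2-dim inertial features.
vision = [[0.1, 0.2, 0.3, 0.4] for _ in range(3)]
inertial = [[1.0, -1.0] for _ in range(3)]
fused = concat_features(vision, inertial)
print(len(fused), len(fused[0]))  # 3 6
```

Because concatenation only widens the per-time-step feature vector, a temporal localization model that accepts a single feature sequence can take the fused input without architectural changes; only its input dimension grows.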

updated: Fri Jun 16 2023 07:46:34 GMT+0000 (UTC)

published: Tue Apr 11 2023 09:31:07 GMT+0000 (UTC)

arXiv
