HAA500: Human-Centric Atomic Action Dataset with Curated Videos

Jihoon Chung; Cheng-hsin Wuu; Hsuan-ru Yang; Yu-Wing Tai; Chi-Keung Tang

We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames. To minimize ambiguities in action classification, HAA500 consists of highly diversified classes of fine-grained atomic actions, where only consistent actions fall under the same label, e.g., "Baseball Pitching" vs. "Free Throw in Basketball". HAA500 thus differs from existing atomic action datasets, in which coarse-grained atomic actions are labeled with coarse action verbs such as "Throw". HAA500 has been carefully curated to capture the precise movement of human figures with little class-irrelevant motion or spatio-temporal label noise. The advantages of HAA500 are fourfold: 1) human-centric actions, with a high average of 69.7% detectable joints for the relevant human poses; 2) high scalability, since adding a new class takes only 20-60 minutes; 3) curated videos that capture the essential elements of an atomic action without irrelevant frames; 4) fine-grained atomic action classes. Our extensive experiments, including cross-data validation using datasets collected in the wild, demonstrate the clear benefits of the human-centric and atomic characteristics of HAA500, which enable even a baseline deep learning model to improve its predictions by attending to atomic human poses. We detail the HAA500 dataset statistics and collection methodology and compare HAA500 quantitatively with existing action recognition datasets.

updated: Mon Aug 16 2021 16:59:58 GMT+0000 (UTC)

published: Fri Sep 11 2020 04:18:41 GMT+0000 (UTC)

arXiv
