Auto-Encoding Score Distribution Regression for Action Quality Assessment

Boyu Zhang; Jiayuan Chen; Yinfei Xu; Hui Zhang; Xu Yang; Xin Geng



Action quality assessment (AQA) of videos is a challenging vision task because the relation between videos and action scores is difficult to model. AQA has therefore been widely studied in the literature. Traditionally, AQA is treated as a regression problem that learns the underlying mapping between videos and action scores. However, previous methods ignore the data uncertainty in AQA datasets. To address this aleatoric uncertainty, we develop a plug-and-play module, the Distribution Auto-Encoder (DAE). Specifically, it encodes videos into distributions and uses the reparameterization trick from variational auto-encoders (VAE) to sample scores, which establishes a more accurate mapping between videos and scores. Meanwhile, a likelihood loss is used to learn the uncertainty parameters. We plug our DAE approach into MUSDL and CoRe. Experimental results on public datasets demonstrate that our method achieves state-of-the-art results on the AQA-7, MTL-AQA, and JIGSAWS datasets. Our code is available at https://github.com/InfoX-SEU/DAE-AQA.
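The two ingredients named in the abstract, sampling a score with the reparameterization trick and learning the uncertainty through a likelihood loss, can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes the encoder outputs a mean and a log-variance for a one-dimensional Gaussian over scores, and it assumes a Gaussian negative log-likelihood as the likelihood loss; the abstract does not specify either detail. The function names `sample_score` and `gaussian_nll` and the numeric values are hypothetical.

```python
import math
import random


def sample_score(mu, log_var, rng):
    """Reparameterization trick: score = mu + sigma * eps, with eps ~ N(0, 1).

    The randomness lives in eps, so the sample is a deterministic,
    differentiable function of mu and log_var.
    """
    sigma = math.exp(0.5 * log_var)
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps


def gaussian_nll(target, mu, log_var):
    """Negative log-likelihood of target under N(mu, exp(log_var)).

    Assumed form of the likelihood loss; the log_var term penalizes
    overly large predicted uncertainty.
    """
    return 0.5 * (math.log(2.0 * math.pi) + log_var
                  + (target - mu) ** 2 / math.exp(log_var))


# Hypothetical encoder outputs for one video: mean score 85, variance 4.
rng = random.Random(0)
mu, log_var = 85.0, math.log(4.0)

samples = [sample_score(mu, log_var, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)

# A ground-truth score close to the predicted mean yields a lower loss.
loss_near = gaussian_nll(86.0, mu, log_var)
loss_far = gaussian_nll(95.0, mu, log_var)
```

In a real model, `mu` and `log_var` would come from a network applied to video features and the loss would be minimized by gradient descent; the sketch only shows the sampling and loss arithmetic.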

updated: Wed Aug 31 2022 10:33:50 GMT+0000 (UTC)

published: Mon Nov 22 2021 07:30:04 GMT+0000 (UTC)

arXiv
