Performance Evaluation of Action Recognition Models on Low Quality Videos

Aoi Otani; Ryota Hashiguchi; Kazuki Omi; Norishige Fukushima; Toru Tamaki

In the design of action recognition models, video quality is an important issue, yet the trade-off between quality and performance is often ignored. Action recognition models are generally trained on high-quality videos, so it is not known how model performance degrades when they are tested on low-quality videos, or how much the quality of the training videos affects performance. Despite its importance, the issue of video quality has not been studied so far. The goal of this study is to show the trade-off between performance and the quality of training and test videos through a quantitative performance evaluation of several action recognition models on videos transcoded at different quality levels. First, we show how video quality affects the performance of pre-trained models: we transcode the original validation videos of Kinetics400 by changing the quality control parameters of JPEG (compression strength) and H.264/AVC (CRF), and then use the transcoded videos to validate the pre-trained models. Second, we show how the models perform when trained on transcoded videos: we transcode the original training videos of Kinetics400 by changing the quality parameters of JPEG and H.264/AVC, train the models on the transcoded training videos, and validate them with both the original and the transcoded validation videos. Experimental results with JPEG transcoding show that there is no severe performance degradation (up to -1.5%) for compression strength smaller than 70, where no quality degradation is visually observed, and that for compression strength larger than 80 the performance degrades linearly with respect to the quality index. Experiments with H.264/AVC transcoding show that there is no significant performance loss (up to -1%) with CRF30, while the total size of the video files is reduced to 30%.
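
As a rough illustration of the H.264/AVC transcoding step described above, the following minimal sketch builds an ffmpeg command that re-encodes a video at a given CRF and computes the resulting size ratio. It is not the authors' pipeline: the use of ffmpeg with the libx264 encoder, the helper names `h264_transcode_cmd` and `size_ratio`, and the file names are all assumptions made for illustration.

```python
from pathlib import Path


def h264_transcode_cmd(src, dst, crf):
    """Build an ffmpeg argument list that re-encodes src to dst at a fixed CRF.

    Assumption: ffmpeg with libx264 is the transcoder; the abstract does not
    say which tool was used. Higher CRF means stronger compression.
    """
    if not 0 <= crf <= 51:
        # 0-51 is the CRF range libx264 accepts for 8-bit video.
        raise ValueError("crf must be between 0 and 51")
    return [
        "ffmpeg", "-y",          # overwrite dst if it already exists
        "-i", str(Path(src)),    # input video
        "-c:v", "libx264",       # H.264/AVC encoder
        "-crf", str(crf),        # quality control parameter
        str(Path(dst)),          # output video
    ]


def size_ratio(original_bytes, transcoded_bytes):
    """Return the transcoded size as a fraction of the original size."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return transcoded_bytes / original_bytes


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run to execute it.
    print(" ".join(h264_transcode_cmd("input.mp4", "output_crf30.mp4", 30)))
    # Illustrative numbers only, not measurements from the paper.
    print(size_ratio(1000, 300))
```

The command is returned as a list rather than executed, so it can be inspected or handed to `subprocess.run` once ffmpeg is known to be installed.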

updated: Mon Nov 14 2022 01:41:32 GMT+0000 (UTC)

published: Tue Apr 19 2022 23:56:56 GMT+0000 (UTC)

arXiv
