KonVid-150k: A Dataset for No-Reference Video Quality Assessment of Videos in-the-Wild

Franz Götz-Hahn; Vlad Hosu; Hanhe Lin; Dietmar Saupe

Video quality assessment (VQA) methods focus on particular degradation types, usually artificially induced on a small set of reference videos. Hence, most traditional VQA methods underperform in the wild. Deep learning approaches have had limited success because existing VQA datasets, whether artificially or authentically distorted, are small and lack diversity. We introduce KonVid-150k, a new in-the-wild VQA dataset that is substantially larger and more diverse. It consists of a coarsely annotated set of 153,841 videos with five quality ratings each, and 1,596 videos with a minimum of 89 ratings each. Additionally, we propose new efficient VQA approaches (MLSP-VQA) that rely on multi-level spatially pooled deep features (MLSP). Compared to deep transfer learning approaches, they are exceptionally well suited for training at scale. Our best method, MLSP-VQA-FF, improves the Spearman rank-order correlation coefficient (SRCC) on the commonly used KoNViD-1k in-the-wild benchmark dataset to 0.82. It surpasses the best existing deep-learning model (0.80 SRCC) and the best hand-crafted feature-based method (0.78 SRCC). We further investigate how alternative approaches perform under different levels of label noise and dataset size, showing that MLSP-VQA-FF is the overall best method for videos in the wild. Finally, we show that the MLSP-VQA models trained on KonVid-150k set a new state of the art for cross-test performance on KoNViD-1k, LIVE-VQC, and LIVE-Qualcomm, with SRCCs of 0.83, 0.75, and 0.64, respectively. For both KoNViD-1k and LIVE-VQC, this inter-dataset testing outperforms intra-dataset experiments, showing excellent generalization.
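For readers unfamiliar with the SRCC figures quoted above, the following is a minimal sketch of how a Spearman rank-order correlation between predicted quality scores and subjective ratings can be computed. It is not the authors' evaluation code; the function names `average_ranks` and `srcc` and the five sample values are hypothetical and serve only as illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def srcc(predicted, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(predicted), average_ranks(subjective)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical model scores and mean opinion scores for five videos.
scores = [2.1, 3.4, 3.0, 4.2, 1.5]
mos = [2.0, 3.9, 3.1, 3.5, 1.2]
print(round(srcc(scores, mos), 2))  # prints 0.9
```

Because SRCC depends only on rank order, it rewards a model for ordering videos by quality correctly even when its scores are on a different scale than the subjective ratings.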

updated: Mon Mar 01 2021 14:25:17 GMT+0000 (UTC)

published: Tue Dec 17 2019 12:26:32 GMT+0000 (UTC)

arXiv
