C3DVQA: Full-Reference Video Quality Assessment with 3D Convolutional Neural Network

Munan Xu; Junming Chen; Haiqiang Wang; Shan Liu; Ge Li; Zhiqiang Bai

Traditional video quality assessment (VQA) methods evaluate localized picture quality and predict the video score by temporally aggregating frame scores. However, video quality exhibits different characteristics from static image quality because of temporal masking effects. In this paper, we present a novel architecture, C3DVQA, that uses a convolutional neural network with 3D kernels (C3D) for the full-reference VQA task. C3DVQA combines feature learning and score pooling into a single spatiotemporal feature learning process. We use 2D convolutional layers to extract spatial features and 3D convolutional layers to learn spatiotemporal features. We found empirically that 3D convolutional layers are capable of capturing the temporal masking effects of videos. We evaluated the proposed method on the LIVE and CSIQ datasets. The experimental results demonstrate that the proposed method achieves state-of-the-art performance.
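To make the contrast concrete, the following is a minimal sketch of the traditional pipeline the abstract describes: score each frame against its reference, then pool the frame scores over time. It is not the C3DVQA method. The choice of PSNR as the frame metric, mean pooling as the aggregation, and the function names `frame_psnr` and `pooled_video_score` are all assumptions made for illustration; frames are plain nested lists of pixel values.

```python
import math


def frame_psnr(ref, dist, peak=255.0):
    """PSNR in dB between two equally sized frames (2D lists of pixel values).

    PSNR is only an illustrative frame-level metric (an assumption, not taken
    from the paper).
    """
    sq_err = 0.0
    count = 0
    for ref_row, dist_row in zip(ref, dist):
        for r, d in zip(ref_row, dist_row):
            sq_err += (r - d) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def pooled_video_score(ref_video, dist_video):
    """Average the per-frame scores over time (simple mean pooling, assumed)."""
    scores = [frame_psnr(r, d) for r, d in zip(ref_video, dist_video)]
    return sum(scores) / len(scores)


# Toy example: two 2x2 frames, the first more heavily distorted than the second.
ref_video = [[[100, 100], [100, 100]], [[100, 100], [100, 100]]]
dist_video = [[[110, 110], [110, 110]], [[101, 101], [101, 101]]]
score = pooled_video_score(ref_video, dist_video)
```

Because mean pooling treats frames independently, reordering the frames leaves the score unchanged. This kind of pooling therefore cannot account for temporal effects such as masking, which is the limitation the abstract says the 3D convolutional layers are meant to address.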

updated: Wed Mar 04 2020 09:11:49 GMT+0000 (UTC)

published: Wed Oct 30 2019 03:21:47 GMT+0000 (UTC)

arXiv
