Image Quality Assessment: Unifying Structure and Texture Similarity

Keyan Ding; Kede Ma; Shiqi Wang; Eero P. Simoncelli

Objective measures of image quality generally operate by comparing pixels of a "degraded" image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here, we develop the first full-reference image quality model with explicit tolerance to texture resampling. Using a convolutional neural network, we construct an injective and differentiable function that transforms images to multi-scale overcomplete representations. We demonstrate empirically that the spatial averages of the feature maps in this representation capture texture appearance, in that they provide a set of statistical constraints sufficient to synthesize a wide variety of texture patterns. We then describe an image quality method that combines correlations of these spatial averages ("texture similarity") with correlations of the feature maps ("structure similarity"). The parameters of the proposed measure are jointly optimized to match human ratings of image quality, while minimizing the reported distances between subimages cropped from the same texture images. Experiments show that the optimized method explains human perceptual scores on both conventional image quality databases and texture databases. The measure also offers competitive performance on related tasks such as texture classification and retrieval. Finally, we show that our method is relatively insensitive to geometric transformations (e.g., translation and dilation), without the use of any specialized training or data augmentation. Code is available at https://github.com/dingkeyan93/DISTS.
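
The combination of "texture similarity" and "structure similarity" described above can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes SSIM-style formulas for both terms, a small stabilizing constant `c`, hand-picked weights `alpha` and `beta`, and flat lists of numbers standing in for CNN feature maps; all function names here are hypothetical.

```python
from statistics import fmean

def texture_term(x, y, c=1e-6):
    """Compare the spatial means of two feature maps (assumed SSIM-style form)."""
    mx, my = fmean(x), fmean(y)
    return (2 * mx * my + c) / (mx ** 2 + my ** 2 + c)

def structure_term(x, y, c=1e-6):
    """Compare two feature maps via their covariance (assumed SSIM-style form)."""
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return (2 * cov + c) / (var_x + var_y + c)

def distance(maps_x, maps_y, alpha, beta):
    """One minus the weighted sum of both terms over all feature maps.

    alpha and beta are illustrative weights, assumed to sum to 1 in total;
    in the paper the parameters are learned, not chosen by hand.
    """
    sim = sum(a * texture_term(x, y) + b * structure_term(x, y)
              for x, y, a, b in zip(maps_x, maps_y, alpha, beta))
    return 1.0 - sim

# Toy stand-ins for one feature map of a texture and a rearranged copy of it.
grass = [1.0, 2.0, 3.0, 4.0]
resampled = [4.0, 3.0, 2.0, 1.0]

texture_only = distance([grass], [resampled], alpha=[1.0], beta=[0.0])
structure_only = distance([grass], [resampled], alpha=[0.0], beta=[1.0])
```

In this toy setting the rearranged map keeps the same spatial mean, so the texture term does not penalize it, while the structure term does. Shifting weight toward the texture term is therefore one way a measure of this kind can tolerate texture resampling.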

updated: Wed Dec 16 2020 12:56:44 GMT+0000 (UTC)

published: Thu Apr 16 2020 16:11:46 GMT+0000 (UTC)

arXiv
