Visual Analysis Motivated Rate-Distortion Model for Image Coding

Zhimeng Huang; Chuanmin Jia; Shanshe Wang; Siwei Ma

Images compressed by existing image codecs, which are optimized for pixel-fidelity metrics, face systematic challenges when used for visual analysis tasks, especially under low-bitrate coding. This paper proposes a visual analysis-motivated rate-distortion model for Versatile Video Coding (VVC) intra compression. The proposed model makes two major contributions: a novel rate allocation strategy and a new distortion measurement model. We first propose the region of interest for machine (ROIM) to evaluate the importance of each coding tree unit (CTU) in visual analysis. Then, a novel CTU-level bit allocation model is proposed based on ROIM and the local texture characteristics of each CTU. After an in-depth analysis of multiple distortion models, we propose a visual analysis-friendly distortion criterion that extracts deep features of each coding unit (CU). To alleviate the lack of spatial context information when calculating the distortion of each CU, we finally propose a multi-scale feature distortion (MSFD) metric that uses different neighboring-pixel regions and weights the extracted deep features at each scale. Extensive experimental results show that the proposed scheme can achieve up to 28.17% bitrate savings at the same analysis performance across several typical visual analysis tasks, such as image classification, object detection, and semantic segmentation.
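The abstract does not state the CTU-level allocation formula. As a rough illustration only of the general idea of splitting a frame bit budget across CTUs according to machine importance (ROIM) and local texture, the sketch below assumes a simple multiplicative weight and a gradient-based texture measure. The names `CTU`, `texture_complexity`, `allocate_ctu_bits` and the parameters `alpha`, `beta`, `base` are hypothetical and are not the paper's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CTU:
    roim: float     # hypothetical machine-importance score in [0, 1]
    texture: float  # hypothetical texture complexity of the CTU


def texture_complexity(block: List[List[float]]) -> float:
    """Mean absolute horizontal and vertical pixel difference.

    An assumed stand-in texture measure; the paper's measure may differ.
    """
    h, w = len(block), len(block[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbor difference
                total += abs(block[y][x + 1] - block[y][x])
                count += 1
            if y + 1 < h:  # vertical neighbor difference
                total += abs(block[y + 1][x] - block[y][x])
                count += 1
    return total / count if count else 0.0


def allocate_ctu_bits(ctus: List[CTU], frame_budget: float,
                      alpha: float = 1.0, beta: float = 0.5,
                      base: float = 0.1, eps: float = 1e-6) -> List[float]:
    """Split a frame-level bit budget across CTUs in proportion to
    (base + roim)**alpha * (texture + eps)**beta (hypothetical weighting).

    `base` keeps low-importance CTUs from receiving zero bits; `eps`
    avoids a zero weight for perfectly flat CTUs.
    """
    if not ctus:
        return []
    weights = [((base + c.roim) ** alpha) * ((c.texture + eps) ** beta)
               for c in ctus]
    total = sum(weights)
    return [frame_budget * w / total for w in weights]


if __name__ == "__main__":
    ctus = [CTU(1.0, 4.0), CTU(0.0, 4.0), CTU(0.5, 1.0)]
    print(allocate_ctu_bits(ctus, 10000.0))
```

Under these assumptions, a CTU with high ROIM importance receives a larger share than an equally textured background CTU, while the allocations always sum to the frame budget.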

updated: Wed Apr 21 2021 02:27:34 GMT+0000 (UTC)

published: Wed Apr 21 2021 02:27:34 GMT+0000 (UTC)

arXiv