Are Visual Recognition Models Robust to Image Compression?

João Maria Janeiro; Stanislav Frolov; Alaaeldin El-Nouby; Jakob Verbeek

Reducing the data footprint of visual content via image compression is essential for lowering storage requirements, as well as the bandwidth and latency requirements of transmission. In particular, compressed images allow faster data transfer and faster response times for visual recognition on edge devices that rely on cloud-based services. In this paper, we first analyze the impact of image compression, using both traditional codecs and recent state-of-the-art neural compression approaches, on three visual recognition tasks: image classification, object detection, and semantic segmentation. We consider a wide range of compression levels, from 0.1 to 2 bits per pixel (bpp). We find that for all three tasks, recognition ability is significantly impacted under strong compression. For example, segmentation accuracy drops from 44.5 to 30.5 mIoU when compressing to 0.1 bpp with the best compression model we evaluated. Second, we test to what extent this performance drop can be ascribed to a loss of relevant information in the compressed image, or to a lack of generalization of visual recognition models to images with compression artefacts. We find that the performance loss is largely due to the latter: finetuning the recognition models on compressed training images recovers most of it. For example, finetuning brings segmentation accuracy back up to 42 mIoU, recovering 82% of the original drop in accuracy.
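The two quantities the abstract relies on, the bits-per-pixel rate and the share of the accuracy drop recovered by finetuning, can be checked with a short sketch. The function names and the example file size below are illustrative assumptions, not taken from the paper; only the mIoU values (44.5, 30.5, 42) come from the abstract.

```python
def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    """Rate of a compressed image: total bits divided by the pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return compressed_bytes * 8 / (width * height)


def recovered_fraction(baseline: float, degraded: float, recovered: float) -> float:
    """Share of the accuracy drop (baseline - degraded) won back after finetuning."""
    drop = baseline - degraded
    if drop <= 0:
        raise ValueError("degraded accuracy must be below the baseline")
    return (recovered - degraded) / drop


# mIoU values from the abstract: 44.5 uncompressed, 30.5 at 0.1 bpp, 42 after finetuning.
fraction = recovered_fraction(44.5, 30.5, 42.0)  # (42 - 30.5) / (44.5 - 30.5)

# Hypothetical example: a 512x512 image stored in 3277 bytes is roughly 0.1 bpp.
rate = bits_per_pixel(3277, 512, 512)
```

With the abstract's numbers, the drop is 14 mIoU and finetuning recovers 11.5 mIoU of it, which matches the stated 82%.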

updated: Mon Apr 10 2023 11:30:11 GMT+0000 (UTC)

published: Mon Apr 10 2023 11:30:11 GMT+0000 (UTC)

arXiv
