Just Noticeable Difference Modeling for Face Recognition System

Yu Tian; Zhangkai Ni; Baoliang Chen; Shurun Wang; Shiqi Wang; Hanli Wang; Sam Kwong

High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, large volumes of face data are usually compressed before analysis because of transmission or storage constraints. Compressed images may lose critical identity information, which degrades the performance of the FR system. Herein, we make the first attempt to study the just noticeable difference (JND) for the FR system, defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset of 3530 original images and 137,670 compressed images generated by the reference encoding/decoding software of the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model that directly infers JND images for the FR system. In particular, to remove as much redundancy as possible without impairing robust identity information, we apply an encoder with multiple feature extraction and attention-based feature decomposition modules that progressively decomposes face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. The residual feature is then fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results demonstrate that the proposed model predicts JND maps more accurately than state-of-the-art JND models, and saves more bits than VTM-15.0 while maintaining the performance of the FR system.
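To make the JND definition and the final subtraction step concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that the FR system "notices" a distortion when the cosine similarity between the embeddings of the original and the compressed image falls below a threshold, and that the compressed versions are ordered from least to most distorted. The names `find_jnd_level`, `jnd_map`, `embed`, and the `threshold` value are all hypothetical; the abstract does not state the decision criterion used.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_jnd_level(
    reference,
    candidates: Sequence,
    embed: Callable[[object], Vector],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> Optional[int]:
    """Index of the most distorted candidate the FR system still cannot notice.

    Assumes `candidates` are ordered by increasing distortion and stops at the
    first one whose embedding similarity drops below `threshold`.
    Returns None if even the least distorted candidate is noticed.
    """
    ref_emb = embed(reference)
    jnd_index = None
    for i, image in enumerate(candidates):
        if cosine_similarity(ref_emb, embed(image)) >= threshold:
            jnd_index = i
        else:
            break
    return jnd_index


def jnd_map(original: List[List[float]], residual: List[List[float]]) -> List[List[float]]:
    """Final step described in the abstract: original image minus residual map."""
    return [[o - r for o, r in zip(o_row, r_row)]
            for o_row, r_row in zip(original, residual)]


if __name__ == "__main__":
    # Toy stand-in: each "image" is already its own embedding vector.
    identity_embed = lambda image: image
    reference = [1.0, 0.0]
    compressed = [[1.0, 0.1], [1.0, 0.3], [1.0, 1.0], [1.0, 0.2]]
    print(find_jnd_level(reference, compressed, identity_embed))
    print(jnd_map([[10, 20], [30, 40]], [[1, 2], [3, 4]]))
```

In practice `embed` would wrap a real FR network and the candidates would be the VTM-15.0 outputs at increasing compression strength; the early stop encodes the assumption that distortion only becomes more noticeable as compression increases.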

updated: Thu Sep 28 2023 13:29:16 GMT+0000 (UTC)

published: Tue Sep 13 2022 10:06:36 GMT+0000 (UTC)

arXiv
