EViT: Privacy-Preserving Image Retrieval via Encrypted Vision Transformer in Cloud Computing

Qihua Feng; Peiya Li; Zhixun Lu; Chaozhuo Li; Zefang Wang; Zhiquan Liu; Chunhui Duan; Feiran Huang

Image retrieval systems help users browse and search large image collections in real time. With the rise of cloud computing, retrieval tasks are usually outsourced to cloud servers. However, because cloud servers cannot be fully trusted, this setting poses a serious privacy-protection challenge. To address it, image-encryption-based privacy-preserving image retrieval schemes have been developed, which first extract features from cipher-images and then build retrieval models on those features. Yet most existing approaches extract shallow features and design trivial retrieval models, resulting in insufficiently expressive representations of the cipher-images. In this paper, we propose a novel paradigm named Encrypted Vision Transformer (EViT), which improves the discriminative representation capability for cipher-images. First, to capture comprehensive ruled information, we extract multi-level local length-sequence features and global Huffman-code frequency features from cipher-images that are encrypted with a stream cipher during the JPEG compression process. Second, we design a Vision Transformer-based retrieval model that couples with the multi-level features, and propose two adaptive data augmentation methods to improve the representation power of the retrieval model. Our proposal can be easily adapted to both unsupervised and supervised settings via self-supervised contrastive learning. Extensive experiments show that EViT achieves excellent encryption and retrieval performance, outperforming current schemes in retrieval accuracy by large margins while protecting image privacy effectively. Code is publicly available at https://github.com/onlinehuazai/EViT.
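To give a feel for the contrastive learning objective mentioned in the abstract, the following is a minimal sketch of a generic InfoNCE-style loss for a single anchor. It is an illustration under stated assumptions, not the loss used in EViT: the function names `cosine` and `info_nce`, the temperature value, and the toy 2-D feature vectors are all hypothetical, and the abstract does not specify the exact formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss for one anchor (hypothetical helper, not EViT's code).

    The loss is the negative log-softmax of the anchor/positive similarity
    among the similarities to the positive and all negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

# Toy features: two augmented views of the same image should be close,
# while features of other images act as negatives.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce(anchor, positive, negatives)
# Swapping the true positive with a dissimilar vector should raise the loss.
loss_mismatched = info_nce(anchor, negatives[0], [positive, negatives[1]])
print(loss_matched < loss_mismatched)
```

In this kind of objective, the two views of an image would typically come from data augmentation, which is presumably where the adaptive augmentation methods mentioned above fit in; the details are in the paper and the linked repository.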

updated: Wed Aug 31 2022 07:07:21 GMT+0000 (UTC)

published: Wed Aug 31 2022 07:07:21 GMT+0000 (UTC)

arXiv
