Orthonormal Product Quantization Network for Scalable Face Image Retrieval

Ming Zhang; Xuefei Zhe; Hong Yan

Existing deep quantization methods provide an efficient solution for large-scale image retrieval. However, significant intra-class variations in face images, such as pose, illumination, and expression, still pose a challenge for face image retrieval. In light of this, face image retrieval requires sufficiently powerful learning metrics, which are absent in current deep quantization works. Moreover, to handle the growing number of unseen identities at the query stage, face image retrieval places greater demands on model generalization and system scalability than general image retrieval tasks. This paper integrates product quantization with orthonormal constraints into an end-to-end deep learning framework to effectively retrieve face images. Specifically, a novel scheme that uses predefined orthonormal vectors as codewords is proposed to enhance quantization informativeness and reduce codeword redundancy. A tailored loss function maximizes discriminability among identities in each quantization subspace for both the quantized and original features. An entropy-based regularization term is imposed to reduce the quantization error. Experiments are conducted on four commonly used face datasets under both seen-identity and unseen-identity retrieval settings. Our method outperforms all the compared state-of-the-art deep hashing/quantization methods under both settings. The results validate the effectiveness of the proposed orthonormal codewords in improving the models' standard retrieval performance and generalization ability. Combined with further experiments on two general image datasets, they demonstrate the broad superiority of our method for scalable image retrieval.
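
The core idea of the abstract, product quantization with predefined orthonormal codewords and an entropy term on the codeword assignments, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `orthonormal_codebook` and `quantize`, the QR-based construction of the codewords, the softmax temperature, and the exact form of the entropy term are all assumptions made for illustration, and the deep network and the tailored loss function are omitted entirely.

```python
import numpy as np


def orthonormal_codebook(dim, seed=0):
    """Return a (dim, dim) array whose rows are orthonormal codewords.

    Assumption: codewords come from the QR decomposition of a random
    Gaussian matrix; the paper may predefine them differently.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q.T  # q is orthogonal, so its rows are orthonormal too


def quantize(features, codebooks, temperature=0.1):
    """Assign each sub-vector to a codeword in its own subspace.

    features:  (n, M * d) array, split into M sub-vectors of size d.
    codebooks: list of M arrays, each of shape (K, d).
    Returns hard codes of shape (n, M) and the mean entropy of the
    soft assignments (low entropy means confident assignments).
    """
    n, m = features.shape[0], len(codebooks)
    subvectors = np.split(features, m, axis=1)
    codes = np.empty((n, m), dtype=np.int64)
    entropy = 0.0
    for j, (x, c) in enumerate(zip(subvectors, codebooks)):
        logits = x @ c.T / temperature              # similarities, (n, K)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)            # soft assignment
        codes[:, j] = p.argmax(axis=1)               # hard code per subspace
        entropy += -(p * np.log(p + 1e-12)).sum(axis=1).mean()
    return codes, entropy / m


# Usage: three subspaces of dimension 4, one feature built from codewords.
books = [orthonormal_codebook(4, seed=s) for s in range(3)]
feats = np.concatenate([books[0][[2]], books[1][[0]], books[2][[3]]], axis=1)
codes, ent = quantize(feats, books)
```

Because the codewords within each subspace are mutually orthogonal, a feature that coincides with one codeword has zero similarity to all the others, which gives an intuition for why orthonormal codewords carry little redundancy. In a trained model, an entropy term of this kind would typically be minimized so that soft assignments approach hard ones.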

updated: Mon Mar 21 2022 20:38:05 GMT+0000 (UTC)

published: Thu Jul 01 2021 09:30:39 GMT+0000 (UTC)

arXiv

