Common CNN-based Face Embedding Spaces are (Almost) Equivalent

David McNeely-White; Benjamin Sattelberg; Nathaniel Blanchard; Ross Beveridge

CNNs are the dominant method for creating face embeddings for recognition. Because these networks are distinct, complex, nonlinear functions, it might be assumed that their embeddings are network-specific and thus have some degree of anonymity. However, recent research has shown that the features of distinct networks can be directly mapped onto one another with little performance penalty (a median 1.9% reduction across 90 distinct mappings) in the context of the 1,000-object ImageNet recognition task. This finding revealed that embeddings coming from different systems can be meaningfully compared, provided the mapping is known. That prior work, however, only considered networks trained and tested on a closed-set classification task. Here, we present evidence that a linear mapping between feature spaces can be easily discovered in the context of open-set face recognition. Specifically, we demonstrate that the feature spaces of four face recognition models, of varying architectures and training datasets, can be mapped onto one another with no more than a 1.0% penalty in recognition accuracy on LFW. This finding, which we also replicate on YouTube Faces, demonstrates that embeddings from different systems can be readily compared once the linear mapping is determined. In further analysis, we find that fewer than 500 pairs of corresponding embeddings from two systems are required to calculate the full mapping between embedding spaces, and that reducing the dimensionality of the mapping from 512 to 64 produces a negligible performance penalty.
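The abstract does not say how the linear mapping is fit. The sketch below is a hedged illustration only. It fits a map between two embedding spaces by ordinary least squares from paired embeddings. It then approximates that map at rank 64 by truncating its SVD. Both choices are our assumptions and are not necessarily the authors' procedure.

The data are synthetic, not LFW or YouTube Faces embeddings. The two hypothetical 512-d "systems" are random linear views of a shared 64-d latent code, with a little noise added to system B. The names `fit_linear_map`, `truncate_rank`, `mean_cosine`, and `toy_experiment` are illustrative, as is the `rcond` cut-off.

```python
import numpy as np


def fit_linear_map(src: np.ndarray, dst: np.ndarray, rcond: float = 1e-6) -> np.ndarray:
    """Least-squares matrix W with src @ W ~= dst (an assumed fitting method).

    src: (n, d_src) embeddings of n face images from system A.
    dst: (n, d_dst) embeddings of the same n images from system B.
    Singular values of src below rcond * max are treated as zero, and
    lstsq returns the minimum-norm solution when src is rank-deficient.
    """
    w, *_ = np.linalg.lstsq(src, dst, rcond=rcond)
    return w  # shape (d_src, d_dst)


def truncate_rank(w: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of W in Frobenius norm, via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Average row-wise cosine similarity between two embedding matrices."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_n * b_n, axis=1)))


def toy_experiment(n_train=400, n_test=200, latent=64, dim=512, seed=0):
    """Hypothetical synthetic stand-in for two face models (not real data)."""
    rng = np.random.default_rng(seed)
    # Each "system" is a random linear view of a shared 64-d latent code.
    mix_a = rng.standard_normal((latent, dim)) / np.sqrt(latent)
    mix_b = rng.standard_normal((latent, dim)) / np.sqrt(latent)
    z_train = rng.standard_normal((n_train, latent))
    z_test = rng.standard_normal((n_test, latent))
    src_train, src_test = z_train @ mix_a, z_test @ mix_a
    # Small noise on system B's training embeddings only.
    dst_train = z_train @ mix_b + 0.01 * rng.standard_normal((n_train, dim))
    dst_test = z_test @ mix_b

    # Fit on fewer than 500 training pairs, then score held-out pairs.
    w = fit_linear_map(src_train, dst_train)
    full = mean_cosine(src_test @ w, dst_test)
    rank64 = mean_cosine(src_test @ truncate_rank(w, 64), dst_test)
    return w.shape, full, rank64


if __name__ == "__main__":
    shape, full, rank64 = toy_experiment()
    print(shape, round(full, 4), round(rank64, 4))
```

In this toy, the latent space is 64-dimensional by construction. That is why 400 pairs suffice and why rank-64 truncation loses essentially nothing. The example shows the mechanics of fitting and compressing a linear map, not the paper's results on real face models.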

updated: Mon Nov 02 2020 19:57:46 GMT+0000 (UTC)

published: Mon Oct 05 2020 20:32:40 GMT+0000 (UTC)

arXiv
