Coded Residual Transform for Generalizable Deep Metric Learning

Shichao Kan; Yixiong Liang; Min Li; Yigang Cen; Jianxin Wang; Zhihai He

A fundamental challenge in deep metric learning is the generalization capability of the feature embedding network, since an embedding network learned on training classes needs to be evaluated on new test classes. To address this challenge, we introduce a new method called coded residual transform (CRT) for deep metric learning to significantly improve its generalization capability. Specifically, we learn a set of diversified prototype features, project the feature map onto each prototype, and then encode its features using their projection residuals weighted by their correlation coefficients with each prototype. The proposed CRT method has two unique characteristics. First, it represents and encodes the feature map from a set of complementary perspectives based on projections onto diversified prototypes. Second, unlike existing transformer-based feature representation approaches, which encode the original values of features based on global correlation analysis, the proposed coded residual transform encodes the relative differences between the original features and their projected prototypes. Embedding space density and spectral decay analysis show that this multi-perspective projection onto diversified prototypes, together with the coded residual representation, achieves significantly improved generalization capability in metric learning. Finally, to further enhance generalization performance, we propose to enforce consistency between the feature similarity matrices of coded residual transforms with different numbers of projection prototypes and different embedding dimensions. Our extensive experimental results and ablation studies demonstrate that the proposed CRT method outperforms state-of-the-art deep metric learning methods by large margins, improving upon the current best method by up to 4.28% on the CUB dataset.
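The encoding and consistency steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the details here are assumptions. In particular, the correlation coefficients are taken to be a softmax over spatial locations of feature-prototype dot products, the residual is taken to be the plain difference between a feature and a prototype, the consistency term is a mean squared difference between cosine similarity matrices, and the prototypes are random rather than learned. All function names are hypothetical.

```python
import numpy as np

def coded_residual_transform(features, prototypes):
    """features: (N, D) local features from a flattened feature map.
    prototypes: (K, D) prototype features (learned in practice).
    Returns (K, D): one correlation-weighted residual per prototype."""
    # Assumed correlation coefficients: softmax over the N locations
    # of the dot product between each feature and each prototype.
    logits = features @ prototypes.T                      # (N, K)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Assumed residual: difference between each feature and each prototype.
    residuals = features[:, None, :] - prototypes[None, :, :]  # (N, K, D)
    return (weights[:, :, None] * residuals).sum(axis=0)       # (K, D)

def embed(features, prototypes):
    """Flatten the coded residuals and L2-normalize them into one embedding."""
    coded = coded_residual_transform(features, prototypes).ravel()
    return coded / (np.linalg.norm(coded) + 1e-12)

def similarity_consistency(emb_a, emb_b):
    """Assumed consistency term: mean squared difference between the
    batch similarity matrices of two embeddings (dimensions may differ)."""
    sim_a = emb_a @ emb_a.T   # (B, B)
    sim_b = emb_b @ emb_b.T   # (B, B)
    return float(np.mean((sim_a - sim_b) ** 2))

# Toy usage: 4 images, 7x7 spatial locations, 16 channels.
rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 49, 16))
protos_small = rng.normal(size=(3, 16))   # 3 prototypes -> 48-dim embedding
protos_large = rng.normal(size=(5, 16))   # 5 prototypes -> 80-dim embedding
emb_small = np.stack([embed(f, protos_small) for f in batch])
emb_large = np.stack([embed(f, protos_large) for f in batch])
loss = similarity_consistency(emb_small, emb_large)
```

Because the consistency term compares batch-by-batch similarity matrices rather than the embeddings themselves, the two branches can use different numbers of prototypes and different embedding dimensions, which matches the setting the abstract describes.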

updated: Sun Oct 09 2022 06:17:31 GMT+0000 (UTC)

published: Sun Oct 09 2022 06:17:31 GMT+0000 (UTC)

arXiv

