Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering

Chang Liu; Han Yu; Boyang Li; Zhiqi Shen; Zhanning Gao; Peiran Ren; Xuansong Xie; Lizhen Cui; Chunyan Miao

Noisy labels are common in real-world data and degrade the performance of deep neural networks. Cleaning data manually is labour-intensive and time-consuming. Previous research has mostly focused on making classification models robust to noisy labels, while the robustness of deep metric learning (DML) against noisy labels remains less well explored. In this paper, we bridge this important gap by proposing the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML. PRISM calculates the probability that a label is clean and filters out potentially noisy samples. Specifically, we propose three methods to calculate this probability: 1) the Average Similarity Method (AvgSim), which calculates the average similarity between potentially noisy data and clean data; 2) the Proxy Similarity Method (ProxySim), which replaces the centers maintained by AvgSim with the proxies trained by a proxy-based method; and 3) von Mises-Fisher Distribution Similarity (vMF-Sim), which estimates a von Mises-Fisher distribution for each data class. With this design, the proposed approach can handle challenging DML situations in which the majority of the samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach achieves up to 8.37% higher Precision@1 than the best-performing state-of-the-art baseline approaches, within a reasonable training time.
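
The clean-label probability behind AvgSim and ProxySim can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: the probability is modeled as a temperature-scaled softmax over cosine similarities between a sample's embedding and one reference vector per class, and a fixed probability threshold stands in for PRISM's ranking-based selection. The names `class_centers`, `clean_probability`, `filter_batch`, `temperature`, and `threshold` are hypothetical. Averaging the unit-normalized clean features of a class gives a center whose dot product with a normalized query equals the query's average cosine similarity to that class's stored features, which matches the AvgSim description above; passing learned proxy vectors in place of the centers gives a ProxySim-style variant.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a (non-zero) feature vector to unit L2 norm, so cosine similarity is a dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def class_centers(memory: Dict[int, List[Vector]]) -> Dict[int, Vector]:
    """AvgSim-style centers: the mean of the unit-normalized clean features of each class.

    Because the dot product is linear, a normalized query's dot product with this mean
    equals its average cosine similarity to every stored feature of that class.
    """
    centers = {}
    for label, feats in memory.items():
        unit = [normalize(f) for f in feats]
        dim = len(unit[0])
        centers[label] = [sum(u[d] for u in unit) / len(unit) for d in range(dim)]
    return centers


def clean_probability(feature: Vector, label: int, centers: Dict[int, Vector],
                      temperature: float = 0.1) -> float:
    """ASSUMED form: softmax over per-class similarities, read off at the given label.

    `centers` may hold AvgSim centers or, for a ProxySim-style variant, learned proxies.
    """
    x = normalize(feature)
    scores = {k: dot(x, c) / temperature for k, c in centers.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    return exps[label] / sum(exps.values())


def filter_batch(features: List[Vector], labels: List[int], centers: Dict[int, Vector],
                 threshold: float = 0.5, temperature: float = 0.1) -> List[int]:
    """Return indices of samples to keep. A fixed threshold is a hypothetical
    stand-in for PRISM's ranking-based selection."""
    return [i for i, (f, y) in enumerate(zip(features, labels))
            if clean_probability(f, y, centers, temperature) >= threshold]


# Toy memory of clean features for two classes in a 2-D embedding space.
memory = {0: [[1.0, 0.0], [0.9, 0.1]], 1: [[0.0, 1.0], [0.1, 0.9]]}
centers = class_centers(memory)

features = [[1.0, 0.05], [0.95, 0.0], [0.0, 1.0]]
labels = [0, 1, 1]  # the second sample sits near class 0 but is labelled 1
keep = filter_batch(features, labels, centers)
print(keep)  # → [0, 2]
```

In this toy batch the second sample receives a clean probability close to zero for its given label and is dropped, while the two consistently labelled samples are kept.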

updated: Tue Aug 03 2021 12:15:25 GMT+0000 (UTC)

published: Tue Aug 03 2021 12:15:25 GMT+0000 (UTC)

arXiv