Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Kirill Prokofiev; Vladislav Sovrasov

Multilabel image classification predicts a set of labels for a given image. Unlike multiclass classification, where each image is assigned exactly one label, this setup applies to a broader range of applications. In this work we revisit two popular approaches to multilabel classification: transformer-based heads and graph processing branches that model relations between labels. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with a proper training strategy, graph-based methods show only a small accuracy drop while spending fewer computational resources at inference. In our training strategy, instead of Asymmetric Loss (ASL), the de facto standard for multilabel classification, we introduce a metric learning modification of it. In each binary classification sub-problem, the loss operates on L_2-normalized feature vectors coming from the backbone and enforces the angles between the normalized representations of positive and negative samples to be as large as possible. This provides better discrimination ability than binary cross-entropy loss applied to unnormalized features. With the proposed loss and training strategy, we obtain state-of-the-art (SOTA) results among single-modality methods on widespread multilabel classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-Wide, and Visual Genome 500. Source code of our method is available as a part of OpenVINO Training Extensions: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel
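To make the idea of "ASL on L_2-normalized features" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each binary sub-problem is parameterized by a learnable class vector and that its logit is a scaled cosine between the normalized backbone feature and that vector. The function names (`l2_normalize`, `cosine`, `angular_asl`) and the hyperparameters `s` (scale), `m` (margin), `gamma_pos`/`gamma_neg` (focusing) and `clip` (probability shift for negatives) are hypothetical choices for illustration; the focusing and clipping terms follow the general form of Asymmetric Loss.

```python
import math
from typing import Sequence


def l2_normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit L2 norm (tiny eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1e-12
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, computed after L2 normalization."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def angular_asl(
    feature: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    labels: Sequence[int],
    s: float = 23.0,        # hypothetical scale applied to cosine logits
    m: float = 0.0,         # hypothetical margin subtracted for positives
    gamma_pos: float = 0.0, # focusing exponent for positives
    gamma_neg: float = 4.0, # focusing exponent for negatives
    clip: float = 0.05,     # probability shift for negatives, as in ASL
) -> float:
    """Hypothetical asymmetric loss on scaled cosine logits for one image.

    Each class is an independent binary sub-problem whose logit depends only
    on the angle between the backbone feature and that class's vector.
    """
    eps = 1e-8
    total = 0.0
    for w, y in zip(class_weights, labels):
        cos_t = cosine(feature, w)
        # Positives pay a margin, so they must be pulled closer in angle.
        logit = s * (cos_t - m) if y == 1 else s * cos_t
        p = 1.0 / (1.0 + math.exp(-logit))
        if y == 1:
            # Focal-weighted log-likelihood of the positive label.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Shifted probability ignores very easy negatives entirely.
            p_m = max(p - clip, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


if __name__ == "__main__":
    protos = [[1.0, 0.0], [0.0, 1.0]]
    print(angular_asl([1.0, 0.0], protos, [1, 0]))  # feature matches labels
    print(angular_asl([1.0, 0.0], protos, [0, 1]))  # feature contradicts labels
```

Because the loss only sees cosines, multiplying the feature by any positive constant leaves it unchanged; the model can reduce the loss only by changing angles, which is the property the normalized formulation is meant to exploit.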

updated: Tue Dec 20 2022 10:00:24 GMT+0000 (UTC)

published: Wed Sep 14 2022 12:06:47 GMT+0000 (UTC)

arXiv
