Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification

Kirill Prokofiev; Vladislav Sovrasov

Multilabel image classification predicts a set of labels for a given image. Unlike multiclass classification, where only one label is assigned per image, such a setup is applicable to a broader range of applications. In this work we revisit two popular approaches to multilabel classification: transformer-based heads and graph processing branches that use label relation information. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy graph-based methods can show only a small accuracy drop while spending fewer computational resources on inference. In our training strategy, instead of Asymmetric Loss (ASL), which is the de facto standard for multilabel classification, we introduce a modification of it that acts in the angle space. It implicitly learns a proxy feature vector on the unit hypersphere for each class, providing better discrimination ability than binary cross-entropy loss does on unnormalized features. With the proposed loss and training strategy, we obtain SOTA results among single-modality methods on widespread multilabel classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-Wide and Visual Genome 500. Source code of our method is available as a part of the OpenVINO Training Extensions: https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel
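
The abstract does not state the loss formula, so the following is only a minimal sketch of the general idea of an asymmetric loss computed in angle space, not the paper's exact formulation. It assumes that each class has a proxy vector, that logits are cosine similarities between the L2-normalized feature and the L2-normalized proxies, and that an ASL-style focusing exponent down-weights easy negatives. The function names and the parameters `scale`, `gamma_pos` and `gamma_neg` are hypothetical and chosen for illustration.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_logits(feature, proxies):
    """Cosine of the angle between the feature and each class proxy."""
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(p))) for p in proxies]


def angular_asymmetric_loss(feature, proxies, targets,
                            scale=10.0, gamma_pos=0.0, gamma_neg=2.0):
    """Asymmetric focal-style binary loss on scaled cosine logits.

    Assumption: the scale and the gamma values are illustrative defaults,
    not values taken from the paper.
    """
    loss = 0.0
    for cos, y in zip(cosine_logits(feature, proxies), targets):
        p = 1.0 / (1.0 + math.exp(-scale * cos))
        p = min(max(p, 1e-8), 1.0 - 1e-8)  # keep log() finite
        if y:
            loss -= (1.0 - p) ** gamma_pos * math.log(p)
        else:
            # gamma_neg > gamma_pos down-weights easy negatives
            loss -= p ** gamma_neg * math.log(1.0 - p)
    return loss / len(targets)


feature = [2.0, 0.0]
proxies = [[1.0, 0.0], [0.0, 3.0]]  # one proxy per class
loss_match = angular_asymmetric_loss(feature, proxies, [1, 0])
loss_mismatch = angular_asymmetric_loss(feature, proxies, [0, 1])
```

Because both the feature and the proxies are normalized, rescaling the feature leaves the logits unchanged, so only the angle to each proxy influences the loss. In this toy setup the feature points along the first proxy, so labeling the first class as positive yields a lower loss than labeling the second.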

updated: Wed Sep 14 2022 12:06:47 GMT+0000 (UTC)

published: Wed Sep 14 2022 12:06:47 GMT+0000 (UTC)

arXiv
