Learning Disentangled Label Representations for Multi-label Classification

Jian Jia; Fei He; Naiyu Gao; Xiaotang Chen; Kaiqi Huang

Although various methods have been proposed for multi-label classification, most approaches still follow the feature learning mechanism of single-label (multi-class) classification, namely, learning one shared image feature to classify multiple labels. However, we find that this One-shared-Feature-for-Multiple-Labels (OFML) mechanism is not conducive to learning discriminative label features and makes the model less robust. For the first time, we mathematically prove that the OFML mechanism is inferior because, when minimizing the cross-entropy loss, the optimal learned image feature cannot maintain high similarity with multiple classifiers simultaneously. To address the limitations of the OFML mechanism, we introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning (DLFL) framework that learns a disentangled representation for each label. The distinguishing component of the framework is a feature disentanglement module, which contains learnable semantic queries and a Semantic Spatial Cross-Attention (SSCA) module. Specifically, the learnable semantic queries maintain semantic consistency across different images of the same label. The SSCA module localizes the label-related spatial regions and aggregates the features of the located regions into the corresponding label feature to achieve feature disentanglement. We achieve state-of-the-art performance on eight datasets across three tasks, i.e., multi-label classification, pedestrian attribute recognition, and continual multi-label learning.
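
To make the contrast between the two mechanisms concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each label query attends over the flattened spatial positions of a feature map with scaled dot-product attention and that the attended features are summed into one feature per label. The function names (`label_cross_attention`, `shared_feature`), the fixed toy queries, and the omission of learned projections, multiple heads, and training are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def label_cross_attention(queries, features):
    """OFOL-style sketch: one attended feature per label.

    queries:  one vector per label (learnable in practice, fixed here).
    features: one vector per spatial position of the flattened feature map.
    Returns (label_features, attention_maps).
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    label_features, attention_maps = [], []
    for query in queries:
        # Attention over spatial positions: where does this label "look"?
        weights = softmax([dot(query, f) * scale for f in features])
        # Aggregate the attended positions into a label-specific feature.
        pooled = [sum(w * f[d] for w, f in zip(weights, features))
                  for d in range(dim)]
        label_features.append(pooled)
        attention_maps.append(weights)
    return label_features, attention_maps


def shared_feature(features):
    """OFML-style baseline: global average pooling shared by all labels."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


# Toy input: 4 spatial positions with 2-dimensional features.
# Position 0 carries evidence for label 0, position 1 for label 1.
features = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
queries = [[5.0, 0.0], [0.0, 5.0]]  # hypothetical, hand-set label queries

label_features, attention_maps = label_cross_attention(queries, features)
pooled_shared = shared_feature(features)
```

In this toy setting each label's attention map concentrates on the position that carries its evidence, so the two label features point in different directions, whereas the average-pooled feature mixes both labels into a single vector that every classifier must share.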

updated: Fri Dec 02 2022 21:49:34 GMT+0000 (UTC)

published: Fri Dec 02 2022 21:49:34 GMT+0000 (UTC)

arXiv
