Semantic Representation and Dependency Learning for Multi-Label Image Recognition

Tao Pu; Mingzhan Sun; Hefeng Wu; Tianshui Chen; Ling Tian; Liang Lin

Recently, many multi-label image recognition (MLR) works have made significant progress by introducing pre-trained object detection models to generate many proposals or by utilizing statistical label co-occurrence to enhance the correlation among different categories. However, these works have some limitations: (1) the effectiveness of the network depends heavily on pre-trained object detection models, which incur expensive and often unaffordable computation; (2) the network performance degrades when images contain objects that co-occur only occasionally, especially for rare categories. To address these problems, we propose a novel and effective semantic representation and dependency learning (SRDL) framework that learns a category-specific semantic representation for each category and captures the semantic dependency among all categories. Specifically, we design a category-specific attentional regions (CAR) module that generates channel-wise and spatial-wise attention matrices to guide the model to focus on semantic-aware regions. We also design an object erasing (OE) module that implicitly learns the semantic dependency among categories by erasing semantic-aware regions to regularize network training. Extensive experiments and comparisons on two popular MLR benchmark datasets (i.e., MS-COCO and Pascal VOC 2007) demonstrate the effectiveness of the proposed framework over current state-of-the-art algorithms.
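
The erasing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the OE module selects or erases regions, so everything here is an assumption made for illustration: the helper name `erase_attended_region`, the fixed `threshold`, the use of a single 2D attention map, and zeroing as the erasing operation. It is not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def erase_attended_region(
    features: Matrix, attention: Matrix, threshold: float = 0.5
) -> Matrix:
    """Zero out feature cells whose attention weight exceeds `threshold`.

    Hypothetical helper: the thresholding rule and zero-fill are assumptions,
    not details taken from the paper.
    """
    same_shape = len(features) == len(attention) and all(
        len(f_row) == len(a_row) for f_row, a_row in zip(features, attention)
    )
    if not same_shape:
        raise ValueError("features and attention must have the same shape")
    # Keep a feature value only where the attention is at or below the threshold.
    return [
        [0.0 if a > threshold else f for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(features, attention)
    ]


# Toy 2x2 example: the two highly attended cells are erased.
features = [[1.0, 2.0], [3.0, 4.0]]
attention = [[0.9, 0.1], [0.2, 0.8]]
erased = erase_attended_region(features, attention)
```

In this toy setup, a model trained on `erased` would have to recognize the category from the remaining context, which is one plausible reading of how erasing could act as a regularizer.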

updated: Mon Jan 09 2023 15:28:41 GMT+0000 (UTC)

published: Fri Apr 08 2022 00:55:15 GMT+0000 (UTC)

arXiv
