Coarse to Fine: Multi-label Image Classification with Global/Local Attention

Fan Lyu; Fuyuan Hu; Victor S. Sheng; Zhengtian Wu; Qiming Fu; Baochuan Fu

In daily life, and especially in a smart city, the scenes around us almost always carry multiple labels, and recognizing them provides the city-operation information needed for response and control. Great efforts have been made to recognize multi-label images with deep neural networks. Because multi-label image classification is highly complex, researchers have turned to attention mechanisms to guide the classification process. However, conventional attention-based methods analyze images directly and aggressively, which makes it difficult for them to understand complicated scenes well. In this paper, we propose a global/local attention method that recognizes an image from coarse to fine by mimicking how humans observe images. Specifically, our method first concentrates on the whole image and then focuses on specific local objects in the image. We also propose a joint max-margin objective function, which enforces that the minimum score of the positive labels be larger than the maximum score of the negative labels, both horizontally and vertically. This function further improves our multi-label image classification method. We evaluate the effectiveness of our method on two popular multi-label image datasets, Pascal VOC and MS-COCO. Our experimental results show that our method outperforms state-of-the-art methods.
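The abstract does not give the formula for the joint max-margin objective, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "horizontally" means across the labels of a single image (rows of a batch score matrix) and "vertically" means across the images of a single label (columns). The function name `joint_max_margin_loss`, the `margin` parameter, and the choice to sum the two hinge terms are all hypothetical.

```python
import numpy as np

def joint_max_margin_loss(scores, labels, margin=1.0):
    """Hinge-style sketch of a joint max-margin objective (assumed form).

    scores: (batch, num_labels) array of predicted label scores.
    labels: same shape, 1 for a positive label and 0 for a negative one.
    """
    scores = np.asarray(scores, dtype=float)
    pos = np.asarray(labels).astype(bool)

    def directional_hinge(axis):
        # Lowest positive score and highest negative score along `axis`.
        min_pos = np.where(pos, scores, np.inf).min(axis=axis)
        max_neg = np.where(~pos, scores, -np.inf).max(axis=axis)
        # Slices lacking positives or negatives contribute no loss.
        valid = np.isfinite(min_pos) & np.isfinite(max_neg)
        hinge = np.maximum(0.0, margin + max_neg - min_pos)
        return float(np.where(valid, hinge, 0.0).sum())

    horizontal = directional_hinge(axis=1)  # per image, across labels (assumption)
    vertical = directional_hinge(axis=0)    # per label, across images (assumption)
    return horizontal + vertical

# Hypothetical batch: two images, two labels.
example_scores = [[2.0, 0.0], [0.5, 1.0]]
example_labels = [[1, 0], [0, 1]]
example_loss = joint_max_margin_loss(example_scores, example_labels)
```

Under this reading, the loss is zero only when, in every row and every column, the weakest positive score exceeds the strongest negative score by at least `margin`.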

updated: Sat Dec 26 2020 02:36:04 GMT+0000 (UTC)

published: Sat Dec 26 2020 02:36:04 GMT+0000 (UTC)

arXiv
