Coarse2Fine: A Two-stage Training Method for Fine-grained Visual Classification

Amir Erfan Eshratifar; David Eigen; Michael Gormish; Massoud Pedram

Small inter-class and large intra-class variations are the main challenges in fine-grained visual classification. Objects from different classes share visually similar structures, while objects in the same class can appear in different poses and from different viewpoints. Proper extraction of discriminative local features (e.g., a bird's beak or a car's headlight) is therefore crucial. Most recent successes on this problem are based on attention models that can localize and attend to the discriminative local parts of an object. In this work, we propose Coarse2Fine, a training method for visual attention networks that creates a differentiable path from the input space to the attended feature maps. Coarse2Fine learns an inverse mapping function from the attended feature maps to the informative regions in the raw image, which guides the attention maps to better attend to the fine-grained features. We show that Coarse2Fine, combined with orthogonal initialization of the attention weights, can surpass state-of-the-art accuracy on common fine-grained classification tasks.
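
The abstract mentions orthogonal initialization of the attention weights but gives no details. The following is a minimal, dependency-free sketch of what such an initialization could look like, not the authors' implementation. It assumes that each attention map is produced by its own weight vector over the feature channels (as in a 1x1 convolution), so the weights form a `num_maps` x `channels` matrix; the function names and sizes are hypothetical.

```python
import math
import random


def orthogonal_rows(num_maps, channels, seed=0):
    """Return num_maps mutually orthonormal weight vectors of length channels.

    Applies Gram-Schmidt orthogonalization to Gaussian random vectors.
    Requires num_maps <= channels (an assumption of this sketch).
    """
    if num_maps > channels:
        raise ValueError("need num_maps <= channels for orthonormal rows")
    rng = random.Random(seed)
    rows = []
    for _ in range(num_maps):
        v = [rng.gauss(0.0, 1.0) for _ in range(channels)]
        # Remove the components along the rows chosen so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        # Normalize what is left to unit length.
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def gram(rows):
    """Pairwise dot products; the identity matrix means orthonormal rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


# Hypothetical sizes: 4 attention maps over 16 feature channels.
weights = orthogonal_rows(num_maps=4, channels=16)
```

With orthonormal rows, the attention maps start out responding to non-overlapping directions in feature space, which is one plausible reason such an initialization would help different maps attend to different object parts. In a deep learning framework, a built-in routine such as PyTorch's `torch.nn.init.orthogonal_` would normally be used instead of hand-written orthogonalization.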

updated: Fri Sep 06 2019 00:09:17 GMT+0000 (UTC)

published: Fri Sep 06 2019 00:09:17 GMT+0000 (UTC)

arXiv
