Data Augmentation Vision Transformer for Fine-grained Image Classification

Chao Hu; Liqiang Zhu; Weibin Qiu; Weijie Wu

Recently, the vision transformer (ViT) has made breakthroughs in image recognition. Its multi-head self-attention (MSA) mechanism can extract discriminative token information from different image patches to improve classification accuracy. However, the classification token in the deep layers tends to ignore local features from the intermediate layers. In addition, the embedding layer splits the input into fixed-size patches, which inevitably introduces additional image noise into the network. To this end, this paper studies a data augmentation vision transformer (DAVT) and proposes an attention-cropping data augmentation method, which uses attention weights as a guide to crop images and improves the network's ability to learn critical features. Second, this paper proposes a hierarchical attention selection (HAS) method, which improves the network's ability to learn discriminative tokens across layers by filtering and fusing tokens between layers. Experimental results show that the method outperforms existing mainstream methods on two widely used datasets, CUB-200-2011 and Stanford Dogs, with accuracy 1.4% and 1.6% higher than the original ViT, respectively.
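The abstract does not spell out how the attention weights are turned into a crop, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a per-patch attention map from the classification token is already available, keeps the patches whose weight is at least a fraction of the maximum, and crops the image to their bounding box. The function name `attention_crop` and the parameter `threshold_ratio` are hypothetical.

```python
import numpy as np


def attention_crop(image, cls_attention, patch_size, threshold_ratio=0.5):
    """Crop `image` to the bounding box of its high-attention patches.

    image: (H, W, C) array with H and W multiples of `patch_size`.
    cls_attention: (H // patch_size, W // patch_size) array of non-negative
        attention weights from the classification token to each patch.
    threshold_ratio: hypothetical hyperparameter in (0, 1]; a patch is kept
        when its weight is at least threshold_ratio * (maximum weight).
    """
    grid_h = image.shape[0] // patch_size
    grid_w = image.shape[1] // patch_size
    if cls_attention.shape != (grid_h, grid_w):
        raise ValueError("attention map does not match the patch grid")

    # Boolean mask of patches that pass the relative threshold. The patch
    # with the maximum weight always passes, so the mask is never empty.
    mask = cls_attention >= threshold_ratio * cls_attention.max()

    # Indices of patch rows and columns that contain a kept patch.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))

    # Convert the patch-grid bounding box to pixel coordinates.
    top, bottom = rows[0] * patch_size, (rows[-1] + 1) * patch_size
    left, right = cols[0] * patch_size, (cols[-1] + 1) * patch_size
    return image[top:bottom, left:right]


# Toy usage: an 8x8 image split into 2x2-pixel patches (a 4x4 patch grid).
image = np.zeros((8, 8, 3))
attention = np.zeros((4, 4))
attention[1, 2] = 1.0
attention[2, 2] = 0.8
crop = attention_crop(image, attention, patch_size=2)
print(crop.shape)  # (4, 2, 3)
```

In a training pipeline the cropped region would typically be resized back to the network's input resolution and fed in as an additional augmented sample; that step is omitted here because the abstract gives no details on it.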

updated: Wed Nov 23 2022 11:34:11 GMT+0000 (UTC)

published: Wed Nov 23 2022 11:34:11 GMT+0000 (UTC)

arXiv
