Conviformers: Convolutionally guided Vision Transformer

Mohit Vaishnav; Thomas Fel; Iván Felipe Rodríguez; Thomas Serre


Vision transformers are nowadays the de facto choice for image classification tasks. Classification tasks fall into two broad categories: fine-grained and coarse-grained. In fine-grained classification, the model must discover subtle differences, because sub-classes are highly similar to one another. Such distinctions are often lost when images are downscaled to reduce the memory and computational cost of vision transformers (ViTs). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for better augmentation techniques and for modern neural networks capable of handling higher-resolution images. We also introduce a convolutional transformer architecture called Conviformer which, unlike the popular ConViT vision transformer, can handle higher-resolution images without exploding memory and computational cost. In addition, we introduce a novel pre-processing technique called PreSizer that resizes images while preserving their original aspect ratios, which proved essential for classifying natural plants. With this simple yet effective approach, we achieved state-of-the-art (SoTA) results on the Herbarium 202x and iNaturalist 2019 datasets.
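To make the memory argument concrete, here is a back-of-the-envelope sketch (an illustration, not the authors' code). For a ViT with 16x16 patches, the token count grows with the square of the image side, and the self-attention matrix grows with the square of the token count, so doubling the resolution multiplies attention memory by 16. The function name `vit_token_cost` and the `stem_stride` parameter are hypothetical: `stem_stride` models an assumed convolutional downsampling stem placed before patch embedding, and is not the Conviformer's actual configuration.

```python
def vit_token_cost(image_size: int, patch_size: int = 16, stem_stride: int = 1) -> tuple[int, int]:
    """Return (tokens, attention entries per head per layer) for a square image.

    stem_stride is a hypothetical total stride of a convolutional stem applied
    before patch embedding; 1 means a plain ViT with no stem. The class token
    is ignored for simplicity.
    """
    feature_size = image_size // stem_stride    # spatial side after the (assumed) stem
    tokens = (feature_size // patch_size) ** 2  # number of non-overlapping patches
    return tokens, tokens ** 2                  # self-attention matrix is tokens x tokens


for size in (224, 448):
    tokens, attn = vit_token_cost(size)
    print(f"plain ViT, {size}px: {tokens} tokens, {attn} attention entries")

# A hypothetical stride-2 stem brings a 448px input back to the 224px token budget.
print(vit_token_cost(448, stem_stride=2))
```

Doubling the side from 224 to 448 pixels raises the token count fourfold (196 to 784) and the attention entries sixteenfold (38,416 to 614,656), which is why downscaling is the usual workaround and why a downsampling stem is an attractive alternative.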

updated: Sun Aug 28 2022 11:46:25 GMT+0000 (UTC)

published: Wed Aug 17 2022 13:09:24 GMT+0000 (UTC)

arXiv
