Investigating Transfer Learning Capabilities of Vision Transformers and CNNs by Fine-Tuning a Single Trainable Block

Durvesh Malpure; Onkar Litake; Rajesh Ingle

Recent work in computer vision has seen growing use of transformer-based architectures. These models surpass the state-of-the-art accuracy set by CNN architectures, but they are computationally very expensive to train from scratch. Because they are still quite new to computer vision, their transfer learning capabilities need to be studied and compared with those of CNNs, so that we can understand which architecture performs better on real-world problems with small datasets. In this work, we follow a simple yet restrictive method for fine-tuning both CNN and Transformer models, pretrained on ImageNet1K, on CIFAR-10, and compare them with each other. We unfreeze only the last transformer/encoder block or the last convolutional block of a model, freeze all the layers before it, and add a simple MLP at the end for classification. This simple modification lets us use the raw learned weights of both kinds of network. Our experiments show that transformer-based architectures not only achieve higher accuracy than CNNs, but that some transformers do so with roughly one quarter as many parameters.
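The fine-tuning recipe described in the abstract (freeze everything, unfreeze only the last block, append a small MLP classifier) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' code: the tiny three-block CNN is a toy stand-in for an ImageNet1K-pretrained backbone, and the helper names `freeze_all_but_last_block` and `count_parameters`, the hidden width of 256, and the layer sizes are all hypothetical choices made for the example.

```python
import torch
from torch import nn


def freeze_all_but_last_block(backbone: nn.Sequential, feature_dim: int,
                              num_classes: int = 10,
                              hidden_dim: int = 256) -> nn.Sequential:
    """Freeze every block except the last one, then append an MLP head.

    Hypothetical helper: `backbone` is assumed to be an nn.Sequential of
    blocks whose final output is a flat feature vector of size `feature_dim`.
    """
    # Freeze all backbone weights first.
    for param in backbone.parameters():
        param.requires_grad = False
    # Unfreeze only the last block (last conv block or last encoder block).
    for param in backbone[-1].parameters():
        param.requires_grad = True
    # New, randomly initialised MLP classifier (trainable by default).
    head = nn.Sequential(
        nn.Linear(feature_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, num_classes),
    )
    return nn.Sequential(backbone, head)


def count_parameters(model: nn.Module) -> tuple[int, int]:
    """Return (trainable, total) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Toy stand-in for a pretrained CNN (assumption: a real experiment would
# load pretrained weights instead of using random initialisation).
toy_backbone = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten()),
)

model = freeze_all_but_last_block(toy_backbone, feature_dim=32)
trainable, total = count_parameters(model)

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01)

# CIFAR-10-sized dummy batch: 4 RGB images of 32x32 pixels.
logits = model(torch.randn(4, 3, 32, 32))
print(logits.shape, trainable, total)
```

Comparing the trainable count with the total makes the restriction visible: only the last block and the new head receive gradient updates. With a real pretrained backbone that contains batch normalisation, the frozen layers would also need to be kept in evaluation mode if their running statistics should stay fixed, since `requires_grad = False` alone does not stop those statistics from updating.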

updated: Mon Oct 11 2021 13:43:03 GMT+0000 (UTC)

published: Mon Oct 11 2021 13:43:03 GMT+0000 (UTC)

arXiv
