Improving Reliability of Fine-tuning with Block-wise Optimisation

Basel Barakat; Qiang Huang

Fine-tuning can be used to tackle domain-specific tasks by transferring knowledge. Previous studies on fine-tuning focused on either adapting only the weights of a task-specific classifier or re-optimizing all layers of the pre-trained model using the new task data. Methods of the first type cannot mitigate the mismatch between a pre-trained model and the new task data, and methods of the second type easily overfit when the task has limited data. To explore the effectiveness of fine-tuning, we propose a novel block-wise optimization mechanism, which adapts the weights of a group of layers of a pre-trained model. In our work, the layer selection can be done in four different ways. The first is layer-wise adaptation, which searches for the most salient single layer according to classification performance. The second builds on the first, jointly adapting a small number of top-ranked layers instead of an individual layer. The third is block-based segmentation, where the layers of a deep network are segmented into blocks by non-weighting layers, such as MaxPooling and Activation layers. The last uses a fixed-length sliding window to group layers block by block. To identify which group of layers is the most suitable for fine-tuning, the search starts from the target end and is conducted by freezing all layers other than the selected layers and the classification layers. The most salient group of layers is determined in terms of classification performance. In our experiments, the proposed approaches are tested on an often-used dataset, Tf_flower, by fine-tuning each of five typical pre-trained models: VGG16, MobileNet-v1, MobileNet-v2, MobileNet-v3, and ResNet50v2. The results show that the proposed block-wise approaches achieve better performance than the two baseline methods and the layer-wise method.
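The two block-forming strategies and the search loop described in the abstract can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the layer representation (a name plus a has-weights flag), every function name, and the `evaluate` callback are hypothetical. In practice `evaluate` would fine-tune the model with only the given layers trainable and return a validation accuracy.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Assumed representation: (layer name, whether the layer has trainable weights).
Layer = Tuple[str, bool]


def segment_by_non_weight_layers(layers: Sequence[Layer]) -> List[List[str]]:
    """Group consecutive weighted layers; a non-weight layer closes the block."""
    blocks: List[List[str]] = []
    current: List[str] = []
    for name, has_weights in layers:
        if has_weights:
            current.append(name)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def sliding_window_blocks(layers: Sequence[Layer], size: int,
                          stride: int = 1) -> List[List[str]]:
    """Group weighted layers with a fixed-length sliding window."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    names = [name for name, has_weights in layers if has_weights]
    return [names[i:i + size] for i in range(0, len(names) - size + 1, stride)]


def trainable_names(block: Sequence[str], classifier: Sequence[str]) -> Set[str]:
    """Layers left unfrozen: the selected block plus the classification layers."""
    return set(block) | set(classifier)


def select_best_block(blocks: Sequence[List[str]], classifier: Sequence[str],
                      evaluate: Callable[[Set[str]], float]):
    """Try each block, starting from the output end, and keep the best score."""
    best_block, best_score = None, float("-inf")
    for block in reversed(blocks):
        score = evaluate(trainable_names(block, classifier))
        if score > best_score:
            best_block, best_score = block, score
    return best_block, best_score


# Toy backbone (hypothetical layer names); the classifier is kept separate.
backbone = [("conv1", True), ("relu1", False), ("conv2", True),
            ("conv3", True), ("pool1", False), ("conv4", True),
            ("pool2", False)]
classifier = ["fc"]
segmented = segment_by_non_weight_layers(backbone)
windows = sliding_window_blocks(backbone, size=2)
```

With a real framework, the set returned by `trainable_names` would be used to set each layer's trainable flag before training. Every other layer stays frozen, which matches the freezing scheme the abstract describes.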

updated: Sun Jan 15 2023 16:20:18 GMT+0000 (UTC)

published: Sun Jan 15 2023 16:20:18 GMT+0000 (UTC)

arXiv
