Target Aware Network Architecture Search and Compression for Efficient Knowledge Transfer

S. H. Shabbeer Basha; Debapriya Tula; Sravan Kumar Vinakota; Shiv Ram Dubey

Transfer learning enables Convolutional Neural Networks (CNNs) to acquire knowledge from a source domain and transfer it to a target domain, where collecting large-scale annotated examples is both time-consuming and expensive. Conventionally, when knowledge learned on one task is transferred to another, the deeper layers of a pre-trained CNN are fine-tuned on the target dataset. However, these layers were originally designed for the source task and are over-parameterized for the target task. Fine-tuning them on the target dataset therefore reduces the generalization ability of the CNN because of the high network complexity. To tackle this problem, we propose a two-stage framework called TASCNet, which enables efficient knowledge transfer. In the first stage, the configuration of the deeper layers is learned automatically and fine-tuned on the target dataset. In the second stage, redundant filters are pruned from the fine-tuned CNN to decrease the network's complexity for the target task while preserving performance. This two-stage mechanism finds a compact version of the pre-trained CNN with an optimal structure (number of filters in a convolutional layer, number of neurons in a dense layer, and so on) from the hypothesis space. The efficacy of the proposed method is evaluated using VGG-16, ResNet-50, and DenseNet-121 on the CalTech-101, CalTech-256, and Stanford Dogs datasets. The proposed TASCNet reduces the computational complexity of pre-trained CNNs on the target task by reducing both trainable parameters and FLOPs, which enables resource-efficient knowledge transfer.
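
The second stage (pruning redundant filters) can be illustrated with a minimal sketch. The abstract does not state how redundancy is measured, so the L1-norm score used here is an assumption standing in for the paper's criterion, and the function name `prune_conv_filters`, its `keep_ratio` parameter, and the `(out_channels, in_channels, kh, kw)` weight layout are hypothetical choices made for illustration only.

```python
import numpy as np


def prune_conv_filters(weights, next_weights, keep_ratio):
    """Drop the lowest-scoring filters of one convolutional layer.

    weights:      array of shape (out_channels, in_channels, kh, kw)
    next_weights: weights of the following conv layer, of shape
                  (next_out_channels, out_channels, kh, kw)
    keep_ratio:   fraction of filters to keep (hypothetical parameter)

    ASSUMPTION: filters are ranked by L1 norm. The abstract does not
    name the redundancy criterion, so this is only a stand-in.
    """
    out_channels = weights.shape[0]
    n_keep = max(1, int(round(out_channels * keep_ratio)))

    # One score per filter: sum of absolute weights.
    scores = np.abs(weights).reshape(out_channels, -1).sum(axis=1)

    # Indices of the n_keep highest-scoring filters, in original order.
    keep = np.sort(np.argsort(scores)[-n_keep:])

    # Removing a filter also removes the matching input channel
    # of the next layer.
    return weights[keep], next_weights[:, keep], keep


def count_params(*arrays):
    """Total number of weights across the given arrays."""
    return sum(int(a.size) for a in arrays)
```

Calling `prune_conv_filters(w, w_next, 0.5)` on a layer with 8 filters returns 4 filters and a next-layer tensor with 4 input channels, so `count_params` on the pair is halved. In practice a pruned network would normally be fine-tuned again to recover accuracy.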

updated: Thu May 12 2022 09:11:00 GMT+0000 (UTC)

published: Thu May 12 2022 09:11:00 GMT+0000 (UTC)

arXiv
