Balanced Mixture of SuperNets for Learning the CNN Pooling Architecture

Mehraveh Javan; Matthew Toews; Marco Pedersoli

Downsampling layers, including pooling and strided convolutions, are crucial components of the convolutional neural network architecture: they determine both the granularity/scale of image feature analysis and the receptive field size of a given layer. To fully understand this problem, we analyse the performance of models independently trained with each pooling configuration on CIFAR10, using a ResNet20 network, and show that the position of the downsampling layers can strongly influence the performance of a network and that predefined downsampling configurations are not optimal. Network Architecture Search (NAS) might be used to optimize downsampling configurations as a hyperparameter. However, we find that common one-shot NAS based on a single SuperNet does not work for this problem. We argue that this is because a SuperNet trained to find the optimal pooling configuration fully shares its parameters among all pooling configurations. This makes training hard, because learning some configurations can harm the performance of others. Therefore, we propose a balanced mixture of SuperNets that automatically associates pooling configurations with different weight models and helps to reduce the weight sharing and inter-influence of pooling configurations on the SuperNet parameters. We evaluate our proposed approach on CIFAR10, CIFAR100, and Food101 and show that in all cases our model outperforms other approaches and improves over the default pooling configurations.
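To make the notion of a "pooling configuration" concrete, the following is a minimal sketch of how such a search space could be enumerated. It is not the authors' code. It assumes a backbone with 9 residual blocks and exactly two stride-2 downsampling steps, neither of which is stated in the abstract, and the function names `pooling_configurations` and `resolutions` are hypothetical. The 32x32 input size matches CIFAR10 images.

```python
from itertools import combinations


def pooling_configurations(num_blocks, num_downsamples):
    """Return every placement of stride-2 steps as a tuple of block indices."""
    return list(combinations(range(num_blocks), num_downsamples))


def resolutions(config, num_blocks, input_size):
    """Spatial size of the feature map leaving each block for one configuration."""
    sizes = []
    size = input_size
    for block in range(num_blocks):
        if block in config:
            size //= 2  # this block downsamples by a factor of 2
        sizes.append(size)
    return sizes


# Assumed setting: 9 blocks, 2 downsampling steps, 32x32 input.
configs = pooling_configurations(num_blocks=9, num_downsamples=2)
print(len(configs))                 # number of candidate configurations
print(resolutions((3, 6), 9, 32))   # evenly spaced placement
print(resolutions((0, 1), 9, 32))   # both downsampling steps at the start
```

Under these assumptions, every configuration ends at the same final resolution but differs in how many blocks operate at each scale, which is the property the abstract says can strongly influence accuracy.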

updated: Wed Jun 21 2023 02:18:27 GMT+0000 (UTC)

published: Wed Jun 21 2023 02:18:27 GMT+0000 (UTC)

arXiv
