Is it Enough to Optimize CNN Architectures on ImageNet?

Lukas Tuggener; Jürgen Schmidhuber; Thilo Stadelmann

An implicit but pervasive hypothesis of modern computer vision research is that convolutional neural network (CNN) architectures that perform better on ImageNet will also perform better on other vision datasets. We challenge this hypothesis through an extensive empirical study for which we train 500 sampled CNN architectures on ImageNet as well as 8 other image classification datasets from a wide array of application domains. The relationship between architecture and performance varies wildly, depending on the datasets. For some of them, the performance correlation with ImageNet is even negative. Clearly, it is not enough to optimize architectures solely for ImageNet when aiming for progress that is relevant for all applications. Therefore, we identify two dataset-specific performance indicators: the cumulative width across layers as well as the total depth of the network. Lastly, we show that the range of dataset variability covered by ImageNet can be significantly extended by adding ImageNet subsets restricted to few classes.
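As a minimal sketch of what a "performance correlation with ImageNet" means in practice, the snippet below computes a correlation coefficient between per-architecture errors on ImageNet and on another dataset. The error values are made up for illustration and are not results from the paper. Using Pearson correlation is an assumption here, since the abstract does not state which correlation measure the authors used. The names `pearson`, `imagenet_err`, `dataset_a_err`, and `dataset_b_err` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical top-1 errors (in percent) of five sampled architectures.
# These numbers are invented for illustration only.
imagenet_err = [30.0, 32.0, 35.0, 38.0, 41.0]
dataset_a_err = [10.0, 11.0, 12.5, 14.0, 15.0]  # tracks ImageNet
dataset_b_err = [22.0, 21.0, 19.5, 19.0, 18.0]  # moves the opposite way

r_a = pearson(imagenet_err, dataset_a_err)
r_b = pearson(imagenet_err, dataset_b_err)
print(f"dataset A vs ImageNet: r = {r_a:+.3f}")
print(f"dataset B vs ImageNet: r = {r_b:+.3f}")
```

A positive coefficient, as for the hypothetical dataset A, means that architectures with lower ImageNet error also tend to have lower error on that dataset. A negative coefficient, as for dataset B, corresponds to the case the abstract highlights, where better ImageNet architectures tend to do worse.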

updated: Tue Mar 16 2021 14:42:01 GMT+0000 (UTC)

published: Tue Mar 16 2021 14:42:01 GMT+0000 (UTC)

arXiv
