No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects

Raja Sunkara; Tie Luo

Convolutional neural networks (CNNs) have achieved resounding success in many computer vision tasks such as image classification and object detection. However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small. In this paper, we point out that this stems from a defective yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and the learning of less effective feature representations. To address this, we propose a new CNN building block called SPD-Conv to replace each strided convolution layer and each pooling layer (thus eliminating them altogether). SPD-Conv consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and can be applied in most if not all CNN architectures. We explain this new design under the two most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our approach significantly outperforms state-of-the-art deep learning models, especially on tougher tasks with low-resolution images and small objects. We have open-sourced our code at https://github.com/LabSAINT/SPD-Conv.
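
The SPD-Conv idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see their repository for that): the function names `space_to_depth` and `pointwise_conv` are hypothetical, the ordering of the sub-maps along the channel axis is an assumption, and a 1x1 convolution with fixed weights stands in for the learned non-strided convolution. The sketch only shows that the space-to-depth step halves the spatial size while keeping every input value, moving it into the channel dimension instead of discarding it.

```python
def space_to_depth(x, scale=2):
    """Rearrange a (C, H, W) nested list into (C * scale**2, H // scale, W // scale)."""
    channels = len(x)
    height, width = len(x[0]), len(x[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    # Assumption: sub-maps are stacked with the column offset as the outer loop
    # and the row offset as the inner loop; other orderings work equally well.
    for dx in range(scale):
        for dy in range(scale):
            for c in range(channels):
                # Keep every scale-th row starting at dy, every scale-th column starting at dx.
                out.append([row[dx::scale] for row in x[c][dy::scale]])
    return out


def pointwise_conv(x, weights):
    """Stride-1 1x1 convolution: weights[o][i] mixes input channel i into output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w * x[i][r][c] for i, w in enumerate(w_row)) for c in range(width)]
         for r in range(height)]
        for w_row in weights
    ]


# One 4x4 channel becomes four 2x2 channels; no pixel is dropped.
feature_map = [[[0, 1, 2, 3],
                [4, 5, 6, 7],
                [8, 9, 10, 11],
                [12, 13, 14, 15]]]
spd = space_to_depth(feature_map)

# With equal fixed weights the stride-1 convolution reproduces 2x2 average pooling;
# in a trained network these weights would be learned instead.
averaged = pointwise_conv(spd, [[0.25, 0.25, 0.25, 0.25]])
```

In this toy setting, average pooling appears as one particular choice of weights applied after the space-to-depth step, which gives some intuition for why a learned non-strided convolution on the rearranged tensor can retain fine-grained information that a fixed downsampling operation would lose.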

updated: Sun Aug 07 2022 05:09:18 GMT+0000 (UTC)

published: Sun Aug 07 2022 05:09:18 GMT+0000 (UTC)

arXiv
