Wise-SrNet: A Novel Architecture for Enhancing Image Classification by Learning Spatial Resolution of Feature Maps

Mohammad Rahimzadeh; Soroush Parvin; Elnaz Safi; Mohammad Reza Mohammadi

One of the main challenges since the rise of convolutional neural networks has been how to connect the extracted feature map to the final classification layer. VGG models used two sets of fully connected layers in the classification part of their architectures, which significantly increases the number of model weights. ResNet and subsequent deep convolutional models used a Global Average Pooling (GAP) layer to compress the feature map and feed it to the classification layer. Although the GAP layer reduces computational cost, it also discards the spatial resolution of the feature map, which decreases learning efficiency. In this paper, we aim to tackle this problem by replacing the GAP layer with a new architecture called Wise-SrNet. It is inspired by the idea of depthwise convolution and is designed to process spatial resolution without increasing computational cost. We evaluated our method on three datasets: Intel Image Classification Challenge, MIT Indoors Scenes, and a part of the ImageNet dataset. We investigated the implementation of our architecture on several models from the Inception, ResNet, and DenseNet families. Applying our architecture had a significant effect on convergence speed and accuracy. Our experiments on images with 224x224 resolution increased Top-1 accuracy by 2% to 8% across different datasets and models. Running our models on 512x512 resolution images of the MIT Indoors Scenes dataset showed a notable Top-1 accuracy improvement of 3% to 26%. We also demonstrate the GAP layer's disadvantage when the input images are large and the number of classes is not small. In this circumstance, our proposed architecture can greatly help to enhance classification results. The code is shared at https://github.com/mr7495/image-classification-spatial.
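
To make the contrast between GAP and a depthwise-style head concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a depthwise convolution whose kernel spans the whole feature map, so each channel gets its own learnable weight per spatial position. The function names and the sizes (7x7x2048 feature map, 10 classes) are hypothetical choices for illustration. The actual Wise-SrNet layer configuration is in the paper and the repository linked above.

```python
import numpy as np

def global_average_pool(fmap):
    """(H, W, C) -> (C,). Every spatial position gets the same weight 1/(H*W)."""
    return fmap.mean(axis=(0, 1))

def depthwise_spatial_pool(fmap, kernel):
    """(H, W, C) -> (C,). Per-channel weighted sum over positions, i.e. a
    depthwise convolution whose kernel covers the full feature map (assumed form)."""
    return (fmap * kernel).sum(axis=(0, 1))

rng = np.random.default_rng(0)
H, W, C, num_classes = 7, 7, 2048, 10      # hypothetical sizes
fmap = rng.standard_normal((H, W, C))

# GAP is the special case where every kernel weight equals 1/(H*W).
uniform_kernel = np.full((H, W, C), 1.0 / (H * W))
gap_out = global_average_pool(fmap)
dw_uniform_out = depthwise_spatial_pool(fmap, uniform_kernel)

# GAP cannot tell a feature map from its vertically flipped copy;
# a non-uniform (learned) kernel can.
flipped = fmap[::-1]
learned_kernel = rng.standard_normal((H, W, C))  # stand-in for trained weights
gap_same = np.allclose(global_average_pool(flipped), gap_out)
dw_same = np.allclose(depthwise_spatial_pool(flipped, learned_kernel),
                      depthwise_spatial_pool(fmap, learned_kernel))

# Weight counts of the classification head (biases ignored).
gap_head_params = C * num_classes                  # GAP + dense
dw_head_params = H * W * C + C * num_classes       # depthwise kernel + dense
flatten_head_params = H * W * C * num_classes      # flatten + single dense
```

Under these assumptions the depthwise head adds only H*W*C weights on top of the GAP head, whereas flattening the feature map into a single dense layer multiplies the weight count by H*W.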

updated: Mon Apr 26 2021 00:37:11 GMT+0000 (UTC)

published: Mon Apr 26 2021 00:37:11 GMT+0000 (UTC)

arXiv
