WideCaps: A Wide Attention based Capsule Network for Image Classification

Pawan S J; Rishi Sharma; Hemanth Sai Ram Reddy; M Vani; Jeny Rajan

The capsule network is a distinct and promising segment of the neural network family that has drawn attention for its unique ability to maintain the equivariance property by preserving the spatial relationships among features. By encoding characteristic features into capsules and building a parse-tree structure, capsule networks have attained unprecedented success on image classification tasks with datasets such as MNIST and affNIST. However, on datasets with complex foreground and background regions, such as CIFAR-10, capsule network performance is sub-optimal because of its naive data routing policy and its inability to extract complex features. This paper proposes a new design strategy for capsule network architectures that handles complex images efficiently. To address this problem, the proposed method incorporates wide bottleneck residual modules and Squeeze and Excitation attention blocks, supported by a modified FM routing algorithm. The wide bottleneck residual module facilitates the extraction of complex features. It is followed by the squeeze and excitation attention block, which enables channel-wise attention by suppressing trivial features. This setup models channel inter-dependencies at almost no computational cost, thereby enhancing the representational ability of capsules on complex images. We extensively evaluate the proposed model on three publicly available datasets, CIFAR-10, Fashion MNIST, and SVHN. It outperforms the top-5 results on CIFAR-10 and Fashion MNIST and achieves highly competitive performance on SVHN.
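To illustrate the channel-wise attention idea, here is a minimal NumPy sketch of a generic Squeeze and Excitation block in the standard formulation of Hu et al. The block is applied to a single feature map of shape (C, H, W). This is not the WideCaps authors' implementation. The function name `squeeze_excitation`, the channel count, the reduction ratio `r`, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Generic SE block on a feature map x of shape (C, H, W).

    Hypothetical parameters: w1 has shape (C // r, C) and w2 has shape
    (C, C // r); they are the two fully connected layers of the block.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel, shape (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1)
    h = np.maximum(0.0, w1 @ z + b1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    # Scale: reweight each channel; small weights suppress less useful channels
    return x * s[:, None, None], s

rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4  # hypothetical sizes and reduction ratio
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, s = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape, s.min() > 0, s.max() < 1)
```

The only parameters the block adds are the two small fully connected layers, roughly 2C²/r weights. This is why SE blocks add little computational cost compared with the convolutions they follow. How WideCaps places these blocks relative to its wide bottleneck residual modules and capsule layers is described in the paper itself and is not reproduced here.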

updated: Sun Aug 08 2021 13:09:40 GMT+0000 (UTC)

published: Sun Aug 08 2021 13:09:40 GMT+0000 (UTC)

arXiv
