Deep Neural Networks Fused with Textures for Image Classification

Asish Bera; Debotosh Bhattacharjee; Mita Nasipuri

Fine-grained image classification (FGIC) is a challenging task in computer vision due to small visual differences between subcategories but large intra-class variations. Deep learning methods have achieved remarkable success in solving FGIC. In this paper, we propose a fusion approach to address FGIC by combining global texture with local patch-based information. The first pipeline extracts deep features from various fixed-size non-overlapping patches and encodes them by sequential modelling using long short-term memory (LSTM). The second pipeline computes image-level textures at multiple scales using local binary patterns (LBP). The advantages of both streams are integrated into an efficient feature vector for image classification. The method is tested on eight datasets representing human faces, skin lesions, food dishes, marine life, etc., using four standard backbone CNNs. Our method attains better classification accuracy than existing methods by notable margins.
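
The two inputs named in the abstract, non-overlapping patches and multi-scale LBP texture, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 8-neighbour square sampling (rather than circular interpolation), the 256-bin histograms, and the radii (1, 2, 3) are all assumptions chosen for illustration, and the CNN feature extraction, LSTM encoding, and fusion steps are omitted.

```python
import numpy as np

def split_patches(image, patch):
    """Split an H x W (x C) array into non-overlapping patch x patch tiles, row-major."""
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch  # any remainder at the border is dropped
    return [image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            for r in range(rows) for c in range(cols)]

def lbp_histogram(gray, radius=1):
    """Normalized 256-bin histogram of 8-neighbour LBP codes at one radius (assumed variant)."""
    g = gray.astype(np.int32)
    h, w = g.shape
    center = g[radius:h - radius, radius:w - radius]
    # Eight neighbours on a square ring, visited clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        y0 = radius + dy * radius
        x0 = radius + dx * radius
        neigh = g[y0:y0 + center.shape[0], x0:x0 + center.shape[1]]
        # Set this bit wherever the neighbour is at least as bright as the centre pixel.
        codes |= (neigh >= center).astype(np.int32) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def multiscale_lbp(gray, radii=(1, 2, 3)):
    """Concatenate per-radius LBP histograms into one image-level texture vector."""
    return np.concatenate([lbp_histogram(gray, r) for r in radii])
```

In a full pipeline, each tile from `split_patches` would be passed through a backbone CNN and the resulting feature sequence through an LSTM, with the output of `multiscale_lbp` fused alongside it; how that fusion is done is not specified in the abstract.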

updated: Sun Mar 31 2024 12:27:16 GMT+0000 (UTC)

published: Thu Aug 03 2023 15:21:08 GMT+0000 (UTC)

arXiv
