Superpixel-based Domain-Knowledge Infusion in Computer Vision

Gunjan Chhablani; Abheesht Sharma; Harshit Pandey; Tirtharaj Dash

コンピュータビジョンにおけるスーパーピクセルベースのドメイン知識の注入

スーパーピクセルは、画像内のピクセルの高次の知覚グループであり、多くの場合、生のピクセルよりもはるかに多くの情報を伝達します。画像の異なるスーパーピクセル間の関係には、固有の関係構造があります。この関係情報は、猫の画像の2つの目を表すスーパーピクセル間の関係など、画像に関する何らかの形式のドメイン情報を伝えることができます。このホワイトペーパーでの私たちの関心は、コンピュータビジョンモデル、特にこれらのスーパーピクセル情報を組み込むためのディープニューラルネットワーク（DNN）に基づくモデルを構築することです。（a）畳み込みニューラルネットワーク（CNN）を利用して画像内の空間情報を処理し、（b）グラフニューラルネットワーク（GNN）を利用して画像内のリレーショナルスーパーピクセル情報を処理するハイブリッドモデルを構築する方法論を提案します。提案されたディープモデルは、「ハイブリッド」損失と呼ばれる一般的なハイブリッド損失関数を使用して学習されます。 4つの一般的な画像分類データセット（MNIST、FMNIST、CIFAR-10、CIFAR-100）で、提案されたハイブリッドビジョンモデルの予測パフォーマンスを評価します。さらに、COVID-19 X線検出、LFW顔認識、SOCOFing指紋識別の3つの実際の分類タスクでメソッドを評価します。結果は、GNNを介して提供されるリレーショナルスーパーピクセル情報が、標準のCNNベースのビジョンシステムのパフォーマンスを向上させる可能性があることを示しています。

Superpixels are higher-order perceptual groups of pixels in an image, often carrying much more information than raw pixels. There is an inherent relational structure to the relationship among different superpixels of an image. This relational information can convey some form of domain information about the image, e.g. relationship between superpixels representing two eyes in a cat image. Our interest in this paper is to construct computer vision models, specifically those based on Deep Neural Networks (DNNs) to incorporate these superpixels information. We propose a methodology to construct a hybrid model that leverages (a) Convolutional Neural Network (CNN) to deal with spatial information in an image, and (b) Graph Neural Network (GNN) to deal with relational superpixel information in the image. The proposed deep model is learned using a generic hybrid loss function that we call a `hybrid' loss. We evaluate the predictive performance of our proposed hybrid vision model on four popular image classification datasets: MNIST, FMNIST, CIFAR-10 and CIFAR-100. Moreover, we evaluate our method on three real-world classification tasks: COVID-19 X-Ray Detection, LFW Face Recognition, and SOCOFing Fingerprint Identification. The results demonstrate that the relational superpixel information provided via a GNN could improve the performance of standard CNN-based vision systems.

updated: Thu May 20 2021 01:25:42 GMT+0000 (UTC)

published: Thu May 20 2021 01:25:42 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト