Generative Transformer for Accurate and Reliable Salient Object Detection

Yuxin Mao; Jing Zhang; Zhexiong Wan; Yuchao Dai; Aixuan Li; Yunqiu Lv; Xinyu Tian; Deng-Ping Fan; Nick Barnes

In this paper, we study the contribution of transformers to salient object detection, aiming for saliency predictions that are both accurate and reliable. We first investigate transformers for accurate salient object detection with deterministic neural networks, and explain that their effective structure modeling and global context modeling abilities lead to superior performance compared with CNN-based frameworks. Then, we design stochastic networks to evaluate the transformers' ability in reliable salient object detection. We observe that both CNN- and transformer-based frameworks suffer greatly from the over-confidence issue: the models tend to generate wrong predictions with high confidence, which makes them poorly calibrated. To estimate the calibration degree of both CNN- and transformer-based frameworks for reliable saliency prediction, we introduce generative adversarial network (GAN) based models that identify the over-confident regions by sampling from the latent space. Specifically, we present the inferential generative adversarial network (iGAN). Unlike the conventional GAN-based framework, which fixes the distribution of the latent variable to a standard normal distribution N(0,1), the proposed iGAN infers the latent variable by gradient-based Markov Chain Monte Carlo (MCMC), namely Langevin dynamics. We apply iGAN to both fully and weakly supervised salient object detection, and explain that iGAN within the transformer framework leads to both accurate and reliable salient object detection. The source code and experimental results are publicly available via our project page: https://github.com/fupiao1998/TrasformerSOD.
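To make the latent-inference step concrete, the following is a minimal sketch of Langevin dynamics on a toy problem. It is not the authors' implementation: the linear "generator" `W`, the noise level `sigma`, the step size, and the function name `langevin_infer_latent` are all assumptions chosen for illustration. The only idea taken from the abstract is that the latent variable is inferred by gradient-based MCMC rather than drawn from a fixed N(0,1) prior.

```python
import numpy as np

def langevin_infer_latent(y, W, sigma=0.5, step=0.1, n_steps=500, rng=None):
    """Sample a latent z from p(z | y) with Langevin dynamics.

    Toy model (an assumption, not the paper's network):
        z ~ N(0, I),   y = W z + noise,   noise ~ N(0, sigma^2 I)
    """
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.standard_normal(W.shape[1])  # start from the prior N(0, I)
    for _ in range(n_steps):
        # Gradient of log p(z | y): likelihood term plus prior term.
        grad = W.T @ (y - W @ z) / sigma**2 - z
        # Langevin update: half-step along the gradient plus Gaussian noise.
        z = z + 0.5 * step**2 * grad + step * rng.standard_normal(z.shape)
    return z

# Hypothetical usage: recover a 2-D latent from an 8-D observation.
rng = np.random.default_rng(42)
W = rng.standard_normal((8, 2))
z_true = rng.standard_normal(2)
y = W @ z_true + 0.5 * rng.standard_normal(8)
z_hat = langevin_infer_latent(y, W, rng=rng)
```

Repeating the call with different random seeds gives several latent samples for the same input. In a saliency model, the disagreement between the predictions decoded from those samples could serve as a rough indicator of where the model is uncertain.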

updated: Wed Jan 26 2022 04:29:31 GMT+0000 (UTC)

published: Tue Apr 20 2021 17:12:51 GMT+0000 (UTC)

arXiv
