An Energy-Based Prior for Generative Saliency

Jing Zhang; Jianwen Xie; Nick Barnes; Ping Li

We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution. The energy-based prior model is defined on the latent space of a saliency generator network that generates the saliency map based on continuous latent variables and an observed image. The parameters of both the saliency generator and the energy-based prior are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation, in which sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics. With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction. Unlike existing generative models, which define the prior distribution of the latent variables as a simple isotropic Gaussian distribution, our model uses an informative energy-based prior that can be more expressive in capturing the latent space of the data. With the informative energy-based prior, we extend the Gaussian distribution assumption of generative models to achieve a more representative distribution of the latent space, leading to more reliable uncertainty estimation. We apply the proposed framework to both RGB and RGB-D salient object detection tasks with both transformer and convolutional neural network backbones. We further propose an adversarial learning algorithm and a variational inference algorithm as alternatives to train the proposed generative framework. Experimental results show that our generative saliency model with an energy-based prior can achieve not only accurate saliency predictions but also reliable uncertainty maps that are consistent with human perception. Results and code are available at https://github.com/JingZhang617/EBMGSOD.
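To make the sampling step concrete, the following is a minimal sketch of Langevin dynamics drawing latent samples from an energy-based density. It is not the authors' implementation: the quadratic toy energy, the function names `grad_energy` and `langevin_sample`, and the values of `mu`, `sigma`, `n_steps`, and `step_size` are all assumptions chosen for illustration. In the framework described above the energy would be a learned network over the latent space, and its gradient would come from automatic differentiation.

```python
import random


def grad_energy(z, mu=2.0, sigma=0.5):
    """Gradient of a toy energy U(z) = sum((z_i - mu)^2) / (2 * sigma^2).

    Hypothetical stand-in for a learned energy network; the density
    proportional to exp(-U(z)) is a Gaussian with mean mu and std sigma.
    """
    return [(zi - mu) / sigma ** 2 for zi in z]


def langevin_sample(grad_fn, z_init, n_steps=200, step_size=0.1, rng=None):
    """Run Langevin dynamics: z <- z - (s^2 / 2) * grad U(z) + s * noise."""
    rng = rng or random.Random(0)
    z = list(z_init)
    for _ in range(n_steps):
        g = grad_fn(z)
        z = [
            zi - 0.5 * step_size ** 2 * gi + step_size * rng.gauss(0.0, 1.0)
            for zi, gi in zip(z, g)
        ]
    return z


# Start each chain from an isotropic Gaussian draw and let the
# energy gradient pull it toward the low-energy region around mu.
rng = random.Random(0)
z0 = [rng.gauss(0.0, 1.0) for _ in range(2)]
z = langevin_sample(grad_energy, z0, rng=rng)
```

Running many such chains yields a set of latent samples. Decoding each one with a generator and taking the per-pixel variance of the resulting saliency maps is one common way to form an uncertainty map, though the exact procedure used in the paper is not specified in this abstract.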

updated: Tue Jun 27 2023 06:51:25 GMT+0000 (UTC)

published: Tue Apr 19 2022 10:51:00 GMT+0000 (UTC)

arXiv

