Energy-Based Generative Cooperative Saliency Prediction

Jing Zhang; Jianwen Xie; Zilong Zheng; Nick Barnes

Conventional saliency prediction models typically learn a deterministic mapping from images to the corresponding ground truth saliency maps. In this paper, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over saliency maps given an image, and treating prediction as a sampling process. Specifically, we propose a generative cooperative saliency prediction framework based on generative cooperative networks, in which a conditional latent variable model and a conditional energy-based model are jointly trained to predict saliency in a cooperative manner. We call our model SalCoopNets. The latent variable model serves as a fast but coarse predictor that efficiently produces an initial prediction, which is then refined by iterative Langevin revision under the energy-based model, which serves as a fine predictor. This coarse-to-fine cooperative strategy combines the speed of the latent variable model with the accuracy of the energy-based model. Moreover, we generalize our framework to weakly supervised saliency prediction, where the saliency annotations of training images are only partially observed, by proposing a cooperative learning-while-recovering strategy. Lastly, we show that the learned energy function can serve as a refinement module that improves the results of other pre-trained saliency prediction models. Experimental results show that our generative model can achieve state-of-the-art performance. Our code is publicly available at: https://github.com/JingZhang617/SalCoopNets.
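The coarse-to-fine procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the functions `coarse_predictor`, `energy`, `energy_grad`, and `langevin_refine` are hypothetical, the energy is a toy quadratic chosen so its gradient is analytic, and the update rule is the standard Langevin form s ← s − (δ²/2)·∂E/∂s + δ·ε, assumed here rather than taken from the paper. In the actual model both the predictor and the energy function would be learned neural networks.

```python
import numpy as np


def coarse_predictor(image, z):
    # Hypothetical stand-in for the conditional latent variable model:
    # maps an image and a latent noise map z to a rough saliency map.
    return np.clip(image.mean(axis=-1) + 0.1 * z, 0.0, 1.0)


def energy(saliency, image):
    # Toy quadratic energy (an assumption for illustration only):
    # low when the saliency map matches the image brightness.
    target = image.mean(axis=-1)
    return 0.5 * float(np.sum((saliency - target) ** 2))


def energy_grad(saliency, image):
    # Analytic gradient of the toy energy with respect to the saliency map.
    return saliency - image.mean(axis=-1)


def langevin_refine(saliency, image, steps=10, step_size=0.5,
                    noise_scale=1.0, rng=None):
    # Iterative Langevin revision: a gradient step on the energy plus
    # Gaussian noise. noise_scale=0.0 gives plain gradient descent.
    rng = np.random.default_rng(0) if rng is None else rng
    s = saliency.copy()
    for _ in range(steps):
        eps = rng.standard_normal(s.shape)
        s = (s - 0.5 * step_size ** 2 * energy_grad(s, image)
             + noise_scale * step_size * eps)
    return s


# Usage: a coarse initial prediction followed by noise-free refinement.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))        # random stand-in for an RGB image
z = rng.standard_normal((8, 8))      # latent noise map
initial = coarse_predictor(image, z)
refined = langevin_refine(initial, image, noise_scale=0.0)
print(energy(initial, image), energy(refined, image))
```

With `noise_scale=0.0` the refinement reduces to gradient descent on the energy, so the refined map has lower energy than the initial one. With noise enabled, repeated runs yield different samples, which corresponds to treating prediction as a sampling process.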

updated: Fri Jun 25 2021 02:11:50 GMT+0000 (UTC)

published: Fri Jun 25 2021 02:11:50 GMT+0000 (UTC)

arXiv
