Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings

Jaskirat Singh; Liang Zheng

Generating stroke-based non-photorealistic imagery is an important problem in the computer vision community. As an endeavor in this direction, substantial recent research effort has focused on teaching machines "how to paint" in a manner similar to a human painter. However, the applicability of previous methods has been limited to datasets with little variation in the position, scale, and saliency of the foreground object. As a consequence, we find that these methods struggle to cover the granularity and diversity possessed by real-world images. To this end, we propose a Semantic Guidance pipeline with three components. 1) A bi-level painting procedure learns the distinction between foreground and background brush strokes at training time. 2) A neural alignment model, which combines object localization and spatial transformer networks in an end-to-end manner to zoom into a particular semantic instance, introduces invariance to the position and scale of the foreground object. 3) A novel guided-backpropagation-based focus reward is maximized to amplify the distinguishing features of the in-focus object. The proposed agent does not require any supervision on human stroke data and successfully handles variations in foreground object attributes, thus producing much higher-quality canvases for the CUB-200 Birds and Stanford Cars-196 datasets. Finally, we demonstrate the further efficacy of our method on complex datasets with multiple foreground object instances by evaluating an extension of our method on the challenging Virtual-KITTI dataset. Source code and models are available at https://github.com/1jsingh/semantic-guidance.
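As a rough illustration of the "zoom into a particular semantic instance" step, the sketch below builds a spatial-transformer-style affine transform from a bounding box and resamples the image on the resulting grid. It is not the authors' implementation. The function names `box_to_theta` and `zoom` are hypothetical, the box is supplied by hand rather than predicted by a localization network, and nearest-neighbour sampling stands in for the differentiable bilinear sampling that a spatial transformer network would use.

```python
def box_to_theta(box, width, height):
    """Affine parameters mapping the output grid [-1, 1]^2 onto `box`.

    `box` is (x0, y0, x1, y1) in pixel coordinates with x1/y1 exclusive.
    This is an illustrative assumption, not the paper's parameterization.
    """
    x0, y0, x1, y1 = box
    # Convert the pixel edges of the box to normalized coordinates in [-1, 1].
    nx0, nx1 = 2 * x0 / width - 1, 2 * x1 / width - 1
    ny0, ny1 = 2 * y0 / height - 1, 2 * y1 / height - 1
    # Scale by half the box extent, translate to the box centre.
    return [[(nx1 - nx0) / 2, 0.0, (nx1 + nx0) / 2],
            [0.0, (ny1 - ny0) / 2, (ny1 + ny0) / 2]]


def zoom(image, theta, out_w, out_h):
    """Resample `image` (a list of rows) on the affine grid given by `theta`."""
    h, w = len(image), len(image[0])
    out = []
    for j in range(out_h):
        # Centre of output row j in normalized coordinates.
        v = (2 * j + 1) / out_h - 1
        row = []
        for i in range(out_w):
            u = (2 * i + 1) / out_w - 1
            # Apply the 2x3 affine transform to the output coordinate.
            x = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            y = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            # Map back to pixel indices (nearest neighbour) and clamp to the image.
            px = min(max(int((x + 1) * w / 2), 0), w - 1)
            py = min(max(int((y + 1) * h / 2), 0), h - 1)
            row.append(image[py][px])
        out.append(row)
    return out


# Usage: zoom into the central 2x2 region of a 4x4 image.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
theta = box_to_theta((1, 1, 3, 3), width=4, height=4)
patch = zoom(image, theta, out_w=2, out_h=2)
```

Because the transform depends only on the box, the same painting agent can be applied to the zoomed patch regardless of where the object sits in the original image or how large it is, which is the sense of position and scale invariance described in the abstract.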

updated: Tue Jun 15 2021 00:39:15 GMT+0000 (UTC)

published: Wed Nov 25 2020 09:00:04 GMT+0000 (UTC)

arXiv
