Few-Shot Unsupervised Image-to-Image Translation on complex scenes

Luca Barras; Samuel Chassot; Daniel Filipe Nunes Silva

Unsupervised image-to-image translation methods have received a lot of attention in the last few years. Multiple techniques have emerged, tackling the initial challenge from different perspectives. Some focus on learning as much as possible from several target style images for translation, while others make use of object detection in order to produce more realistic results on content-rich scenes. In this work, we assess how a method that was initially developed for single-object translation performs on more diverse and content-rich images. Our work is based on the FUNIT [1] framework, which we train on a more diverse dataset. This helps us understand how such methods behave beyond their initial frame of application. We present a way to extend a dataset based on object detection. Moreover, we propose a way to adapt the FUNIT framework in order to leverage the power of object detection seen in other methods.

updated: Mon Jun 07 2021 16:33:19 GMT+0000 (UTC)

published: Mon Jun 07 2021 16:33:19 GMT+0000 (UTC)

arXiv
