Panoptic-based Object Style-Align for Image-to-Image Translation

Liyun Zhang; Photchara Ratsamee; Bowen Wang; Manabu Higashida; Yuki Uranishi; Haruo Takemura

Despite remarkable recent progress in image translation, complex scenes with multiple discrepant objects remain a challenging problem: the translated images have low fidelity, render tiny objects with few details, and yield unsatisfactory object recognition performance. Without thorough object perception (i.e., bounding boxes, categories, and masks) of the image as prior knowledge, the style transformation of each object is difficult to track during the image translation process. We propose panoptic-based object style-align generative adversarial networks (POSA-GANs) for image-to-image translation, together with a compact panoptic segmentation dataset. A panoptic segmentation model is used to extract panoptic-level perception (i.e., overlap-removed foreground object instances and background semantic regions in the image). This perception guides the alignment between the object content codes of the input domain image and the object style codes sampled from the style space of the target domain. The style-aligned object representations are further transformed to obtain a precise boundary layout for higher-fidelity object generation. The proposed method was systematically compared with different competing methods and obtained significant improvements in both image quality and object recognition performance for translated images.
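The idea of aligning per-object style codes with content features under a panoptic map can be illustrated with a minimal sketch. This is not the POSA-GAN architecture, which the abstract does not specify. It assumes an AdaIN-style scale-and-shift applied separately inside each panoptic segment. The function name `align_object_styles`, the `(gamma, beta)` form of a style code, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def align_object_styles(content, panoptic_ids, style_codes, eps=1e-5):
    """Apply a per-segment style (scale, shift) to content features.

    Illustrative assumption: each style code is a (gamma, beta) pair.

    content:      (C, H, W) float array of content features.
    panoptic_ids: (H, W) int array; every pixel carries exactly one
                  segment id, so segments never overlap.
    style_codes:  dict mapping segment id -> (gamma, beta), each (C,).
    """
    out = np.zeros_like(content)
    for seg_id in np.unique(panoptic_ids):
        mask = panoptic_ids == seg_id          # (H, W) boolean mask
        feats = content[:, mask]               # (C, N) pixels of this segment
        mean = feats.mean(axis=1, keepdims=True)
        std = feats.std(axis=1, keepdims=True)
        gamma, beta = style_codes[int(seg_id)]
        # Normalize inside the segment, then apply that segment's style.
        normed = (feats - mean) / (std + eps)
        out[:, mask] = gamma[:, None] * normed + beta[:, None]
    return out

# Usage: two segments (left half = id 0, right half = id 1).
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 6, 6))
ids = np.zeros((6, 6), dtype=int)
ids[:, 3:] = 1
codes = {
    0: (np.full(4, 2.0), np.full(4, 3.0)),
    1: (np.ones(4), np.full(4, -1.0)),
}
styled = align_object_styles(content, ids, codes)
```

Because a panoptic map assigns each pixel to exactly one segment, the masks partition the image and every output pixel is written once. With overlapping instance masks, by contrast, the result would depend on the order in which objects are processed.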

updated: Fri Dec 03 2021 14:28:11 GMT+0000 (UTC)

published: Fri Dec 03 2021 14:28:11 GMT+0000 (UTC)

arXiv
