Conditional Vector Graphics Generation for Music Cover Images

Valeria Efimova; Ivan Jarsky; Ilya Bizyaev; Andrey Filchenkov

Generative Adversarial Networks (GANs) have driven rapid growth in the field of computer image synthesis. Because almost all existing image synthesis algorithms treat an image as a pixel matrix, high-resolution image synthesis is complicated. Vector images are a good alternative. However, they belong to a highly sophisticated parametric space, which restricts the use of GANs for synthesizing vector graphics. In this paper, we consider a specific application domain that softens this restriction dramatically and makes vector image synthesis practical. Music cover images must meet the requirements of Internet streaming services and printing standards, which imply high resolution of graphic materials but place no additional requirements on the content of such images. Existing music cover image generation services do not analyze the tracks themselves; some services consider, for the most part, only genre tags. To generate music covers as vector images that reflect the music and consist of simple geometric objects, we propose a GAN-based algorithm called CoverGAN. The resulting images are assessed by their correspondence to the music, in comparison with AttnGAN and DALL-E text-to-image generation from the title or lyrics. Moreover, the significance of the patterns found by CoverGAN is evaluated in terms of the correspondence of the generated cover images to the musical tracks. Listeners rate the music covers generated by the proposed algorithm as quite satisfactory and corresponding to the tracks. The code and a demo for music cover image generation are available at https://github.com/IzhanVarsky/CoverGAN.

updated: Sun May 15 2022 14:43:03 GMT+0000 (UTC)

published: Sun May 15 2022 14:43:03 GMT+0000 (UTC)

arXiv
