Diverse Multimedia Layout Generation with Multi Choice Learning

David D. Nguyen; Surya Nepal; Salil S. Kanhere

Designing visually appealing layouts for multimedia documents containing text, graphs and images requires a form of creative intelligence. Modelling the generation of layouts has recently gained attention due to its importance in aesthetics and communication style. In contrast to standard prediction tasks, there is a range of acceptable layouts, which depends on user preferences. For example, one poster designer may prefer logos on the top-left while another prefers logos on the bottom-right. Both are correct choices, yet existing machine learning models treat layout generation as a single-choice prediction problem. In such situations, these models simply average over all possible choices for the same input, forming a degenerate sample. In the above example, this would produce an unacceptable layout with the logo in the centre. In this paper, we present an auto-regressive neural network architecture, called LayoutMCL, that uses multi-choice prediction and a winner-takes-all loss to effectively stabilise layout generation. LayoutMCL avoids the averaging problem by using multiple predictors to learn a range of possible options for each layout object. This enables LayoutMCL to generate multiple, diverse layouts from a single input, in contrast with existing approaches, which yield similar layouts with minor variations. Through quantitative benchmarks on real data (magazine, document and mobile app layouts), we demonstrate that LayoutMCL reduces Fréchet Inception Distance (FID) by 83-98% and generates significantly more diversity in comparison to existing approaches.
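The averaging problem and the winner-takes-all idea from the abstract can be illustrated with a toy sketch. This is not the LayoutMCL architecture: the predictors here are just free (x, y) points trained by gradient descent on squared error, and the function names, learning rate, starting points and logo coordinates are all assumptions chosen for illustration.

```python
def squared_error(pred, target):
    """Sum of squared differences between two (x, y) points."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def wta_loss(predictions, target):
    """Winner-takes-all loss: only the closest of the K predictions counts.

    Returns the winning error and the index of the winning predictor.
    """
    errors = [squared_error(p, target) for p in predictions]
    winner = min(range(len(errors)), key=errors.__getitem__)
    return errors[winner], winner


def train(initial_predictions, targets, lr=0.1, epochs=200):
    """Gradient descent in which only the winning predictor is updated."""
    preds = [list(p) for p in initial_predictions]
    for _ in range(epochs):
        for target in targets:
            _, k = wta_loss(preds, target)
            # Gradient of the squared error w.r.t. the winner is 2 * (p - t).
            preds[k] = [p - lr * 2 * (p - t) for p, t in zip(preds[k], target)]
    return preds


# Hypothetical logo positions on a unit canvas: top-left and bottom-right.
targets = [(0.1, 0.1), (0.9, 0.9)]

# One predictor is pulled towards both targets and settles near the centre.
single = train([(0.5, 0.4)], targets)

# Two predictors under winner-takes-all each specialise on one target.
multi = train([(0.4, 0.5), (0.6, 0.5)], targets)

print("single predictor:", single)
print("two predictors:  ", multi)
```

With a single predictor, every target wins by default, so the point is dragged between the two corners and ends up near the middle of the canvas. With two predictors, each update goes only to the closer one, so each converges to a different valid logo position.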

updated: Mon Jan 16 2023 22:53:55 GMT+0000 (UTC)

published: Mon Jan 16 2023 22:53:55 GMT+0000 (UTC)

arXiv
