StyleBabel: Artistic Style Tagging and Captioning

Dan Ruta; Andrew Gilbert; Pranav Aggarwal; Naveen Marri; Ajinkya Kale; Jo Briggs; Chris Speed; Hailin Jin; Baldo Faieta; Alex Filipkowski; Zhe Lin; John Collomosse

We present StyleBabel, a unique open-access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. StyleBabel was collected via an iterative method inspired by 'Grounded Theory': a qualitative approach that enables annotation while co-evolving a shared language for fine-grained artistic style attribute description. We demonstrate several downstream tasks for StyleBabel, adapting the recent ALADIN architecture for fine-grained style similarity to train cross-modal embeddings for: 1) free-form tag generation; 2) natural language description of artistic style; 3) fine-grained text search of style. To do so, we extend ALADIN with recent advances in Visual Transformer (ViT) and cross-modal representation learning, achieving state-of-the-art accuracy in fine-grained style retrieval.

updated: Fri Mar 11 2022 08:51:33 GMT+0000 (UTC)

published: Thu Mar 10 2022 12:15:55 GMT+0000 (UTC)

arXiv
