Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting

Ayan Kumar Bhunia; Pinaki Nath Chowdhury; Yongxin Yang; Timothy M. Hospedales; Tao Xiang; Yi-Zhe Song

Self-supervised learning has gained prominence due to its efficacy at learning powerful representations from unlabelled data that achieve excellent performance on many challenging downstream tasks. However, supervision-free pre-text tasks are challenging to design and are usually modality specific. Although there is a rich literature of self-supervised methods for either spatial (such as images) or temporal (such as sound or text) data modalities, a common pre-text task that benefits both modalities is largely missing. In this paper, we are interested in defining a self-supervised pre-text task for sketch and handwriting data. This data is uniquely characterised by its existence in the dual modalities of rasterized images and vector coordinate sequences. We address and exploit this dual representation by proposing two novel cross-modal translation pre-text tasks for self-supervised feature learning: Vectorization and Rasterization. Vectorization learns to map image space to vector coordinates, and rasterization maps vector coordinates to image space. We show that our learned encoder modules benefit both raster-based and vector-based downstream approaches to analysing hand-drawn data. Empirical evidence shows that our novel pre-text tasks surpass existing single and multi-modal self-supervision methods.
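To make the two modalities concrete, the following is a minimal sketch of what mapping vector coordinates to image space means for a single drawing. It is a fixed, hand-written renderer and not the learned rasterization model described in the abstract; the function name `rasterize`, the stroke format (lists of `(x, y)` points normalised to `[0, 1]`), and the `size` parameter are all assumptions made for illustration.

```python
def rasterize(strokes, size=8):
    """Render polylines onto a size x size binary grid.

    Assumed (hypothetical) input format: `strokes` is a list of strokes,
    each stroke a list of (x, y) points with coordinates in [0, 1].
    """
    image = [[0] * size for _ in range(size)]
    scale = size - 1
    for stroke in strokes:
        # A single-point stroke marks exactly one pixel.
        if len(stroke) == 1:
            x, y = stroke[0]
            image[round(y * scale)][round(x * scale)] = 1
        # Walk each consecutive pair of points and mark the pixels between them.
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            px0, py0 = x0 * scale, y0 * scale
            px1, py1 = x1 * scale, y1 * scale
            # Sample at twice the pixel distance so no pixel on the segment is skipped.
            steps = max(1, int(max(abs(px1 - px0), abs(py1 - py0)) * 2))
            for i in range(steps + 1):
                t = i / steps
                col = round(px0 + t * (px1 - px0))
                row = round(py0 + t * (py1 - py0))
                image[row][col] = 1
    return image


if __name__ == "__main__":
    # Vector modality: two strokes forming an "L" shape.
    sketch = [[(0.0, 0.0), (0.0, 1.0)], [(0.0, 1.0), (1.0, 1.0)]]
    # Raster modality: the same drawing as a pixel grid.
    for row in rasterize(sketch, size=6):
        print("".join("#" if pixel else "." for pixel in row))
```

The vector form keeps point order and stroke boundaries, while the raster form keeps only which pixels are inked. Going back from pixels to an ordered coordinate sequence has no comparably simple closed form, which is one way to see why translating between the two can serve as a learning task.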

updated: Thu Mar 25 2021 09:47:18 GMT+0000 (UTC)

published: Thu Mar 25 2021 09:47:18 GMT+0000 (UTC)

arXiv

