Fuse and Attend: Generalized Embedding Learning for Art and Sketches

Ujjal Kr Dutta

While deep Embedding Learning approaches have achieved widespread success in many computer vision tasks, state-of-the-art methods for representing natural images do not necessarily perform well on images from other domains, such as paintings, cartoons, and sketches. This is because the data distribution in these domains differs substantially from that of natural images; sketches, for example, often contain only sparse informative pixels. Recognizing objects in such domains is nevertheless crucial, given the many applications that leverage such data, for instance, sketch-to-image retrieval. Achieving an Embedding Learning model that performs well across multiple domains is therefore not only challenging but also pivotal for computer vision. To this end, in this paper, we propose a novel Embedding Learning approach with the goal of generalizing across different domains. During training, given a query image from one domain, we employ gated fusion and attention to generate a positive example that carries a broad notion of the semantics of the query's object category (drawn from across multiple domains). Using Contrastive Learning, we pull the embeddings of the query and the positive together, in order to learn a representation that is robust across domains. At the same time, to teach the model to discriminate against examples from different semantic categories (across domains), we maintain a pool of negative embeddings drawn from different categories. We demonstrate the effectiveness of our method using the DomainBed framework on the popular PACS (Photo, Art painting, Cartoon, and Sketch) dataset.
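The abstract does not specify the exact architecture or loss. What follows is only a minimal NumPy sketch of one plausible training step, under assumptions not taken from the paper:

- **Attention:** scaled dot-product attention of the query embedding over same-category embeddings from other domains.
- **Gate:** an element-wise sigmoid gate over the concatenated query and attended context, which mixes the two into the positive.
- **Loss:** InfoNCE over a MoCo-style FIFO pool of negative embeddings.

All names are hypothetical illustrations, not the authors' code. This covers `attend`, `gated_fusion`, `make_positive`, `NegativePool`, `info_nce`, the gate parameters `w_gate`/`b_gate`, and the temperature `tau`.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def attend(query, candidates):
    """ASSUMED: scaled dot-product attention of one query (d,) over candidates (k, d)."""
    d = query.shape[-1]
    scores = candidates @ query / np.sqrt(d)        # (k,) similarity scores
    exp = np.exp(scores - scores.max())             # subtract max for numerical stability
    weights = exp / exp.sum()                        # softmax over the k candidates
    return weights @ candidates                      # (d,) attention-weighted context

def gated_fusion(query, context, w_gate, b_gate):
    """ASSUMED: element-wise gate g in (0, 1) mixing query and attended context.
    w_gate has shape (2d, d) and b_gate has shape (d,); both are hypothetical parameters."""
    z = np.concatenate([query, context]) @ w_gate + b_gate
    g = 1.0 / (1.0 + np.exp(-z))                     # sigmoid gate, shape (d,)
    return g * context + (1.0 - g) * query

def make_positive(query, same_class_other_domains, w_gate, b_gate):
    """Build a unit-norm positive from same-category embeddings of other domains."""
    context = attend(query, same_class_other_domains)
    return l2_normalize(gated_fusion(query, context, w_gate, b_gate))

class NegativePool:
    """ASSUMED: FIFO pool of past embeddings and their category labels (MoCo-style queue)."""
    def __init__(self, dim, size):
        self.emb = np.zeros((0, dim))
        self.labels = np.zeros((0,), dtype=int)
        self.size = size

    def push(self, emb, labels):
        # Append the new batch, then keep only the most recent `size` entries.
        self.emb = np.concatenate([self.emb, emb])[-self.size:]
        self.labels = np.concatenate([self.labels, labels])[-self.size:]

    def negatives_for(self, label):
        # Only embeddings from *different* categories serve as negatives.
        return self.emb[self.labels != label]

def info_nce(query, positive, negatives, tau=0.1):
    """InfoNCE: -log(exp(q.p/tau) / (exp(q.p/tau) + sum_n exp(q.n/tau)))."""
    logits = np.concatenate([[query @ positive], negatives @ query]) / tau
    logits = logits - logits.max()                  # stable log-sum-exp
    return -(logits[0] - np.log(np.exp(logits).sum()))

# Usage with random stand-in embeddings (no real encoder involved).
rng = np.random.default_rng(0)
d = 8
q = l2_normalize(rng.normal(size=d))                 # query embedding, category 0
others = l2_normalize(rng.normal(size=(3, d)))       # category 0 in other domains
w, b = rng.normal(scale=0.1, size=(2 * d, d)), np.zeros(d)
p = make_positive(q, others, w, b)
pool = NegativePool(d, size=16)
pool.push(l2_normalize(rng.normal(size=(10, d))), rng.integers(0, 4, size=10))
loss = info_nce(q, p, pool.negatives_for(label=0))
print(f"InfoNCE loss: {loss:.4f}")
```

Minimizing this loss pulls the query toward the fused positive and pushes it away from the pooled negatives. Filtering the pool by label mirrors the abstract's statement that negatives come from different categories. The gate form and the queue mechanics remain assumptions.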

updated: Sat Aug 20 2022 14:44:11 GMT+0000 (UTC)

published: Sat Aug 20 2022 14:44:11 GMT+0000 (UTC)

arXiv