Zero-Shot 3D Shape Sketch View Similarity and Retrieval

Gianluca Berardi; Yulia Gryaditskaya

We conduct a detailed study of how well ViT and ResNet feature layers, pretrained on pretext tasks, quantify the similarity between pairs of 2D sketch views of individual 3D shapes. We assess performance in terms of the models' ability to retrieve similar views and ground-truth 3D shapes. Going beyond a naive zero-shot performance study, we investigate alternative fine-tuning strategies on one or several shape classes, and their generalization to other shape classes. Leveraging progress in NPR (non-photorealistic) rendering, we generate synthetic sketch views in several styles, which we use to fine-tune pretrained foundation models with contrastive learning. We study how the scale of an object in a sketch affects the similarity of features at different network layers. We observe that, depending on the scale, different feature layers can be more indicative of shape similarities in sketch views. However, we find that similar object scales result in the best performance of ViT and ResNet. In summary, we show that careful selection of a fine-tuning strategy yields a consistent improvement in zero-shot shape retrieval accuracy. We believe that our work will have a significant impact on research in the sketch domain, providing insights and guidance on how to adopt large pretrained models as perceptual losses.
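
As a rough illustration of the two ingredients named in the abstract, feature-based similarity for retrieval and contrastive fine-tuning, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state which similarity measure or contrastive loss is used, so cosine similarity and an InfoNCE-style loss are assumptions. The function names (cosine_similarity, retrieve, info_nce_loss) and the temperature value are hypothetical. The feature vectors stand in for activations that would, in practice, be extracted from a ViT or ResNet layer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery):
    """Rank gallery indices from most to least similar to the query features."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor.

    Assumption: the positive is another sketch view of the same shape and
    the negatives are views of other shapes. The loss is the negative
    log-softmax of the positive's similarity among all candidates.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidate logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Toy 2D "features"; real features would come from a network layer.
    query = [1.0, 0.0]
    gallery = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
    print(retrieve(query, gallery))
    print(info_nce_loss(query, [1.0, 0.0], [[0.0, 1.0]]))
```

In this sketch, retrieval accuracy would be measured by checking whether the top-ranked gallery item belongs to the same 3D shape as the query, and fine-tuning would minimize the loss averaged over many anchors.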

updated: Wed Jun 14 2023 14:40:50 GMT+0000 (UTC)

published: Wed Jun 14 2023 14:40:50 GMT+0000 (UTC)

arXiv
