Simple Disentanglement of Style and Content in Visual Representations

Lilian Ngweta; Subha Maity; Alex Gittens; Yuekai Sun; Mikhail Yurochkin

Learning visual representations with interpretable features, i.e., disentangled representations, remains a challenging problem. Existing methods demonstrate some success but are hard to apply to large-scale vision datasets like ImageNet. In this work, we propose a simple post-processing framework to disentangle content and style in learned representations from pre-trained vision models. We model the pre-trained features probabilistically as linearly entangled combinations of the latent content and style factors and develop a simple disentanglement algorithm based on the probabilistic model. We show that the method provably disentangles content and style features and verify its efficacy empirically. Our post-processed features yield significant domain generalization performance improvements when the distribution shift occurs due to style changes or style-related spurious correlations.
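
To make the modeling assumption concrete, the following minimal sketch simulates features that are a linear mixture of latent content and style factors and then separates them again. It illustrates only the "linearly entangled" model named in the abstract, not the paper's algorithm: the mixing matrix is assumed to be known and invertible here, whereas the actual method would have to estimate the separation from data. All names (`mix_factors`, `unmix_features`, `mixing`) and the dimensions are hypothetical.

```python
import numpy as np


def mix_factors(content, style, mixing):
    """Entangle factors linearly: each feature row is z = A @ [c; s]."""
    latent = np.concatenate([content, style], axis=-1)
    return latent @ mixing.T


def unmix_features(features, mixing):
    """Recover [c; s] from z by solving A @ x = z (A assumed known, invertible)."""
    return np.linalg.solve(mixing, features.T).T


rng = np.random.default_rng(0)
n_samples, d_content, d_style = 5, 3, 2  # illustrative sizes only

content = rng.normal(size=(n_samples, d_content))
style = rng.normal(size=(n_samples, d_style))
# Hypothetical mixing matrix A; a random Gaussian matrix is almost surely invertible.
mixing = rng.normal(size=(d_content + d_style, d_content + d_style))

features = mix_factors(content, style, mixing)   # stand-in for pre-trained features
recovered = unmix_features(features, mixing)

recovered_content = recovered[:, :d_content]
recovered_style = recovered[:, d_content:]
```

In this toy setting the first `d_content` recovered coordinates match the content factors and the remaining ones match the style factors, which is the sense in which a linear post-processing step can separate the two.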

updated: Wed May 31 2023 17:25:09 GMT+0000 (UTC)

published: Mon Feb 20 2023 06:48:19 GMT+0000 (UTC)

arXiv
