Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization

Yuyang Zhao; Zhun Zhong; Na Zhao; Nicu Sebe; Gim Hee Lee

Domain shift is widespread in the visual world, yet modern deep neural networks commonly suffer severe performance degradation under domain shift because of their poor generalization ability, which limits their real-world applications. The difficulty mainly stems from the limited environmental variation in the source data and the large distribution gap between source and unseen target data. To this end, we propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle such domain shift in various visual tasks. Specifically, SHADE is built on two consistency constraints, Style Consistency (SC) and Retrospection Consistency (RC). SC enriches the source situations and encourages the model to learn consistent representations across style-diversified samples. RC leverages general visual knowledge to prevent the model from overfitting to the source data, and thus largely keeps the representations of the source-trained model and a general visual model consistent. Furthermore, we present a novel style hallucination module (SHM) to generate the style-diversified samples that are essential to consistency learning. SHM selects basis styles from the source distribution, enabling the model to dynamically generate diverse and realistic samples during training. Extensive experiments demonstrate that our versatile SHADE can significantly enhance generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection, with different models, i.e., ConvNets and Transformers.
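The abstract describes SHM, SC and RC only at a high level. The NumPy sketch below shows one way these pieces could fit together. It is not the authors' implementation, and it rests on assumptions the abstract does not state. A "style" is taken to be the per-channel feature mean and standard deviation, as in AdaIN. Basis styles are picked by farthest point sampling. A hallucinated style is a Dirichlet-weighted convex combination of the basis styles. SC is written as a Jensen-Shannon divergence between predictions on original and hallucinated samples. RC is written as a mean squared distance to features from a frozen general-purpose model. All function names (`channel_stats`, `farthest_point_sampling`, `hallucinate_styles`, `js_divergence`, `retrospection_loss`) and the toy linear head are hypothetical.

```python
import numpy as np


def channel_stats(feat, eps=1e-6):
    """Per-sample, per-channel mean and std of an (N, C, H, W) feature map."""
    mu = feat.mean(axis=(2, 3))                   # (N, C)
    sigma = np.sqrt(feat.var(axis=(2, 3)) + eps)  # (N, C)
    return mu, sigma


def farthest_point_sampling(points, k, seed=0):
    """Greedily pick k row indices of `points` that are spread far apart (assumed selection rule)."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dist))  # point farthest from everything chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)


def hallucinate_styles(feat, basis_mu, basis_sigma, alpha=1.0, seed=0):
    """Replace each sample's style with a random convex mix of basis styles (AdaIN-like)."""
    rng = np.random.default_rng(seed)
    k = basis_mu.shape[0]
    w = rng.dirichlet(np.full(k, alpha), size=feat.shape[0])  # (N, K), rows sum to 1
    new_mu, new_sigma = w @ basis_mu, w @ basis_sigma          # (N, C)
    mu, sigma = channel_stats(feat)
    normed = (feat - mu[:, :, None, None]) / sigma[:, :, None, None]  # strip original style
    return normed * new_sigma[:, :, None, None] + new_mu[:, :, None, None]


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(a, b, eps=1e-8):
    """Row-wise KL(a || b) with a small epsilon for numerical safety."""
    return np.sum(a * (np.log(a + eps) - np.log(b + eps)), axis=-1)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two batches of class distributions (assumed SC form)."""
    m = 0.5 * (p + q)
    return float(np.mean(0.5 * kl(p, m) + 0.5 * kl(q, m)))


def retrospection_loss(feat_model, feat_general):
    """Mean squared gap to features of a frozen general-purpose model (assumed RC form)."""
    return float(np.mean((feat_model - feat_general) ** 2))


# Toy demo: 8 source samples, 4 channels, 5x5 spatial maps, 10 classes.
rng = np.random.default_rng(0)
feats = rng.normal(loc=2.0, scale=3.0, size=(8, 4, 5, 5))
mu, sigma = channel_stats(feats)
idx = farthest_point_sampling(np.concatenate([mu, sigma], axis=1), k=3)
hallucinated = hallucinate_styles(feats, mu[idx], sigma[idx])

head = rng.normal(size=(4, 10))  # hypothetical linear classifier on pooled features
p_orig = softmax(feats.mean(axis=(2, 3)) @ head)
p_hall = softmax(hallucinated.mean(axis=(2, 3)) @ head)
sc_loss = js_divergence(p_orig, p_hall)

# Stand-in for features a frozen general model would produce for the same hallucinated samples.
frozen_feats = hallucinated + 0.1 * rng.normal(size=feats.shape)
rc_loss = retrospection_loss(hallucinated, frozen_feats)
print(f"SC = {sc_loss:.4f}, RC = {rc_loss:.4f}")
```

Farthest point sampling appears here only as a simple way to spread the basis styles across the source distribution. Because every hallucinated style is a convex combination of real source styles, the new styles stay inside the range the source data actually covers. This is one plausible reading of "diverse and realistic" samples.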

updated: Fri Nov 24 2023 15:14:26 GMT+0000 (UTC)

published: Sun Dec 18 2022 11:42:51 GMT+0000 (UTC)

arXiv
