Instance-aware Model Ensemble With Distillation For Unsupervised Domain Adaptation

Weimin Wu; Jiayuan Fan; Tao Chen; Hancheng Ye; Bo Zhang; Baopu Li

Linear ensemble strategies, i.e., averaging ensembles, have been proposed to improve performance in unsupervised domain adaptation (UDA) tasks. However, a typical UDA task is challenged by dynamically changing factors, such as variable weather, views, and backgrounds in the unlabeled target domain. Most previous ensemble strategies ignore this dynamic and uncontrollable challenge, and therefore face limited feature representations and performance bottlenecks. To enhance the model's adaptability between domains and reduce the computational cost of deploying the ensemble model, we propose a novel framework, Instance-aware Model Ensemble With Distillation (IMED), which fuses multiple UDA component models adaptively according to different instances and distills these components into a small model. The core idea of IMED is a dynamic instance-aware ensemble strategy: for each instance, a nonlinear fusion subnetwork is learned that fuses the extracted features and predicted labels of multiple component models. The nonlinear fusion method helps the ensemble model handle dynamically changing factors. After learning a large-capacity ensemble model with good adaptability to different changing factors, we use the ensemble teacher model to guide the learning of a compact student model through knowledge distillation. Furthermore, we provide a theoretical analysis of the validity of IMED for UDA. Extensive experiments conducted on various UDA benchmark datasets, e.g., Office-31, Office-Home, and VisDA-2017, show that the model based on IMED outperforms state-of-the-art methods at comparable computational cost.
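The contrast the abstract draws between an averaging ensemble and an instance-aware one, followed by distillation into a student, can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the tanh-plus-softmax gate, the `gate_weights` matrix, and the temperature-scaled KL loss are all hypothetical stand-ins, and the gate here only reweights the component predictions, whereas the paper's fusion subnetwork also fuses extracted features.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_ensemble(component_probs):
    """Linear baseline: every component gets the same weight for every instance."""
    n = len(component_probs)
    return [sum(col) / n for col in zip(*component_probs)]


def instance_aware_ensemble(features, component_probs, gate_weights):
    """Fuse component predictions with weights computed from the instance itself.

    gate_weights is a hypothetical learned matrix with one row per component
    (an assumption for this sketch, not a structure taken from the paper).
    """
    # Nonlinear score per component, then normalize the scores into gates.
    scores = [math.tanh(sum(w * f for w, f in zip(row, features)))
              for row in gate_weights]
    gates = softmax(scores)
    fused = [sum(g * probs[c] for g, probs in zip(gates, component_probs))
             for c in range(len(component_probs[0]))]
    return fused, gates


def distillation_loss(teacher_probs, student_logits, temperature=2.0):
    """KL(teacher || student) on the temperature-softened student output."""
    student_probs = softmax(student_logits, temperature)
    return sum(t * (math.log(t) - math.log(s))
               for t, s in zip(teacher_probs, student_probs) if t > 0)


# Two component models that disagree on a two-class problem.
component_probs = [[0.9, 0.1], [0.2, 0.8]]
gate_weights = [[2.0, -2.0], [-2.0, 2.0]]

# Two different instances lead to two different fusion weightings.
fused_a, gates_a = instance_aware_ensemble([1.0, 0.0], component_probs, gate_weights)
fused_b, gates_b = instance_aware_ensemble([0.0, 1.0], component_probs, gate_weights)

# The fused (teacher) prediction then supervises a compact student.
loss = distillation_loss(fused_a, [1.0, 0.0])
```

With these toy numbers the averaging ensemble returns the same equal weighting for both instances, while the gated version trusts the first component more on the first instance and the second component more on the second. In a real setting the gate and the student would be trained networks rather than fixed values.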

updated: Tue Nov 15 2022 12:53:23 GMT+0000 (UTC)

published: Tue Nov 15 2022 12:53:23 GMT+0000 (UTC)

arXiv
