A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation

Minghao Chen; Zepeng Gao; Shuai Zhao; Qibo Qiu; Wenxiao Wang; Binbin Lin; Xiaofei He

Unsupervised domain adaptation (UDA) methods transfer models to target domains that have no labels. In practice, however, these methods still require a labeled target validation set for hyper-parameter tuning and model selection. In this paper, we aim to find an evaluation metric that can assess the quality of a transferred model without access to target validation labels. We begin with a metric based on the mutual information of the model's predictions. Through empirical analysis, we identify three prevalent issues with this metric: 1) it does not account for the source structure; 2) it can be easily attacked; 3) it fails to detect negative transfer caused by the over-alignment of source and target features. To address the first two issues, we incorporate source accuracy into the metric and employ a new MLP classifier that is held out during training, which significantly improves the result. To tackle the final issue, we integrate this enhanced metric with data augmentation, resulting in a novel unsupervised UDA metric called the Augmentation Consistency Metric (ACM). Additionally, we empirically demonstrate the shortcomings of previous experiment settings and conduct large-scale experiments to validate the effectiveness of our proposed metric. Furthermore, we employ our metric to automatically search for the optimal hyper-parameter set, achieving superior performance compared to manually tuned sets across four common benchmarks. Code will be available soon.
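
The abstract does not give formulas, so the following is only a minimal sketch of the two ingredients it names. It assumes the commonly used form of prediction mutual information (entropy of the mean prediction minus the mean per-sample entropy) and uses a plain label-agreement rate as a stand-in for augmentation consistency. The function names `prediction_mutual_information` and `augmentation_agreement` are hypothetical, and this is not the authors' ACM, which also involves source accuracy and a held-out MLP classifier.

```python
import numpy as np


def prediction_mutual_information(probs, eps=1e-12):
    """Mutual information between inputs and predicted labels.

    Assumed form (not taken from the paper): H(mean prediction) minus the
    mean entropy of the individual predictions.
    probs: array of shape (n_samples, n_classes), rows sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)
    # High when predictions are spread evenly over the classes.
    marginal_entropy = -np.sum(marginal * np.log(marginal + eps))
    # Low when each individual prediction is confident.
    conditional_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return marginal_entropy - conditional_entropy


def augmentation_agreement(probs, probs_aug):
    """Fraction of samples whose predicted class is unchanged by augmentation.

    Illustrative stand-in only; the paper's ACM is not specified in the abstract.
    """
    same = np.argmax(probs, axis=1) == np.argmax(probs_aug, axis=1)
    return float(np.mean(same))


# Toy target-domain predictions (made-up numbers for illustration).
confident = np.array([[0.98, 0.01, 0.01],
                      [0.01, 0.98, 0.01],
                      [0.01, 0.01, 0.98]])
uniform = np.full((3, 3), 1.0 / 3.0)
# Same as `confident`, except the last sample flips to class 0.
augmented = np.array([[0.98, 0.01, 0.01],
                      [0.01, 0.98, 0.01],
                      [0.90, 0.05, 0.05]])

mi_confident = prediction_mutual_information(confident)
mi_uniform = prediction_mutual_information(uniform)
agreement = augmentation_agreement(confident, augmented)
```

Under these assumptions, confident and class-balanced predictions score close to the upper bound of log(number of classes), while uniform predictions score zero. A score of this kind uses target predictions only, which is consistent with the abstract's point that it ignores source structure and motivates adding source accuracy and augmentation.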

updated: Mon Sep 18 2023 11:19:20 GMT+0000 (UTC)

published: Tue Aug 01 2023 05:01:05 GMT+0000 (UTC)

arXiv
