Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification

Yongcan Yu; Lijun Sheng; Ran He; Jian Liang

Test-time adaptation (TTA) is a technique that aims to improve the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, these methods are often evaluated under different settings, such as varying distribution shifts, backbones, and scenario designs, so there is no consistent and fair benchmark for validating their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods cover a wide range of adaptation scenarios (e.g., online vs. offline adaptation, and instance vs. batch vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.
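
The distinction between online and offline adaptation mentioned in the abstract can be illustrated with a toy sketch in plain Python. This is not one of the 13 benchmarked methods and not code from the Benchmark-TTA repository. The 1-D classifier, the mean re-centering rule, the momentum value, the data, and all function names are assumptions made purely for illustration.

```python
from statistics import mean

# Toy setup (assumed for illustration): a frozen "source" classifier that
# thresholds a 1-D feature at 0. At test time every feature is shifted by +3,
# so the unadapted threshold is wrong. Labels are used only for scoring.
batches = [[1.5, 3.5, 2.0, 4.0], [2.5, 4.5, 1.0, 5.0]]
labels = [0, 1, 0, 1, 0, 1, 0, 1]


def predict(x, shift_estimate):
    """Classify after re-centering the feature by the estimated shift."""
    return 1 if (x - shift_estimate) > 0 else 0


def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    return sum(p == y for p, y in zip(preds, targets)) / len(targets)


def no_adaptation(test_batches):
    """Baseline: apply the source classifier unchanged."""
    return [predict(x, 0.0) for batch in test_batches for x in batch]


def online_adaptation(test_batches, momentum=0.5):
    """Single pass: update the shift estimate from each unlabeled batch,
    then predict on that batch immediately."""
    estimate = 0.0
    preds = []
    for batch in test_batches:
        estimate = (1 - momentum) * estimate + momentum * mean(batch)
        preds.extend(predict(x, estimate) for x in batch)
    return preds


def offline_adaptation(test_batches):
    """Two passes: estimate the shift from the whole unlabeled test set,
    then predict on every sample."""
    all_x = [x for batch in test_batches for x in batch]
    estimate = mean(all_x)
    return [predict(x, estimate) for x in all_x]


for name, fn in [("none", no_adaptation),
                 ("online", online_adaptation),
                 ("offline", offline_adaptation)]:
    print(name, accuracy(fn(batches), labels))
```

In this toy case the online variant trails the offline one because its estimate is still catching up to the shift when the first batch has to be predicted, whereas the offline variant sees all unlabeled test data before committing to any prediction. Real TTA methods adapt network parameters or normalization statistics rather than a single scalar.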

updated: Thu Jul 06 2023 16:59:53 GMT+0000 (UTC)

published: Thu Jul 06 2023 16:59:53 GMT+0000 (UTC)

arXiv
