Data Models for Dataset Drift Controls in Machine Learning With Images

Luis Oala; Marco Aversa; Gabriel Nobis; Kurt Willis; Yoan Neuenschwander; Michèle Buck; Christian Matek; Jerome Extermann; Enrico Pomarico; Wojciech Samek; Roderick Murray-Smith; Christoph Clausen; Bruno Sanguinetti

Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these domains has been limited by robustness concerns. A primary failure mode is a drop in performance caused by differences between the training and deployment data. While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of the primary object of interest: the data. This makes it difficult to create physically faithful drift test cases or to specify which data models should be avoided when deploying a machine learning model. In this study, we demonstrate how these shortcomings can be overcome by pairing machine learning robustness validation with physical optics. We examine the role that raw sensor data and differentiable data models can play in controlling performance risks related to image dataset drift. The findings are distilled into three applications. First, drift synthesis enables the controlled generation of physically faithful drift test cases. The experiments presented here show that the average decrease in model performance is four to ten times less severe than under post-hoc augmentation testing. Second, the gradient connection between task and data models allows for drift forensics, which can be used to specify performance-sensitive data models that should be avoided during deployment of a machine learning model. Third, drift adjustment opens up the possibility of adjusting the processing in the face of drift. This can speed up and stabilize classifier training, at a margin of up to 20% in validation accuracy. A guide to accessing the open code and datasets is available at https://github.com/aiaudit-org/raw2logit.
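To make the idea of a data model concrete, the following is a minimal Python sketch, not the authors' raw2logit code. It assumes a toy data model (black-level subtraction, gain, clipping, gamma correction) and a toy task loss; the names `process_raw`, `task_loss`, `loss_under_data_model`, and `grad_wrt_gamma` are hypothetical. Varying the data model parameters on fixed raw values illustrates drift synthesis, and the sensitivity of the loss to a parameter illustrates the gradient connection behind drift forensics. A central finite difference stands in here for the automatic differentiation a differentiable pipeline would provide.

```python
def process_raw(raw, black_level=0.0, gain=1.0, gamma=2.2):
    """Toy data model (an assumption): map raw intensities in [0, 1] to processed values."""
    out = []
    for x in raw:
        v = max(x - black_level, 0.0) * gain  # black-level correction, then gain
        v = min(v, 1.0)                       # clip to the valid range
        out.append(v ** (1.0 / gamma))        # gamma correction
    return out


def task_loss(image, weight=1.0, target=0.5):
    """Toy task model (an assumption): squared error of a scaled mean intensity."""
    mean = sum(image) / len(image)
    return (weight * mean - target) ** 2


def loss_under_data_model(raw, gamma):
    """Task loss evaluated on the image produced by the data model."""
    return task_loss(process_raw(raw, gamma=gamma))


def grad_wrt_gamma(raw, gamma, eps=1e-5):
    """Central finite difference of the task loss with respect to gamma."""
    upper = loss_under_data_model(raw, gamma + eps)
    lower = loss_under_data_model(raw, gamma - eps)
    return (upper - lower) / (2.0 * eps)


if __name__ == "__main__":
    raw = [0.05, 0.2, 0.4, 0.8]  # made-up raw sensor values
    # Drift synthesis: same raw data, different data model parameters.
    for gamma in (1.8, 2.2, 2.6):
        loss = loss_under_data_model(raw, gamma)
        sensitivity = grad_wrt_gamma(raw, gamma)
        print(f"gamma={gamma}: loss={loss:.4f}, dloss/dgamma={sensitivity:.4f}")
```

Because the drift is generated by changing parameters of the processing applied to the raw values, each test case corresponds to a data model that could actually occur, which is the contrast with post-hoc augmentation of already processed images.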

updated: Fri Nov 04 2022 16:50:10 GMT+0000 (UTC)

published: Fri Nov 04 2022 16:50:10 GMT+0000 (UTC)

arXiv

