Explaining image classifiers by removing input features using generative models

Chirag Agarwal; Anh Nguyen

Perturbation-based explanation methods often measure the contribution of an input feature to an image classifier's outputs by heuristically removing it, e.g. by blurring, adding noise, or graying out, which often produces unrealistic, out-of-distribution samples. Instead, we propose integrating a generative inpainter into three representative attribution methods to remove an input feature. Our proposed change improved all three methods in (1) generating more plausible counterfactual samples under the true data distribution; (2) being more accurate according to three metrics: object localization, deletion, and saliency; and (3) being more robust to hyperparameter changes. Our findings were consistent across both the ImageNet and Places365 datasets and across two different pairs of classifiers and inpainters.
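To make the idea of "removing" a feature concrete, the following is a minimal sketch of a patch-occlusion attribution loop in which the removal step is a pluggable function. It is an illustration under stated assumptions, not the authors' implementation: every name here (`gray_fill`, `mean_fill`, `toy_classifier`, `occlusion_attribution`) is hypothetical, the classifier is a toy scoring function, and `mean_fill` is only a crude stand-in for a generative inpainter (it fills the masked region with the mean of the surrounding pixels rather than sampling plausible content from a learned model).

```python
import numpy as np


def gray_fill(image, mask):
    """Heuristic removal: overwrite masked pixels with constant gray (0.5)."""
    out = image.copy()
    out[mask] = 0.5
    return out


def mean_fill(image, mask):
    """Crude stand-in for a generative inpainter (assumption): fill the masked
    region with the per-channel mean of the unmasked pixels."""
    out = image.copy()
    out[mask] = image[~mask].mean(axis=0)
    return out


def toy_classifier(image):
    """Hypothetical classifier: score is the mean intensity of the center 8x8."""
    return float(image[8:16, 8:16].mean())


def occlusion_attribution(image, classifier, fill, patch=8, stride=8):
    """Slide a square patch over the image, remove it with `fill`, and record
    the drop in classifier score as the attribution of the covered pixels."""
    h, w = image.shape[:2]
    base = classifier(image)
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            mask = np.zeros((h, w), dtype=bool)
            mask[top:top + patch, left:left + patch] = True
            drop = base - classifier(fill(image, mask))
            heat[mask] += drop
            counts[mask] += 1
    # Average over overlapping patches; avoid division by zero.
    return heat / np.maximum(counts, 1)


# Toy input: dark background (0.2) with a bright 8x8 "object" in the center.
image = np.full((24, 24, 3), 0.2)
image[8:16, 8:16] = 1.0

heat_gray = occlusion_attribution(image, toy_classifier, gray_fill)
heat_fill = occlusion_attribution(image, toy_classifier, mean_fill)
```

In this toy setting the gray fill leaves a patch that is still brighter than the background, so part of the object's evidence survives removal, whereas the context-aware fill replaces the object with background-like content. A real generative inpainter would play the role of `mean_fill` by taking the image and mask and returning a filled image; the attribution loop itself would not need to change.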

updated: Tue Oct 06 2020 16:08:42 GMT+0000 (UTC)

published: Wed Oct 09 2019 21:08:25 GMT+0000 (UTC)

arXiv
