Robust Models Are More Interpretable Because Attributions Look Normal

Zifan Wang; Matt Fredrikson; Anupam Datta

Recent work has found that adversarially robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper and more concentrated on the objects associated with the image's ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, because the model's input gradients around data points align more closely with the boundaries' normal vectors when the boundaries are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain a classification outcome. We show that by leveraging the key factors underpinning robust interpretability, boundary attributions produce sharper, more concentrated visual explanations, even on non-robust models. An example implementation can be found at https://github.com/zifanw/boundary.
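
The alignment claim can be illustrated with a small toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a hypothetical 2-D score f(x) = x1 - a*sin(k*x0), whose decision boundary is the curve x1 = a*sin(k*x0), and a brute-force grid search for the closest boundary point. Setting a = 0 gives a flat boundary; a larger a stands in for a less smooth (high-curvature) one. The sketch compares the input gradient at a data point with the boundary normal at the nearest boundary point.

```python
import math

def grad(x, a, k):
    """Input gradient of the toy score f(x) = x[1] - a*sin(k*x[0])."""
    return (-a * k * math.cos(k * x[0]), 1.0)

def nearest_boundary_point(x, a, k, half_width=2.0, steps=40001):
    """Brute-force search for the closest point on the curve x1 = a*sin(k*x0)."""
    best, best_d2 = None, float("inf")
    for i in range(steps):
        t = x[0] - half_width + 2.0 * half_width * i / (steps - 1)
        p = (t, a * math.sin(k * t))
        d2 = (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = p, d2
    return best

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))

def alignment(x, a, k):
    """Cosine between the gradient at x and the normal at the nearest boundary point.

    The gradient of f evaluated on the boundary is perpendicular to the
    level set f = 0, so it serves as the boundary normal there.
    """
    normal = grad(nearest_boundary_point(x, a, k), a, k)
    return cosine(grad(x, a, k), normal)

x = (0.3, 0.5)                         # hypothetical data point
flat = alignment(x, a=0.0, k=10.0)     # flat boundary
wiggly = alignment(x, a=0.2, k=10.0)   # high-curvature boundary
print(f"flat boundary:   cosine = {flat:.3f}")
print(f"wiggly boundary: cosine = {wiggly:.3f}")
```

On the flat boundary the gradient at the data point coincides with the normal, while on the wiggly boundary the two directions diverge. A boundary-based explanation would use the normal at the boundary point rather than the gradient at the data point itself.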

updated: Wed Oct 06 2021 02:21:53 GMT+0000 (UTC)

published: Sat Mar 20 2021 22:36:39 GMT+0000 (UTC)

arXiv
