Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

Delyan Boychev

As state-of-the-art deep neural networks grow ever more complex, maintaining their interpretability becomes increasingly challenging. Our work evaluates the effects of adversarial training, a technique used to produce robust models that are less vulnerable to adversarial attacks. Adversarial training has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two properties, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Compared to robust models, standard models are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, robust models focus on distinctive regions of the images that support their predictions. Moreover, the features learned by the robust model are closer to the real ones.
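
For readers unfamiliar with Integrated Gradients, one of the local feature-importance methods named in the abstract, the following is a minimal sketch of the idea on a toy model. It is not the paper's implementation: the sigmoid-of-linear-score model, the weights `w` and `b`, the all-zeros baseline, and the step count are all assumptions chosen only to keep the example self-contained. The method averages the model's input gradients along a straight path from a baseline to the input and scales them by the input difference.

```python
import numpy as np

def model(x, w, b):
    """Toy differentiable 'classifier' (an assumption): sigmoid of a linear score."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient of the sigmoid output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    # Points on the straight-line path from the baseline to x.
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([model_grad(point, w, b) for point in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and input, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
# Completeness property: attributions should sum (approximately) to
# model(x) - model(baseline).
gap = attributions.sum() - (model(x, w, b) - model(baseline, w, b))
print(attributions, gap)
```

In practice the gradients would come from automatic differentiation of an image classifier rather than a hand-written formula, and the choice of baseline (for example a black image) affects the resulting attributions.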

updated: Sun Nov 19 2023 15:38:50 GMT+0000 (UTC)

published: Tue Jul 04 2023 13:51:55 GMT+0000 (UTC)

arXiv
