DORA: Exploring outlier representations in Deep Neural Networks

Kirill Bykov; Mayukh Deb; Dennis Grinwald; Klaus-Robert Müller; Marina M.-C. Höhne

Deep Neural Networks (DNNs) draw their power from the representations they learn. However, while incredibly effective at learning complex abstractions, they are susceptible to learning malicious concepts because of spurious correlations inherent in the training data. Existing methods for uncovering such artifactual behavior in trained models focus on finding artifacts in the input data, which requires both an available data set and human supervision. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for analyzing the representation space of DNNs. We propose a novel distance measure between representations that uses the self-explaining capabilities of the network itself, without access to any data, and quantitatively validate its alignment with human-defined semantic distances. We further demonstrate that this metric can be used to detect anomalous representations, which may bear a risk of having learned unintended spurious concepts that deviate from the desired decision-making policy. Finally, we demonstrate the practical utility of DORA by analyzing and identifying artifactual representations in widely used Computer Vision models.

updated: Fri Mar 10 2023 14:41:21 GMT+0000 (UTC)

published: Thu Jun 09 2022 14:25:14 GMT+0000 (UTC)

arXiv
