DORA: Exploring Outlier Representations in Deep Neural Networks

Kirill Bykov; Mayukh Deb; Dennis Grinwald; Klaus-Robert Müller; Marina M.-C. Höhne

Deep Neural Networks (DNNs) excel at learning complex abstractions within their internal representations. However, the concepts they learn remain opaque, a problem that becomes particularly acute when models unintentionally learn spurious correlations. In this work, we present DORA (Data-agnOstic Representation Analysis), the first data-agnostic framework for analyzing the representational space of DNNs. Central to our framework is the proposed Extreme-Activation (EA) distance measure, which assesses similarities between representations by analyzing their activation patterns on data points that cause the highest level of activation. As spurious correlations often manifest in features of data that are anomalous to the desired task, such as watermarks or artifacts, we demonstrate that internal representations capable of detecting such artifactual concepts can be found by analyzing relationships within neural representations. We validate the EA metric quantitatively, demonstrating its effectiveness both in controlled scenarios and real-world applications. Finally, we provide practical examples from popular Computer Vision models to illustrate that representations identified as outliers using the EA metric often correspond to undesired and spurious concepts.
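The idea behind the EA distance can be illustrated with a minimal NumPy sketch. It assumes that the data-agnostic step has already produced, for each of n representations, an input that maximally activates it. The abstract does not say how such inputs are obtained, and that step is not reproduced here. It also assumes that acts[i, j] holds the activation of representation j on representation i's maximally activating input. The correlation-based distance sqrt((1 - ρ)/2) and the k-nearest-neighbour outlier score are assumptions chosen for illustration, not necessarily the paper's exact definitions. The names ea_distance_matrix and knn_outlier_scores are hypothetical helpers.

```python
import numpy as np


def ea_distance_matrix(acts: np.ndarray) -> np.ndarray:
    """Pairwise distances between representations (assumed EA-style form).

    acts[i, j] is the activation of representation j on the input that
    most strongly activates representation i (hypothetical layout).
    Row i is therefore the activation pattern evoked by representation
    i's extreme-activation input.
    """
    # Pearson correlation between the rows (activation patterns).
    corr = np.corrcoef(acts)
    # Map correlation in [-1, 1] to a distance in [0, 1]; the clip guards
    # against tiny floating-point overshoot outside [-1, 1].
    return np.sqrt(np.clip(1.0 - corr, 0.0, 2.0) / 2.0)


def knn_outlier_scores(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Mean distance to the k nearest other representations (assumed score)."""
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)  # exclude each representation itself
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Toy example: four representations with similar patterns, one reversed.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
wiggle = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
acts = np.vstack([base + 0.1 * i * wiggle for i in range(4)] + [base[::-1]])

dist = ea_distance_matrix(acts)
scores = knn_outlier_scores(dist, k=2)
print("outlier index:", int(np.argmax(scores)))
```

In this toy matrix the last row's activation pattern is reversed relative to the others, so it lies far from every other representation and receives the largest outlier score. In the setting the abstract describes, such an outlying representation would be a candidate for manual inspection as a possibly spurious concept.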

Updated: Mon Jul 10 2023, 15:42:34 UTC

Published: Thu Jun 09 2022, 14:25:14 UTC

Source: arXiv
