A Topological-Framework to Improve Analysis of Machine Learning Model Performance

Henry Kvinge; Colby Wight; Sarah Akers; Scott Howland; Woongjo Choi; Xiaolong Ma; Luke Gosink; Elizabeth Jurrus; Keerti Kappagantula; Tegan H. Emerson

As both machine learning models and the datasets on which they are evaluated have grown in size and complexity, the practice of using a few summary statistics to understand model performance has become increasingly problematic. This is particularly true in real-world scenarios where understanding model failure on certain subpopulations of the data is of critical importance. In this paper we propose a topological framework for evaluating machine learning models in which a dataset is treated as a "space" on which a model operates. This provides us with a principled way to organize information about model performance at both the global level (over the entire test set) and also the local level (on specific subpopulations). Finally, we describe a topological data structure, presheaves, which offer a convenient way to store and analyze model performance between different subpopulations.
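To make the presheaf idea concrete, here is a minimal sketch. It is not the authors' construction; it rests on several assumptions. Subpopulations are taken to be subsets of the test set, ordered by inclusion. The presheaf assigns to each subpopulation U the per-example correctness record of the model on U. For V ⊆ U, the restriction map simply forgets the examples outside V. The names `CorrectnessPresheaf`, `section`, `restrict`, and `accuracy` are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Hashable

# Hypothetical types for this sketch: an example ID, a subpopulation
# (a subset of the test set), and a section (correctness record on a subpopulation).
ExampleId = Hashable
Subpop = FrozenSet[ExampleId]
Section = Dict[ExampleId, bool]


class CorrectnessPresheaf:
    """Illustrative presheaf on subpopulations ordered by inclusion (an assumption).

    F(U) is the per-example correctness record on subpopulation U; for V ⊆ U
    the restriction map F(U) -> F(V) drops the examples outside V.
    """

    def __init__(self, correct: Section) -> None:
        # Correctness over the entire test set: the "global" data.
        self._correct = dict(correct)

    def section(self, U: Subpop) -> Section:
        """Return F(U) for a subpopulation U of the test set."""
        missing = U.difference(self._correct)
        if missing:
            raise KeyError(f"examples not in test set: {sorted(map(str, missing))}")
        return {x: self._correct[x] for x in U}

    @staticmethod
    def restrict(s: Section, V: Subpop) -> Section:
        """Restriction map F(U) -> F(V); V must lie inside the domain of s."""
        if not V.issubset(s):
            raise ValueError("V must be a subset of the section's subpopulation")
        return {x: s[x] for x in V}


def accuracy(s: Section) -> float:
    """A summary statistic computed locally on one subpopulation."""
    if not s:
        raise ValueError("accuracy is undefined on an empty subpopulation")
    return sum(s.values()) / len(s)


if __name__ == "__main__":
    # Toy correctness data (made up for illustration, not from the paper).
    correct = {"a": True, "b": False, "c": True, "d": True}
    F = CorrectnessPresheaf(correct)
    whole = frozenset(correct)            # the entire test set (global level)
    V = frozenset({"a", "b"})             # a specific subpopulation (local level)
    s_whole = F.section(whole)
    print(accuracy(s_whole))              # 0.75
    print(accuracy(F.restrict(s_whole, V)))  # 0.5
```

The global accuracy (0.75) hides the weaker performance on the subpopulation V (0.5). This is the kind of local failure the abstract argues summary statistics can miss. Because each restriction is just restriction of functions, restricting in two steps (whole ⊇ U ⊇ V) gives the same result as restricting directly to V. That is the consistency condition a presheaf requires.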

updated: Fri Jul 09 2021 23:11:13 GMT+0000 (UTC)

published: Fri Jul 09 2021 23:11:13 GMT+0000 (UTC)

arXiv
