Information Distance in Multiples

Paul M. B. Vitanyi

Information distance is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. The notion of information distance is extended from pairs to multiples (finite lists). We study maximal overlap, metricity, universality, minimal overlap, additivity, and normalized information distance in multiples. We use the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program.

Index Terms: information distance, multiples, pattern recognition, data mining, similarity, Kolmogorov complexity
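The abstract gives no formulas, so the following is only a minimal sketch of the practical approximation it describes, in which Kolmogorov complexity is replaced by the compressed length of a file. It assumes zlib as the real-world compressor and uses the commonly cited pairwise normalized compression distance, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The function names and the list version (compressed length of the concatenation minus the smallest individual compressed length) are illustrative assumptions, not definitions taken from the paper.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Approximate Kolmogorov complexity by zlib-compressed length (assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise normalized compression distance, approximated with zlib."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def list_distance(items: list[bytes]) -> int:
    """Hypothetical un-normalized distance for a finite list:
    C(concatenation of all items) minus the smallest individual C(item)."""
    if not items:
        raise ValueError("items must be a non-empty list")
    joint = compressed_len(b"".join(items))
    return joint - min(compressed_len(item) for item in items)
```

Under these assumptions, a list of near-identical files should score low, because the compressor can reuse the first file when encoding the rest, while a list of unrelated files should score close to the sum of their individual compressed lengths. Real compressors have finite windows and header overhead, so the values are rough estimates only.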

updated: Wed May 20 2009 16:37:16 GMT+0000 (UTC)

published: Wed May 20 2009 16:37:16 GMT+0000 (UTC)

arXiv
