Measuring Effective Similarity: A Universal Metric Approach
Abstract: The research question at hand is how to measure the similarity between sequences effectively, for sequences such as internet documents, text corpora in different languages, computer programs, or chain letters. The study proposes a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity. It shows that this distance is universal: it discovers all effective (that is, computable) similarities, because it minorizes every computable distance in the class of distances the paper considers. The normalized information distance is itself a metric and takes values in the range [0, 1], so it is aptly named the "similarity metric". The paper presents two applications in widely divergent areas: comparing whole mitochondrial genomes to infer evolutionary history, and constructing a language tree for 52 different languages from translated versions of the "Universal Declaration of Human Rights". This theory forms the foundation for a practical tool for measuring similarity and comparing sequences across many fields.
Main Research Question: How can the similarity between sequences be measured effectively?
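Methodology: The study defines a new "normalized information distance" based on the noncomputable notion of Kolmogorov complexity and proves that it is universal: whenever two sequences are close under some computable distance in the class considered, they are at least as close under the normalized information distance, up to a small error term. Because Kolmogorov complexity cannot be computed, the applications approximate it with the output lengths of standard compression programs.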
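Since the summary gives no formula, the definition below may help. It is written in its commonly cited form, and the notation is an assumption rather than a quotation from the paper. The normalized information distance between strings x and y divides the larger of the two conditional Kolmogorov complexities by the larger of the two unconditional ones:

```latex
\[
  \mathrm{NID}(x, y) \;=\; \frac{\max\{K(x \mid y),\; K(y \mid x)\}}{\max\{K(x),\; K(y)\}}
\]
```

Here K(x) is the length of the shortest program that outputs x, and K(x | y) is the length of the shortest program that outputs x when given y as input. Because K(x | y) <= K(x) up to an additive constant, the ratio lies in [0, 1] up to small error terms. It is close to 0 when each string is almost entirely described by the other, and close to 1 when the two share essentially no information.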
Results: The normalized information distance is shown to be a metric, taking values in the range [0, 1]. The paper presents two applications in widely divergent areas: comparing whole mitochondrial genomes, which yields an automatically computed phylogeny tree, and constructing a language tree for 52 different languages.
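The normalized information distance itself cannot be computed, so any runnable illustration has to approximate it. The Python sketch below shows one way to do that and is not the paper's implementation. It replaces K with the length of zlib's output and uses the normalized compression distance (NCD) formula, a compression-based approximation that later work popularized. The tree-building step uses plain UPGMA (average-linkage) clustering as a stand-in, and the sequences are synthetic random DNA-like strings, not mitochondrial genomes. All helper names (`compressed_size`, `ncd`, `pair_key`, `distance_matrix`, `upgma`, `mutate`) are hypothetical.

```python
import random
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """zlib output length in bytes: a crude, computable stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, approximating the NID of x and y."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def pair_key(a: str, b: str) -> tuple[str, str]:
    """Order-independent key for an unordered pair of labels."""
    return (a, b) if a < b else (b, a)


def distance_matrix(seqs: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """NCD for every unordered pair of named sequences."""
    return {pair_key(a, b): ncd(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


def upgma(names: list[str], dist: dict[tuple[str, str], float]):
    """Average-linkage (UPGMA) clustering into a nested-tuple tree."""
    trees = {n: n for n in names}  # cluster label -> subtree
    sizes = {n: 1 for n in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(trees) > 1:
        a, b = min(d, key=d.get)  # closest pair of current clusters
        merged = f"({a},{b})"
        for c in trees:
            if c not in (a, b):
                # Size-weighted mean of the distances from the two merged clusters to c.
                d[pair_key(merged, c)] = (
                    sizes[a] * d[pair_key(a, c)] + sizes[b] * d[pair_key(b, c)]
                ) / (sizes[a] + sizes[b])
        trees[merged] = (trees.pop(a), trees.pop(b))
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        # Drop every distance that still refers to one of the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return next(iter(trees.values()))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Redraw each base uniformly from ACGT with probability `rate`."""
    return "".join(rng.choice("ACGT") if rng.random() < rate else s for s in seq)


# Synthetic data: two unrelated random "ancestors", each with two lightly mutated descendants.
rng = random.Random(0)
anc_a = "".join(rng.choice("ACGT") for _ in range(2000))
anc_b = "".join(rng.choice("ACGT") for _ in range(2000))
seqs = {name: mutate(anc, 0.02, rng).encode()
        for name, anc in [("a1", anc_a), ("a2", anc_a), ("b1", anc_b), ("b2", anc_b)]}

dist = distance_matrix(seqs)
tree = upgma(sorted(seqs), dist)
print(tree)  # (('a1', 'a2'), ('b1', 'b2'))
```

Descendants of the same ancestor compress well together, so their NCD is small and the clustering pairs them first. With a real compressor, the NCD of a sequence with itself is small but not exactly 0, and values can slightly exceed 1. The properties stated for the exact distance hold only approximately here.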
Implications: The theory underpins a practical tool for measuring similarity and comparing sequences. Because the normalized information distance is universal rather than tuned to one kind of similarity, the same approach carries over to very different kinds of data, as the genome and language experiments illustrate.
Link to Article: https://arxiv.org/abs/0111054v2
Authors:
arXiv ID: 0111054v2