Kolmogorov Complexity-Based Occam's Razor: A Representation-Independent Formulation
Abstract: Occam's razor is a fundamental principle in machine learning and data science, stating that the simplest explanation consistent with the data is the most likely to be correct. However, previous formulations of Occam's razor, such as the VC dimension-based and length-based versions, have limitations and do not always yield the best sample complexity. This study presents a new, representation-independent formulation of Occam's razor based on Kolmogorov complexity, the length of a shortest computer program that generates a given string.
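Kolmogorov complexity itself is uncomputable, but any real compressor gives a computable upper bound on it. The short Python sketch below is an illustration added here for concreteness (it does not appear in the paper): it bounds the complexity of a string by its zlib-compressed length, so a highly regular string scores far lower than a random one.

import os
import zlib

def complexity_upper_bound(s: bytes) -> int:
    # The compressed length in bits upper-bounds the Kolmogorov complexity of s,
    # up to the constant-size cost of the fixed decompressor.
    return 8 * len(zlib.compress(s, 9))

print(complexity_upper_bound(b"ab" * 1000))      # very small: the string is highly regular
print(complexity_upper_bound(os.urandom(2000)))  # close to 16000 bits: essentially incompressible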
The main contributions of this research are:
1. A sharper reverse of Occam's razor theorem: The study weakens the assumptions made in previous work and extends the reverse direction to superpolynomial running times.
2. Better sample complexity than both the length-based and VC-based versions: The new formulation allows for better sample complexity in many applications, making it more practical for use in machine learning and data science (see the schematic bounds sketched after this list).
3. Representation-independence: The formulation does not depend on the accidents of "representation format," making it more universally applicable and easier to use.
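For orientation, the comparison in point 2 can be made concrete with standard textbook-style Occam bounds (schematic forms added here, not quoted from the paper): a hypothesis h consistent with m examples generalizes to error at most \varepsilon with probability at least 1 - \delta once

m \ge \frac{1}{\varepsilon}\left(\ell(h)\ln 2 + \ln\frac{1}{\delta}\right) \quad\text{(length-based razor)},
\qquad
m \ge \frac{1}{\varepsilon}\left(K(h)\ln 2 + \ln\frac{1}{\delta}\right) \quad\text{(Kolmogorov-based razor)},

where \ell(h) is the bit-length of h under a fixed representation format and K(h) is its Kolmogorov complexity. Since K(h) \le \ell(h) + O(\log \ell(h)) for any fixed computable representation, the Kolmogorov-based term is essentially never larger, and it can be much smaller when the representation is redundant.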
The research demonstrates that the length-based Occam's razor, while convenient, can be viewed as one particular computable approximation to the Kolmogorov complexity-based Occam's razor. As an example, the study shows that the standard trivial learning algorithm for monomials often has better sample complexity than a more sophisticated algorithm, contradicting the common belief that the latter is superior.
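The trivial algorithm referred to here is usually understood to be the standard elimination procedure for conjunctions; the Python sketch below is an illustration under that reading (it is not code from the paper). Starting from all 2n literals, it deletes every literal falsified by some positive example, and the surviving conjunction is the hypothesis.

def learn_monomial(positive_examples, n):
    # Start with all 2n literals: (i, True) means x_i, (i, False) means NOT x_i.
    literals = {(i, val) for i in range(n) for val in (True, False)}
    for x in positive_examples:  # each example is a length-n tuple of booleans
        # Keep only the literals that this positive example satisfies.
        literals = {(i, val) for (i, val) in literals if x[i] == val}
    return literals

def predict(literals, x):
    # The hypothesis is the conjunction of the surviving literals.
    return all(x[i] == val for (i, val) in literals)

# Hypothetical usage: positive examples of the target monomial x0 AND NOT x2, with n = 3.
pos = [(True, False, False), (True, True, False)]
h = learn_monomial(pos, 3)
print(sorted(h))                           # [(0, True), (2, False)]
print(predict(h, (True, False, False)))    # True
print(predict(h, (False, True, False)))    # False

After even a single positive example the hypothesis contains at most n literals, a short and easily described hypothesis of the kind the Kolmogorov complexity-based razor rewards.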
Overall, the Kolmogorov complexity-based Occam's razor presented in this research provides a more practical and universally applicable formulation of the principle, making it a valuable tool in machine learning and data science.
Link to Article: https://arxiv.org/abs/0201005v1
arXiv ID: 0201005v1