On the Universality of Rank Distributions of Website Popularity
Research Question: Is the rank distribution of websites Zipf-like, and if so, what are the conditions under which the "true" exponent can be obtained?
Methodology: The researchers analyzed long-term statistics of queries to websites using logs collected on several web caches in Russian academic networks. They studied website statistics, believing them to be more stable than web-document statistics. They addressed the following questions:
1. Is the rank distribution of websites Zipf-like?
2. If yes, what are the conditions under which the "true" exponent can be obtained?
3. Does the exponent depend on the duration of the observation?
4. Does it depend on the geographical position of the observer?
5. Does the exponent vary with time, as the Internet develops?
They found that the statistics became stable once the number of queries in the given sample exceeded 10^5. This simple criterion can be used to estimate the critical window, that is, the rank interval over which the distribution is stable and the power law can be observed.
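To make the idea of estimating an exponent over a rank window concrete, here is a minimal sketch. It assumes the plain Zipf form f(r) = C / r^α and an ordinary least-squares fit in log-log coordinates, which may differ from the authors' actual procedure. The function names, the window bounds `r_min` and `r_max`, and the synthetic host list are all hypothetical and used only for illustration.

```python
import math
from collections import Counter


def rank_frequency(hosts):
    """Return per-host request counts sorted in descending order (rank 1 first)."""
    return sorted(Counter(hosts).values(), reverse=True)


def zipf_exponent(counts, r_min=1, r_max=None):
    """Estimate alpha as minus the least-squares slope of log(count) vs log(rank),
    using only ranks r_min..r_max (inclusive)."""
    r_max = len(counts) if r_max is None else r_max
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(counts[r - 1]) for r in range(r_min, r_max + 1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic query log (an assumption, not real cache data):
# host i is requested round(1000 / i) times, i.e. roughly Zipf with alpha = 1.
hosts = [f"site{i}.example" for i in range(1, 51) for _ in range(round(1000 / i))]
counts = rank_frequency(hosts)
alpha = zipf_exponent(counts, r_min=1, r_max=20)
print(f"estimated alpha on ranks 1..20: {alpha:.3f}")
```

Restricting the fit to a window matters because low-count ranks in the tail are noisy; on real logs the window would have to be chosen from the data rather than fixed in advance as it is here.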
They found that the statistics were independent of the geographical location of the cache server (observer) collecting the data, at least for the Russian scientific networks studied. They also found that the distribution was independent of the different years of data collection and was therefore stable over Internet history and development.
Results: The researchers proposed a modification of the Zipf-like law with two additional parameters and explained the possible meaning of those parameters. When the modified law was fitted to the data, the website popularity distribution proved stable: the exponent α was 1.02 ± 0.05 for all datasets studied in the paper. They also verified that the same modification fit the web-document rank distribution well.
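This summary does not state the functional form of the modified law, so the sketch below does not reproduce it. As a stand-in, it fits the well-known Zipf-Mandelbrot form f(r) = C / (r + b)^α, which adds a single rank offset b to the plain Zipf law and is not the authors' two-parameter modification. The function names, the candidate offset grid, and the synthetic counts are hypothetical; the point is only to show how an extra parameter can be fitted alongside the exponent.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log C - alpha * log x.
    Returns (alpha, C, sum of squared log-space residuals)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(lx, ly))
    return -slope, math.exp(intercept), sse


def fit_zipf_mandelbrot(counts, offsets):
    """Grid search over the offset b: for each candidate, fit a power law in
    (rank + b) and keep the candidate with the smallest log-space error."""
    ranks = range(1, len(counts) + 1)
    best = None
    for b in offsets:
        alpha, c, sse = fit_power_law([r + b for r in ranks], counts)
        if best is None or sse < best[3]:
            best = (alpha, b, c, sse)
    return best


# Synthetic rank-ordered counts (an assumption): f(r) = 5000 / (r + 3)^1.0
counts = [5000.0 / (r + 3.0) ** 1.0 for r in range(1, 201)]
offsets = [0.5 * k for k in range(21)]  # candidate offsets 0.0, 0.5, ..., 10.0
alpha, b, c, sse = fit_zipf_mandelbrot(counts, offsets)
print(f"alpha={alpha:.3f}, b={b:.1f}, C={c:.1f}")
```

A grid search keeps the example dependency-free; with real data a nonlinear least-squares routine would be the more usual choice, and the fitted offset would carry uncertainty that this sketch does not estimate.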
Implications: The researchers concluded that website popularity follows a Zipf-like power law whose exponent is stable across the caches and years studied. This finding has implications for understanding the structure and dynamics of the World Wide Web and could influence the design and optimization of search engines and other web-based systems.
Link to Article: https://arxiv.org/abs/0404010v1
Authors:
arXiv ID: 0404010v1