Website Popularity: An Analysis of Rank Distribution

Title: Website Popularity: An Analysis of Rank Distribution

Research Question: Is the rank distribution of websites Zipf-like, and if so, what are the conditions under which the "true" exponent can be obtained?
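For reference, the classical Zipf law behind this question is usually written as below. Here f(r) is the number of queries to the website of rank r, C is a normalization constant, and α is the exponent. The symbols f, r, and C are notation assumed for this note. The paper's two-parameter modification is not reproduced in this summary, so it is not shown.

```latex
f(r) = \frac{C}{r^{\alpha}}, \qquad \log f(r) = \log C - \alpha \log r
```

On a log-log plot this is a straight line with slope -α, which is why the exponent is typically read off from the linear part of such a plot.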

Methodology: The study analyzed long-term statistics of queries to websites using logs collected on several web caches in Russian academic networks and on US IRCache caches. The sensitivity of the statistics to various parameters, such as the duration of data collection, geographical location of the cache server, and the year of data collection, was examined.
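As a rough illustration of how a rank distribution can be built from cache logs, the sketch below counts queries per website and sorts the counts from most to least popular. It assumes the requested URLs have already been extracted from the log lines. The function name `rank_distribution` and the sample URLs are hypothetical, and this is not the authors' processing pipeline.

```python
from collections import Counter
from urllib.parse import urlsplit


def rank_distribution(urls):
    """Return query counts per website, sorted from most to least popular.

    Position i in the result (0-based) holds the count for rank i + 1.
    """
    hosts = (urlsplit(u).hostname for u in urls)
    # Entries without a parseable hostname are skipped.
    counts = Counter(h for h in hosts if h)
    return sorted(counts.values(), reverse=True)


# Hypothetical sample of requested URLs (not real log data).
sample = [
    "http://a.example/index.html",
    "http://a.example/news",
    "http://b.example/",
    "http://a.example/about",
    "http://c.example/x",
    "http://b.example/y",
]

freqs = rank_distribution(sample)
for rank, count in enumerate(freqs, start=1):
    print(rank, count)
```

Splitting a log by collection period, cache server, or year and running the same counting step on each part would give the separate distributions whose sensitivity the study compares.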

Results: The statistics were stable once the number of queries in a sample exceeded 10^5. The distribution was also independent of the geographical location of the cache server and of the year of data collection. A two-parameter modification of the Zipf law was proposed, and the website popularity distribution proved quite stable when fitted with the modified law. The exponent α was found to be 1.02±0.05 for all datasets studied.
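One simple way to estimate an exponent like α is a least-squares line fit on a log-log scale, restricted to a chosen range of ranks. The sketch below does this for the plain Zipf law only. It is an assumption-laden illustration, not the paper's two-parameter fit, whose equation is not given in this summary. The function name `fit_zipf_exponent`, the rank window, and the synthetic data are all hypothetical.

```python
import math


def fit_zipf_exponent(freqs, r_min=1, r_max=None):
    """Estimate alpha in f(r) ~ C / r**alpha by least squares on log-log data.

    freqs[i] is the (positive) query count of the site with rank i + 1.
    Only ranks r_min..r_max (inclusive) are used in the fit.
    """
    if r_max is None:
        r_max = len(freqs)
    ranks = range(r_min, r_max + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(freqs[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -alpha, so flip the sign.
    return -sxy / sxx


# Synthetic, noise-free counts following an exact power law with alpha = 1
# (made-up data for illustration only).
synthetic = [1e6 / r for r in range(1, 10001)]
alpha = fit_zipf_exponent(synthetic, r_min=10, r_max=1000)
print(round(alpha, 3))
```

Restricting the fit to a window of ranks mirrors the observation that a Zipf-like law describes the middle region of the distribution. On real logs the estimate will depend on the window chosen.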

Implications: The study suggests that website popularity may be a universal property of the Internet, since the Zipf-like law fits the middle region of the distribution, which spans several orders of magnitude in rank. The proposed modification of the Zipf law was also reported to fit the ranked distribution of web documents very well.

Link to Article: https://arxiv.org/abs/0404010v2

Authors:

arXiv ID: 0404010v2