Title: Understanding the Structure of Content-Based Communities on the Web
Abstract: This research aims to understand the structure of content-based communities on the Web by using a topic taxonomy and an automatic classifier. It measures the background distribution of broad topics on the Web and analyzes the capability of random walk algorithms to draw samples following such distributions. The study also estimates the topic mixing distance, which may explain why a global PageRank is still meaningful in the context of broad queries. The findings may prove valuable in the design of community-specific crawlers and link-based ranking systems.
Main Research Question: How can we understand the structure of content-based communities on the Web by using a topic taxonomy and an automatic classifier?
Methodology: The study uses a topic taxonomy, such as Yahoo! or the Open Directory, to characterize the structure of content-based clusters and communities. It employs an automatic classifier to measure the background distribution of broad topics on the Web. It also evaluates how well random walk algorithms can draw samples of pages whose topics follow that background distribution.
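The sampling idea can be illustrated on a toy graph. The sketch below is not the study's method: it assumes a small hypothetical undirected graph (`GRAPH`), hypothetical classifier labels (`TOPIC`), and a Metropolis-Hastings random walk, which is one standard way to make a walk visit nodes approximately uniformly. It compares the topic distribution estimated from the walk with the true background distribution of the toy graph.

```python
import random
from collections import Counter

# Hypothetical toy undirected graph: page -> list of neighbouring pages.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d", "e"],
    "d": ["b", "c", "e", "f"],
    "e": ["c", "d", "f"],
    "f": ["d", "e"],
}

# Hypothetical classifier output: page -> broad topic label.
TOPIC = {
    "a": "Arts", "b": "Arts",
    "c": "Science", "d": "Science", "f": "Science",
    "e": "Sports",
}


def background_distribution(topic_of):
    """Exact fraction of pages per topic (known only for the toy graph)."""
    counts = Counter(topic_of.values())
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def mh_walk_topic_estimate(graph, topic_of, start, steps, burn_in=100, seed=0):
    """Estimate the topic distribution from a Metropolis-Hastings walk.

    A move from `current` to a random neighbour `candidate` is accepted
    with probability min(1, deg(current) / deg(candidate)), which makes
    the walk's stationary distribution uniform over pages.
    """
    rng = random.Random(seed)
    current = start
    counts = Counter()
    for step in range(burn_in + steps):
        candidate = rng.choice(graph[current])
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        if step >= burn_in:
            counts[topic_of[current]] += 1
    return {topic: counts[topic] / steps for topic in sorted(set(topic_of.values()))}


if __name__ == "__main__":
    print("true     :", background_distribution(TOPIC))
    print("estimated:", mh_walk_topic_estimate(GRAPH, TOPIC, "a", 20000))
```

On a real Web graph the links are directed and the full graph is unknown, so a practical sampler would need additional machinery that this sketch does not attempt to reproduce.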
Results: The study finds that combining a topic taxonomy with an automatic classifier gives a workable way to characterize the structure of content-based communities on the Web. It also estimates the topic mixing distance, which may explain why a global PageRank is still meaningful in the context of broad queries.
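This summary does not define the topic mixing distance, so the following is only one plausible reading, stated as an assumption: the number of random-walk steps, starting from pages of a single topic, after which the walk's topic profile is close to its long-run profile. The sketch computes this exactly on a hypothetical toy graph; the names `GRAPH`, `TOPIC`, `mixing_distance`, and the threshold `eps` are illustrative choices, not taken from the study.

```python
# Hypothetical toy undirected graph and topic labels (illustrative only).
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d", "e"],
    "d": ["b", "c", "e", "f"],
    "e": ["c", "d", "f"],
    "f": ["d", "e"],
}
TOPIC = {
    "a": "Arts", "b": "Arts",
    "c": "Science", "d": "Science", "f": "Science",
    "e": "Sports",
}


def walk_step(dist, graph):
    """Advance a probability distribution over pages by one random-walk step."""
    nxt = {page: 0.0 for page in graph}
    for page, mass in dist.items():
        share = mass / len(graph[page])
        for neighbour in graph[page]:
            nxt[neighbour] += share
    return nxt


def topic_profile(dist, topic_of):
    """Collapse a distribution over pages into a distribution over topics."""
    profile = {}
    for page, mass in dist.items():
        topic = topic_of[page]
        profile[topic] = profile.get(topic, 0.0) + mass
    return profile


def l1_distance(p, q):
    """Sum of absolute differences between two topic distributions."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def mixing_distance(graph, topic_of, start_topic, eps=0.05, max_steps=100):
    """Smallest step count at which the walk's topic profile is within
    `eps` (L1 distance) of its long-run profile, or None if never reached.

    Assumption: a simple random walk on an undirected graph, whose
    long-run distribution is proportional to page degree.
    """
    total_degree = sum(len(nbrs) for nbrs in graph.values())
    stationary = {page: len(nbrs) / total_degree for page, nbrs in graph.items()}
    target = topic_profile(stationary, topic_of)

    starts = [page for page in graph if topic_of[page] == start_topic]
    dist = {page: (1.0 / len(starts) if page in starts else 0.0) for page in graph}

    for k in range(max_steps + 1):
        if l1_distance(topic_profile(dist, topic_of), target) < eps:
            return k
        dist = walk_step(dist, graph)
    return None


if __name__ == "__main__":
    for topic in sorted(set(TOPIC.values())):
        print(topic, mixing_distance(GRAPH, TOPIC, topic))
```

Under this reading, a short mixing distance means a walk forgets its starting topic quickly, which is consistent with the summary's remark that a global ranking can remain meaningful for broad queries.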
Implications: The findings of this research may have significant implications for the design of community-specific crawlers and link-based ranking systems. They may also improve our understanding of the Web's social network structure and its evolution.
Link to Article: https://arxiv.org/abs/0203024v1
Authors:
arXiv ID: 0203024v1