Automated Classification of Web Sites

From Simple Sci Wiki
Revision as of 02:00, 24 December 2023 by SatoshiNakamoto

Title: Automated Classification of Web Sites

Main Research Question: How can we develop an automated system for classifying web sites into industry categories?

Methodology: The study combined text analysis techniques with machine learning algorithms to classify web sites. The text features included HTML metatags, which were extracted by a targeted spider that also followed specific semantic hyperlinks. The system was trained on different combinations of text features and training data to determine which combination was most effective.
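The metatag extraction step can be sketched as follows. This is a minimal illustration using Python's standard `html.parser` module, not the study's actual spider. The choice of the `description` and `keywords` metatags and the names `MetaTagExtractor` and `extract_meta_features` are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects the content of selected <meta name="..."> tags from a page."""

    def __init__(self, wanted=("description", "keywords")):
        super().__init__()
        # Assumption: which metatags count as features is our choice here.
        self.wanted = wanted
        self.features = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so guard for that.
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content")
        if name in self.wanted and content:
            self.features[name] = content.strip()


def extract_meta_features(html):
    """Return a dict mapping metatag name to its text content."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.features
```

A page that lacks these tags simply yields an empty dict, so a real pipeline would need a fallback text source for such pages.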

Results: The study found that HTML metatags were a good source of text features for classifying web sites, but were not in wide use despite their role in search engine rankings. The system was able to classify web sites into industry categories with high accuracy.
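To make the classification step concrete, here is a toy text classifier that assigns metatag text to a category. The summary does not say which learning algorithm the study used, so a multinomial naive Bayes classifier with add-one smoothing stands in purely for illustration. The class name, the category labels, and the training strings in the usage note are all made up for this sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.class_docs = Counter()               # training documents per category
        self.word_counts = defaultdict(Counter)   # word counts per category
        self.vocab = set()

    def train(self, text, category):
        self.class_docs[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        """Return the category with the highest log-probability score."""
        total_docs = sum(self.class_docs.values())
        words = tokenize(text)
        best, best_score = None, float("-inf")
        for category, n_docs in self.class_docs.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best
```

Usage, with hypothetical industry labels:

```python
clf = NaiveBayesClassifier()
clf.train("steel mining iron ore", "Mining")
clf.train("bank loans mortgage credit", "Finance")
clf.train("software cloud hosting", "Information")
print(clf.classify("iron ore mining company"))
```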

Implications: The research suggests that automated classification systems can be highly effective for organizing and understanding web content. The approach used in this study can serve as a basis for a generalized framework for automated metadata creation, which can greatly enhance the searchability and usability of web content.

Link to Article: https://arxiv.org/abs/0102002v1

Authors:

arXiv ID: 0102002v1