Coherent Keyphrase Extraction via Web Mining
Research Question: How can the coherence of automatically extracted keyphrases be improved using web mining techniques?
Methodology: The study used Kea, a supervised keyphrase extraction algorithm that classifies each candidate phrase as a keyphrase or a non-keyphrase. Four feature sets were evaluated: two that had been used in earlier work and two new sets introduced in this paper. The new sets use web mining to measure the statistical association among candidate phrases.
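The idea of measuring statistical association among candidate phrases can be illustrated with a small sketch. It is not the paper's implementation: it assumes a pointwise-mutual-information style score computed from search-engine hit counts, and the function names, the hit counts, and the total page count are all hypothetical values chosen for illustration.

```python
import math
from typing import Dict, FrozenSet, List

# Hypothetical hit counts standing in for search-engine queries (assumption).
TOTAL_PAGES = 1_000_000
SINGLE_HITS: Dict[str, int] = {
    "neural network": 1000,
    "machine learning": 2000,
    "coffee table": 500,
}
PAIR_HITS: Dict[FrozenSet[str], int] = {
    frozenset({"neural network", "machine learning"}): 400,
    frozenset({"coffee table", "machine learning"}): 1,
}


def association_score(hits_a: int, hits_b: int, hits_both: int, total: int) -> float:
    """PMI-style association: log2 of observed co-occurrence over chance."""
    if hits_a == 0 or hits_b == 0 or hits_both == 0:
        # Assumption: missing counts are treated as no evidence of association.
        return 0.0
    return math.log2((hits_both * total) / (hits_a * hits_b))


def coherence_features(candidate: str, references: List[str]) -> List[float]:
    """One association score per reference phrase, usable as classifier features."""
    scores = []
    for ref in references:
        both = PAIR_HITS.get(frozenset({candidate, ref}), 0)
        scores.append(
            association_score(
                SINGLE_HITS.get(candidate, 0),
                SINGLE_HITS.get(ref, 0),
                both,
                TOTAL_PAGES,
            )
        )
    return scores


if __name__ == "__main__":
    for phrase in ("neural network", "coffee table"):
        print(phrase, coherence_features(phrase, ["machine learning"]))
```

Under these made-up counts, a candidate that co-occurs with the reference phrase far more often than chance receives a high score, while an unrelated candidate scores near zero. Scores of this kind could be appended to a supervised classifier's existing feature vector for each candidate phrase.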
Results: The experiments demonstrated that the new web mining features significantly improved the coherence of the extracted keyphrases. The improvement was not domain-specific: the algorithm generalized well when trained on one domain (computer science documents) and tested on another (physics documents).
Implications: The findings suggest that web mining techniques can improve the coherence of automatically extracted keyphrases. This matters for information retrieval and text processing systems, because it makes it feasible to generate keyphrases for the large number of documents that have no manually assigned keyphrases.
Link to Article: https://arxiv.org/abs/0308033v1
arXiv ID: 0308033v1