Can Machine Learning Help Filter Spam Email?
Research Question: Can we improve the accuracy of spam email filters by using machine learning algorithms?
Methodology: The researchers evaluated several variants of the AdaBoost algorithm with confidence-rated predictions; the variants differ in the complexity of the base learners they use. They applied these algorithms to two main corpora, PU1 and PU2. The PU1 corpus is a publicly available collection of email messages, and the PU2 corpus is a collection of email messages from a commercial email provider.
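As a rough illustration of how boosting with confidence-rated predictions works, the Python sketch below implements the real-valued AdaBoost update of Schapire and Singer with the simplest possible base learner: a stump that tests whether a single word is present. It is not the researchers' code; the function names, the binary word-presence features, the smoothing constant, and the toy data are all assumptions made for illustration. Each stump outputs a real-valued score whose sign is the predicted class and whose magnitude is its confidence; the variants in the study use more complex base learners in place of the stump.

```python
import math
from typing import List, Sequence, Tuple

# One weak hypothesis: (feature index, score if word absent, score if present)
Stump = Tuple[int, float, float]


def train_real_adaboost(X: Sequence[Sequence[int]], y: Sequence[int],
                        rounds: int) -> List[Stump]:
    """Confidence-rated AdaBoost with single-word stumps (illustrative sketch).

    X holds binary word-presence vectors; y holds +1 (spam) or -1 (legitimate).
    """
    m, n = len(X), len(X[0])
    eps = 1.0 / (2 * m)        # smoothing keeps scores finite (assumed value)
    D = [1.0 / m] * m          # start with uniform weights over messages
    model: List[Stump] = []
    for _ in range(rounds):
        best = None
        for j in range(n):
            # W[b][s]: total weight of messages with feature value b and label s
            W = {0: {1: 0.0, -1: 0.0}, 1: {1: 0.0, -1: 0.0}}
            for xi, yi, di in zip(X, y, D):
                W[xi[j]][yi] += di
            # Schapire-Singer criterion: pick the stump minimizing Z
            Z = 2 * sum(math.sqrt(W[b][1] * W[b][-1]) for b in (0, 1))
            if best is None or Z < best[0]:
                best = (Z, j, W)
        _, j, W = best
        # Confidence-rated output for each block: half the weighted log-odds of spam
        c = [0.5 * math.log((W[b][1] + eps) / (W[b][-1] + eps)) for b in (0, 1)]
        model.append((j, c[0], c[1]))
        # Misclassified messages gain weight relative to confidently correct ones
        D = [di * math.exp(-yi * c[xi[j]]) for xi, yi, di in zip(X, y, D)]
        total = sum(D)
        D = [di / total for di in D]
    return model


def score(model: List[Stump], x: Sequence[int]) -> float:
    """Signed confidence: positive means spam, larger magnitude means surer."""
    return sum(c1 if x[j] else c0 for j, c0, c1 in model)


# Hypothetical toy data: presence (1) or absence (0) of the words
# "free", "meeting", "winner"; label +1 = spam, -1 = legitimate.
X = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 0]]
y = [1, 1, 1, -1, -1, -1]
model = train_real_adaboost(X, y, rounds=3)
print([round(score(model, x), 2) for x in X])
```

A more complex base learner, such as a small decision tree that tests several words, would partition the messages into more than two blocks, each with its own confidence score, at a higher training cost per round.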
Results: The study found that the AdaBoost algorithm with confidence-rated predictions performed well on both corpora, achieving very high values of the F1 measure, the harmonic mean of precision and recall that is widely used to evaluate information retrieval and text classification systems. The researchers also found that increasing the complexity of the base learners yielded better "high-precision" classifiers, that is, classifiers that rarely flag a legitimate message as spam. This matters for spam filtering because misclassification costs are asymmetric: losing a legitimate message to the spam folder is usually far more costly than letting a single spam message through.
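To make the F1 measure and the precision trade-off concrete, the sketch below computes precision, recall, and F1 for the spam class from a filter's signed scores (such as those a boosted classifier produces) at a chosen decision threshold. The function name, thresholds, scores, and labels are hypothetical and do not come from the study. Raising the threshold flags fewer messages, so precision rises while recall falls, which is the kind of high-precision operating point that matters when false positives are costly.

```python
def precision_recall_f1(scores, labels, threshold=0.0):
    """Compute precision, recall and F1 for the spam class (+1).

    A message is flagged as spam when its score exceeds `threshold`;
    labels use +1 for spam and -1 for legitimate mail.
    """
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)       # spam caught
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == -1)      # legit flagged
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)   # spam missed
    # Guard against empty denominators; returning 0.0 is a convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical filter scores and true labels for eight messages.
scores = [2.1, 1.4, 0.3, -0.2, 0.6, -1.5, -2.0, 0.1]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

for t in (0.0, 1.0):
    p, r, f = precision_recall_f1(scores, labels, threshold=t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

With these made-up numbers, moving the threshold from 0.0 to 1.0 raises precision from 0.60 to 1.00 and lowers recall from 0.75 to 0.50, while F1 stays at about 0.67; a single F1 value therefore does not reveal how many legitimate messages a filter discards.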
Implications: The results suggest that machine learning algorithms, specifically AdaBoost with confidence-rated predictions, can be used effectively to improve the accuracy of spam email filters. The study also indicates that more complex base learners help achieve higher precision when classifying spam. In practice, this could inform the development of more effective filters that reduce the amount of unwanted email users receive while rarely discarding legitimate messages.
Link to Article: https://arxiv.org/abs/0109015v1
Authors:
arXiv ID: 0109015v1