Performance Evaluation of LSTM and RNN Models in the Detection of Email Spam Messages
Email spam is an unwanted bulk message sent to a recipient’s email address without the recipient’s explicit consent. Senders usually treat it as a means of advertising and maximizing profit, especially as internet use for social networking grows, but it can be frustrating and annoying for recipients. Recent research has shown that about 14.7 billion spam messages are sent every day, and more than 45% of them are promotional sales content that the recipient did not specifically opt in to receive. This has drawn the attention of many researchers in natural language processing. In this paper, we use a Long Short-Term Memory (LSTM) network to classify messages as spam or ham. We compare the performance of the LSTM with that of a Recurrent Neural Network (RNN), which can also be used for this kind of classification task but suffers from short-term memory: it tends to lose important information from earlier time steps by the time it makes predictions at later ones. The evaluation shows that the LSTM achieved 97% accuracy with both the Adam and RMSprop optimizers, compared with 94% accuracy for the RNN with RMSprop and 87% with Adam.
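The short-term memory limitation mentioned in the abstract can be illustrated with a toy sketch. The code below is not the authors' model: it uses scalar states and hand-picked, untrained weights (`w_rec`, `b_forget`, `w_gate`, and `w_cand` are all assumptions chosen for illustration), and the LSTM-style cell omits the output gate and the recurrent inputs to the gates. It places one informative input at the first time step, follows it with 30 uninformative steps, and measures how much of that first input is still visible in the final state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rnn_final_state(inputs, w_rec=0.5, w_in=1.0):
    """Scalar vanilla RNN: h_t = tanh(w_rec * h_{t-1} + w_in * x_t)."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_rec * h + w_in * x)
    return h


def lstm_final_cell(inputs, b_forget=4.0, w_gate=1.0, w_cand=1.0):
    """Simplified scalar LSTM-style cell state: c_t = f * c_{t-1} + i_t * g_t.

    The forget gate f is held constant (illustrative assumption); the input
    gate i_t and candidate g_t depend only on the current input.
    """
    c = 0.0
    f = sigmoid(b_forget)  # close to 1, so the cell keeps most of its state
    for x in inputs:
        i = sigmoid(w_gate * x)    # input gate
        g = math.tanh(w_cand * x)  # candidate value
        c = f * c + i * g
    return c


# One informative token at step 0, then 30 steps of "nothing".
signal = [1.0] + [0.0] * 30
silence = [0.0] * 31

# How much the final state still differs because of the first token.
rnn_trace = rnn_final_state(signal) - rnn_final_state(silence)
lstm_trace = lstm_final_cell(signal) - lstm_final_cell(silence)

print(f"RNN trace of first token after 30 steps:  {rnn_trace:.2e}")
print(f"LSTM trace of first token after 30 steps: {lstm_trace:.2e}")
```

With these particular weights, the trace of the first token in the plain RNN state shrinks toward zero, because each step squashes and rescales the previous state, while the gated cell state carries a sizeable fraction of it forward. Trained networks behave less cleanly, so this is only an intuition for why an LSTM may retain early parts of a message that a plain RNN loses.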