Performance Evaluation of LSTM and RNN Models in the Detection of Email Spam Messages


  •   Elijah John-Africa

  •   Victor T. Emmah

Abstract

Email spam is an unwanted bulk message sent to a recipient’s email address without the recipient’s explicit consent. It is usually a means of advertising and maximizing profit, especially with the growth of internet usage for social networking, but it can also be very frustrating and annoying to recipients. Recent research has shown that about 14.7 billion spam messages are sent out every single day, of which more than 45% are promotional sales content that the recipient did not specifically opt in to. This has attracted the attention of many researchers in the area of natural language processing. In this paper, we use the Long Short-Term Memory (LSTM) network for the classification task of separating spam from ham messages. The performance of the LSTM is compared with that of a Recurrent Neural Network (RNN), which can also be used for a classification task of this nature but suffers from short-term memory and tends to lose important information from earlier time steps when making predictions at later ones. The evaluation of the results shows that the LSTM achieved 97% accuracy with both the Adam and RMSprop optimizers, compared to the RNN with an accuracy of 94% with RMSprop and 87% with Adam.
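The gating mechanism that lets an LSTM retain information across long sequences, where a vanilla RNN's hidden state fades, can be sketched in a few lines of NumPy. The weight shapes, gate ordering, and function names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases.
    The cell state c carries long-range information past the tanh squashing
    that causes a plain RNN to forget earlier time steps.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])        # forget gate: what to keep from the old cell state
    i = sigmoid(z[H:2 * H])    # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g     # additive update preserves gradients over time
    h = o * np.tanh(c)         # hidden state used for the spam/ham prediction
    return h, c

def rnn_step(x, h_prev, W, U, b):
    """One vanilla RNN step: no gates, so earlier inputs decay quickly."""
    return np.tanh(W @ x + U @ h_prev + b)
```

In a classifier like the one evaluated here, the final hidden state `h` after the last token would feed a sigmoid output layer trained with Adam or RMSprop; the gated cell state is what lets the LSTM outperform the RNN on longer messages.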


Keywords: Adam, LSTM, RMSprop, RNN, supervised learning

References

Olah C. Understanding LSTM Networks. colah’s blog [Internet]. [cited 2020 June 22]. Available from: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Hartley A. Spam gets 1 response per 12,500,000 emails. TechRadar [Internet]. [cited 2020 June 22]. Available from: https://www.techradar.com/news/internet/computing/spam-gets-1-response-per-12-500-000-emails-483381

Alsmadi I, Alhami I. Clustering and classification of email contents. Journal of King Saud University-Computer and Information Sciences. 2015; 27(1): 46-57.

Banday MT, Jan TR. Effectiveness and limitations of statistical spam filters. arXiv preprint arXiv:0910.2540. 2009 Oct 14.

Sherman R. Financial Losses from Phishing. [Internet] [cited 2022 May 14] Available from: https://resources.infosecinstitute.com/topic/financial-losses-from-phishing/

Plumer B. The Economics of Internet Spam. Washington Post. [Internet] [cited 2020 December 1] Available from: https://www.washingtonpost.com/news/wonk/wp/2012/08/10/the-economics-of-spam/

Ukai Y, Takemura T. Spam mails impede economic growth. The Review of Socionetwork Strategies. 2007; 1(1): 14-22.

Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188. 2014 Apr 8.

Vinitha V, Renuka DK. Feature selection techniques for email spam classification: a survey. International Conference on Artificial Intelligence, Smart Grid and Smart City Applications. 2019: 925-935.

Vinitha VS, Renuka DK. Performance analysis of e-mail spam classification using different machine learning techniques. 2019 International Conference on Advances in Computing and Communication Engineering (ICACCE). 2019: 1-5.

Gupta M, Bakliwal A, Agarwal S, Mehndiratta P. A comparative study of spam SMS detection using machine learning classifiers. 2018 Eleventh International Conference on Contemporary Computing (IC3). 2018: 1-7.

Bassiouni M, Ali M, El-Dahshan EA. Ham and spam e-mails classification using machine learning techniques. Journal of Applied Security Research. 2018; 13(3): 315-31.

Enron spam corpus dataset, SpamAssassin. [Internet] Available from: https://www.kaggle.com/beatoa/spamassassin-public-corpus

UCI machine learning repository Spambase dataset. [Internet] Available from: https://archive.ics.uci.edu/ml/machine-learning-databases/00228/


How to Cite
John-Africa, E., & Emmah, V. T. (2022). Performance Evaluation of LSTM and RNN Models in the Detection of Email Spam Messages. European Journal of Information Technologies and Computer Science, 2(6), 24–30. https://doi.org/10.24018/compute.2022.2.6.80