An Efficient Spam Mail Detection by Counter Technique

Spam mails are unwanted mails sent to large number of users. Spam mails not only consume the network resources, but cause security threats as well. This paper proposes an efficient technique to detect, and to prevent spam mail in the sender side rather than the receiver side. This technique is based on a counter set on the sender server. When a mail is transmitted to the server, the mail server checks the number of the recipients based on its counter policy. The counter policy performed by the mail server is based on some pre-defined criteria. When the number of recipients exceeds the counter policy, the mail server discontinues the rest of the process, and sends a failure mail to sender of the mail; otherwise the mail is transmitted through the network. By using this technique, the usage of network resources such as bandwidth, and memory is preserved. The simulation results in real network show that when the counter is set on the sender side, the time required for spam mail detection is 100 times faster than the time the counter is set on the receiver side, and the network resources are preserved largely compared with other anti-spam mail techniques in the receiver side.

Adaptive Naïve Bayesian Anti-Spam Engine

The problem of spam has been seriously troubling the Internet community during the last few years and currently reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag Of Words representation of an email is widely used to stop this unwanted flood as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to assure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision as it happens when only the NBC based on single words is retrained.