Abstract: Spam mails are unwanted mails sent to a large number
of users. Spam mails not only consume network resources but
also pose security threats. This paper proposes an efficient
technique to detect and prevent spam mail on the sender side rather
than on the receiver side. This technique is based on a counter set on the
sender's server. When a mail is transmitted to the server, the mail server
checks the number of recipients against its counter policy. The
counter policy enforced by the mail server is based on some
pre-defined criteria. When the number of recipients exceeds the
limit set by the counter policy, the mail server discontinues the rest of the process and
sends a failure mail to the sender; otherwise the mail is
transmitted through the network. By using this technique, network
resources such as bandwidth and memory are preserved. The
simulation results in a real network show that when the counter is set on
the sender side, spam mail detection is 100 times
faster than when the counter is set on the receiver side, and
network resources are largely preserved compared with other
anti-spam mail techniques on the receiver side.
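
The sender-side check described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state the pre-defined criteria, so the fixed cap `max_recipients`, the `CounterPolicy` class, and the `check_outgoing` function are all hypothetical names and assumptions.

```python
from dataclasses import dataclass


@dataclass
class CounterPolicy:
    # Assumption: the policy is a single fixed cap on recipients per mail.
    # The actual criteria used in the paper are not given in the abstract.
    max_recipients: int = 50


def check_outgoing(sender: str, recipients: list[str],
                   policy: CounterPolicy) -> tuple[bool, str]:
    """Decide on the sender's server whether one outgoing mail may proceed.

    Returns (accepted, message). When the recipient count exceeds the
    policy limit, processing stops and the message is the text of a
    failure notice for the sender; otherwise the mail is released.
    """
    count = len(recipients)
    if count > policy.max_recipients:
        notice = (f"Delivery failed for {sender}: {count} recipients "
                  f"exceeds the limit of {policy.max_recipients}")
        return False, notice
    return True, "accepted for transmission"
```

Because the check runs before the mail leaves the sender's server, a rejected mail never consumes bandwidth or storage further along the path, which is the saving the abstract refers to.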
Abstract: The problem of spam has seriously troubled the Internet community over the last few years and has now reached an alarming scale. Observations made at CERN (the European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier (NBC) based on a bag-of-words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simple training and classification processes. However, because spam patterns change constantly, the classifier must be adaptable online. This work proposes combining such a classifier with a second NBC based on pairs of adjacent words. Only the latter is retrained with examples of spam reported by users. Tests are performed on large sets of mails from both public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting classifier precision, unlike the case in which only the NBC based on single words is retrained.
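
The two-classifier architecture can be illustrated with a small sketch. It is written under stated assumptions rather than taken from the paper: the abstract does not say how the two classifiers are combined, so summing their log-odds is an assumed rule, and the names `NaiveBayes`, `is_spam`, and `report_spam`, the whitespace tokenizer, and the add-one smoothing are all hypothetical choices.

```python
import math
from collections import Counter


def unigrams(text):
    """Single-word (bag-of-words) features."""
    return text.lower().split()


def bigrams(text):
    """Pairs of adjacent words."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over any feature extractor."""

    def __init__(self, extract):
        self.extract = extract
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(self.extract(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """log P(spam | text) - log P(ham | text)."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        total_docs = self.docs["spam"] + self.docs["ham"]
        score = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            total = sum(self.counts[label].values())
            # Smoothed class prior, then smoothed per-feature likelihoods.
            logp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for feat in self.extract(text):
                logp += math.log((self.counts[label][feat] + 1) / (total + vocab))
            score += sign * logp
        return score


def is_spam(text, word_nbc, pair_nbc, threshold=0.0):
    # Assumed combination rule: add the log-odds of the two classifiers.
    return word_nbc.log_odds(text) + pair_nbc.log_odds(text) > threshold


def report_spam(text, pair_nbc):
    # Only the adjacent-pair classifier is retrained on user-reported spam;
    # the single-word classifier is left untouched.
    pair_nbc.train(text, "spam")
```

In this sketch, a user report shifts only the pair-based scores, so the single-word model's behavior on legitimate mail stays fixed, which is the intuition behind raising recall without disturbing precision.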