Abstract: Image spam is a kind of email spam in which the spam
text is embedded in an image. It is a new technique that spammers
use to send their messages to large numbers of internet users.
Spam email has become a major problem for internet users, costing
them both time and money. The main
objective of this paper is to detect image spam using the histogram
properties of an image. Although many techniques exist to detect
and block image spam automatically, spammers keep employing
new tricks to bypass them, and as a result those techniques
fail to detect the spam mails. In this paper we propose a
new method to detect image spam. Image features are
extracted using the RGB histogram, the HSV histogram, and a
combination of the two. Classification is then performed on the
optimized image feature set with the k-Nearest Neighbor (k-NN)
algorithm. Experimental results show that our method achieves
better accuracy, and that the combination of RGB
and HSV histograms with the k-NN algorithm gives the best accuracy in
spam detection.
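The pipeline described in this abstract (histogram features followed by k-NN) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bin count, the Euclidean distance, the value of k, and the toy pixel data are all assumptions, since the abstract specifies none of them, and the function names are hypothetical.

```python
import colorsys
import math
from collections import Counter

BINS = 8  # assumed bins per channel; the abstract does not give a value


def channel_histogram(values, bins=BINS):
    """Normalized histogram of channel values that lie in [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def rgb_hsv_features(pixels, bins=BINS):
    """Concatenate R, G, B, H, S, V histograms for a list of (r, g, b) pixels (0..255)."""
    rgb = [(r / 255.0, g / 255.0, b / 255.0) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in rgb]
    features = []
    for space in (rgb, hsv):
        for ch in range(3):
            features.extend(channel_histogram([p[ch] for p in space], bins))
    return features


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy "images" as flat pixel lists; purely illustrative, not real spam data.
spam_like = [(255, 0, 0)] * 90 + [(255, 255, 255)] * 10
ham_like = [(30, 120, 60)] * 50 + [(90, 90, 200)] * 50
train = [(rgb_hsv_features(spam_like), "spam"),
         (rgb_hsv_features(ham_like), "ham")]

query = [(250, 5, 5)] * 85 + [(255, 255, 255)] * 15
query_features = rgb_hsv_features(query)
prediction = knn_predict(train, query_features, k=1)
print(prediction)
```

Using only the first three or the last three histograms in the feature vector would correspond to the RGB-only and HSV-only variants that the abstract compares against the combined set.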
Abstract: Email has become a fast and cheap means of online
communication. The main threat to email is Unsolicited Bulk Email
(UBE), commonly called spam email. The current work aims to
identify unigrams in more than 2700 UBE messages that advertise
body-enhancement drugs. A unigram qualifies only if it is
neither present in a dictionary nor a
slang term. The motives of the paper are manifold. It is an
attempt to analyze spamming behaviour and the use of the
word-mutation technique. Alongside this, we have
attempted to better understand spam, slang, and their interplay.
The problem is addressed using a tokenization
technique and a unigram bag-of-words (BOW) model. We found that
non-lexicon words constitute nearly 66% of the total lexis of the corpus,
whereas non-slang words constitute nearly 2.4% of the non-lexicon
words. Further, non-lexicon non-slang unigrams composed of two
lexicon words form more than 71% of all such
unigrams. To the best of our knowledge, this is the first attempt to
analyze the usage of non-lexicon non-slang unigrams in any kind of
UBE.
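As a rough illustration of the approach this abstract outlines, the sketch below tokenizes a few messages, builds a unigram BOW, keeps the unigrams found in neither a lexicon nor a slang list, and checks which of them split into two lexicon words. The tokenization rule, the tiny word lists, the sample messages, and all function names are assumptions made for the example; the abstract does not describe the actual dictionary, slang source, or corpus format.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_bow(documents):
    """Count every unigram across all documents."""
    bow = Counter()
    for doc in documents:
        bow.update(tokenize(doc))
    return bow


def non_lexicon_non_slang(bow, lexicon, slang):
    """Unigrams that appear in neither the lexicon nor the slang list."""
    return {w: c for w, c in bow.items() if w not in lexicon and w not in slang}


def is_two_lexicon_words(word, lexicon):
    """True if the word can be split into exactly two lexicon words."""
    return any(word[:i] in lexicon and word[i:] in lexicon
               for i in range(1, len(word)))


# Hypothetical toy data standing in for a dictionary, a slang list, and UBE.
lexicon = {"buy", "cheap", "pills", "now", "best", "price"}
slang = {"lol"}
messages = ["Buy cheappills now lol", "bestprice pillz now"]

bow = unigram_bow(messages)
candidates = non_lexicon_non_slang(bow, lexicon, slang)
compounds = sorted(w for w in candidates if is_two_lexicon_words(w, lexicon))
print(sorted(candidates))
print(compounds)
```

In this toy run, "cheappills" and "bestprice" are the kind of fused two-word unigram the abstract reports as the majority case, while "pillz" is a mutation that does not split into lexicon words.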
Abstract: In recent times, the problem of Unsolicited Bulk
Email (UBE), commonly known as spam email, has grown at a
tremendous rate. We present an analysis of a survey of the
classifications of UBE in various research works. There are many
research works that classify emails as spam or non-spam,
but very few that classify
spam emails themselves. This paper does not assert that any
UBE classification is better than the others, nor does it propose
a new classification; rather, it bemoans the lack of agreement on the
number and definition of categories proposed by different researchers. The
paper also elaborates on factors such as the intent of the spammer, the
content of UBE, and the ambiguity among categories proposed in related
research on the classification of UBE.