Abstract: Classification is an important data mining technique
and can be used for data filtering in artificial intelligence. Because
classification applies to all kinds of data, it is used in nearly
every field of modern life. Classification helps us group different
items according to the features judged interesting and useful. In
this paper, we compare two
classification methods, Naïve Bayes and ADTree, used to detect spam
e-mail. This choice is motivated by the fact that the Naïve Bayes
algorithm is based on probability calculus, while the ADTree algorithm
is based on decision trees. The parameter settings of both
classifiers are chosen to maximize the true positive rate and
minimize the false positive rate. The experimental results present
classification accuracy and cost analysis with a view to choosing the
optimal classifier for spam detection. We also point out the number of
attributes needed to obtain a tradeoff between the number of
attributes and classification accuracy.
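As a rough illustration of the probabilistic side of this comparison, the following minimal sketch trains a multinomial Naïve Bayes classifier with add-one smoothing and computes the true positive and false positive rates. The toy messages, the function names, and the smoothing choice are assumptions made for illustration; they are not the data set or settings used in the paper, and ADTree is not sketched here.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model from whitespace-tokenized text."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return {
        "classes": classes,
        "word_counts": word_counts,
        "doc_counts": Counter(labels),
        "vocab": vocab,
        "n_docs": len(docs),
    }


def predict(model, text):
    """Return the class with the highest log-posterior for the text."""
    vocab_size = len(model["vocab"])
    best_class, best_score = None, -math.inf
    for c in model["classes"]:
        counts = model["word_counts"][c]
        total = sum(counts.values())
        score = math.log(model["doc_counts"][c] / model["n_docs"])
        for word in text.lower().split():
            if word in model["vocab"]:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((counts[word] + 1) / (total + vocab_size))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def tpr_fpr(y_true, y_pred, positive="spam"):
    """True positive rate and false positive rate for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "win cash prize now",
    "cheap pills win now",
    "meeting agenda for monday",
    "lunch on monday with team",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(train_docs, train_labels)
```

Calling `predict(model, "win cash now")` classifies a new message, and `tpr_fpr` can then be applied to held-out labels and predictions to compare classifiers on the two rates the abstract names.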
Abstract: With the increasing number of people reviewing
products online in recent years, opinion-sharing websites have become
the most important source of customers’ opinions. Unfortunately,
spammers generate and post fake reviews in order to promote or
demote brands and mislead potential customers. Such reviews are notably
damaging not only for potential customers but also for business
owners and manufacturers. However, research in this area is not
adequate, and many critical problems related to spam detection have
not been solved to date. To aid researchers new to the domain, in
this paper we attempt to create a high-quality framework that gives a
clear view of review spam-detection methods. In addition, this report
contains a comprehensive collection of detection metrics used in
proposed spam-detection approaches. These metrics are highly
applicable to developing novel detection methods.
Abstract: Image spam is a kind of email spam in which the spam
text is embedded in an image. It is a new spamming technique
used by spammers to send their messages to large numbers of internet
users. Spam email has become a big problem in the lives of internet
users, wasting time and causing economic losses. The main
objective of this paper is to detect image spam using the histogram
properties of an image. Although there are many techniques to
automatically detect and avoid this problem, spammers employ
new tricks to bypass them, and as a result those techniques are
ineffective at detecting spam mail. In this paper we propose a
new method to detect image spam. Image features are
extracted using the RGB histogram, the HSV histogram, and a combination
of both. Based on the optimized image feature set, classification is
done using the k-Nearest Neighbor (k-NN) algorithm. Experimental
results show that our method achieves better accuracy. The results
indicate that the combination of RGB and HSV histograms with the k-NN
algorithm gives the best accuracy in spam detection.
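The feature extraction and classification steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library: images are assumed to be flat lists of `(r, g, b)` tuples in the range 0 to 255, and the bin count, the Euclidean distance, the value of `k`, and the synthetic pixel data are all assumptions rather than the paper's settings. The feature-set optimization step is not shown.

```python
import colorsys
import math
from collections import Counter


def channel_histogram(values, bins):
    """Normalized histogram of values that lie in [0, 1]."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    return [count / len(values) for count in hist]


def rgb_hsv_features(pixels, bins=4):
    """Concatenate per-channel RGB and HSV histograms into one vector."""
    rgb = [(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(*p) for p in rgb]
    features = []
    for colour_space in (rgb, hsv):
        for channel in range(3):
            values = [p[channel] for p in colour_space]
            features.extend(channel_histogram(values, bins))
    return features


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to the query."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical synthetic "images": flat backgrounds with saturated text
# colours stand in for spam, smooth gradients stand in for ordinary photos.
spam_a = [(255, 255, 255)] * 90 + [(255, 0, 0)] * 10
spam_b = [(250, 250, 250)] * 85 + [(0, 0, 255)] * 15
ham_a = [(30 + i, 90 + i, 60 + i) for i in range(100)]
ham_b = [(40 + i, 80 + i, 70 + i) for i in range(100)]

train_x = [rgb_hsv_features(img) for img in (spam_a, spam_b, ham_a, ham_b)]
train_y = ["spam", "spam", "ham", "ham"]

query = [(255, 255, 255)] * 80 + [(255, 0, 0)] * 20
label = knn_predict(train_x, train_y, rgb_hsv_features(query))
```

Using only the first half of the vector gives the RGB-only variant and the second half the HSV-only variant, which is one simple way to compare the three feature sets the abstract mentions.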
Abstract: A Bloom filter is a probabilistic, memory-efficient
data structure designed to answer rapidly whether an element is
present in a set. It can tell that an element is definitely not in the
set, but it can report presence only with a certain probability. The
trade-off in using a Bloom filter is a configurable risk of false
positives. The odds of a
false positive can be made very low if the filter size and the number
of hash functions are chosen appropriately. For spam detection, a
weight is attached to each set
of elements. The spam weight for a word is a measure used to rate the
e-mail. Each word is assigned to a Bloom filter based on its weight.
The proposed work introduces an enhanced Bloom filter concept
called the Bin Bloom Filter (BBF). The performance of the BBF relative
to the conventional Bloom filter is evaluated under various
optimization techniques. A real-time data set and synthetic data sets
are used for experimental analysis, and results are reported for bin
sizes 4, 5, 6 and 7. Analysis of the results shows that the BBF
using heuristic techniques performs better than the traditional
Bloom filter in spam detection.
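To make the data structure concrete, here is a minimal sketch of a conventional Bloom filter, the standard false-positive approximation (1 - e^(-kn/m))^k for m bits, k hash functions and n inserted items, and one possible reading of the bin idea in which each spam-weight range gets its own filter. The class names, the SHA-256 double-hashing scheme, the weight boundaries, and the interpretation of "bin size" as the number of bins are all assumptions; the abstract does not specify the BBF layout or its optimization techniques.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Double hashing: derive k indexes from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


class BinBloomFilter:
    """Hypothetical layout: one Bloom filter per spam-weight range."""

    def __init__(self, boundaries, num_bits, num_hashes):
        # boundaries: ascending upper limits of each weight range.
        self.boundaries = boundaries
        self.bins = [BloomFilter(num_bits, num_hashes) for _ in boundaries]

    def _bin_index(self, weight):
        for i, upper in enumerate(self.boundaries):
            if weight <= upper:
                return i
        return len(self.boundaries) - 1

    def add(self, word, weight):
        self.bins[self._bin_index(weight)].add(word)

    def lookup(self, word):
        """Index of the highest-weight bin that may hold the word, or None."""
        for i in reversed(range(len(self.bins))):
            if word in self.bins[i]:
                return i
        return None


# Four bins over assumed weights in [0, 1]; the words are illustrative.
bbf = BinBloomFilter([0.25, 0.5, 0.75, 1.0], num_bits=1024, num_hashes=3)
bbf.add("viagra", 0.9)
bbf.add("meeting", 0.1)
```

Under this approximation, for a fixed filter size the false-positive rate is lowest near k = (m/n) ln 2 hash functions, so `false_positive_rate` can be used to pick `num_hashes` for a given number of words per bin.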