Bin Bloom Filter Using Heuristic Optimization Techniques for Spam Detection

A Bloom filter is a probabilistic, memory-efficient
data structure designed to answer rapidly whether an element is
present in a set. It can report that an element is definitely not in the
set, but it can report presence only with a certain probability. The
trade-off in using a Bloom filter is a configurable risk of false
positives. The odds of a false positive can be made very low if the filter
size and the number of hash functions are chosen suitably for the number
of stored elements. For spam detection, a weight is attached to each set
of elements. The spam weight of a word is a measure used to rate the
e-mail. Each word is assigned to a Bloom filter based on its weight.
The proposed work introduces an enhanced Bloom filter concept
called the Bin Bloom Filter (BBF). The performance of the BBF relative to
the conventional Bloom filter is evaluated under various optimization
techniques. A real-time data set and synthetic data sets are used for
experimental analysis, and results are reported for bin sizes 4,
5, 6 and 7. The results show that the BBF
using heuristic techniques performs better than the traditional
Bloom filter in spam detection.
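
The idea of assigning words to separate Bloom filters by spam weight can be
illustrated with a minimal sketch. It is not the authors' implementation: the
class names, the SHA-256 double-hashing scheme, the fixed weight thresholds and
the choice of four bins are all assumptions made for illustration, and the
heuristic optimization of the bins is not shown. The helper
false_positive_rate uses the standard approximation (1 - e^(-kn/m))^k for a
filter of m bits, k hash functions and n stored elements.

```python
import bisect
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumed hashing scheme: derive k positions from two 64-bit halves
        # of a SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def false_positive_rate(m, k, n):
    """Standard approximation for m bits, k hashes, n stored elements."""
    return (1.0 - math.exp(-k * n / m)) ** k


class BinBloomFilter:
    """Hypothetical bin layout: one Bloom filter per spam-weight range."""

    def __init__(self, thresholds, num_bits, num_hashes):
        # thresholds are ascending weight boundaries (illustrative values);
        # len(thresholds) + 1 bins are created.
        self.thresholds = sorted(thresholds)
        self.bins = [BloomFilter(num_bits, num_hashes)
                     for _ in range(len(self.thresholds) + 1)]

    def _bin_index(self, weight):
        return bisect.bisect_right(self.thresholds, weight)

    def add(self, word, weight):
        self.bins[self._bin_index(weight)].add(word)

    def lookup(self, word):
        # Indices of all bins that report the word as (possibly) present.
        return [i for i, b in enumerate(self.bins) if word in b]


bbf = BinBloomFilter([0.25, 0.5, 0.75], num_bits=1024, num_hashes=5)
bbf.add("lottery", 0.9)   # high spam weight, stored in the last bin
bbf.add("meeting", 0.1)   # low spam weight, stored in the first bin
```

In this sketch a lookup returns every bin that may contain the word, so a
false positive in one bin can mislabel a word's weight range. Evaluating
false_positive_rate for a fixed m and n at increasing k also shows that the
rate falls only up to a point and then rises again, which is why k is chosen
together with m and n.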




