Abstract: The problem of spam has seriously troubled the Internet community in recent years and has now reached an alarming scale. Observations made at CERN (the European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier (NBC) based on a bag-of-words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simple training and classification. However, because spam patterns change constantly, the classifier must be able to adapt online. This work proposes combining such a classifier with a second NBC based on pairs of adjacent words. Only the latter is retrained with examples of spam reported by users. Tests were performed on large sets of mails from both public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without the loss of precision that occurs when only the single-word NBC is retrained.
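The abstract does not say how the two classifiers' outputs are combined or how emails are tokenized, so the following Python sketch is only one possible reading under stated assumptions. Both models are multinomial naïve Bayes classifiers with add-one (Laplace) smoothing. The combined decision simply sums their log-odds scores, which is an assumed rule, and user spam reports update only the word-pair model. The names `NaiveBayes`, `CombinedFilter`, `report_spam`, and the toy tokenizer are hypothetical illustrations, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-cased word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigrams(text):
    """Single words: the bag-of-words features."""
    return tokens(text)


def bigrams(text):
    """Pairs of adjacent words, each joined into one feature string."""
    w = tokens(text)
    return [f"{a} {b}" for a, b in zip(w, w[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing.

    Storing raw counts makes online updates cheap: training on a new
    example only increments counters.
    """

    def __init__(self, featurize):
        self.featurize = featurize
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.counts[label].update(self.featurize(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """log P(spam | text) - log P(ham | text); the evidence term cancels."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        v = len(vocab) + 1  # +1 reserves probability mass for unseen features
        totals = {c: sum(self.counts[c].values()) for c in ("spam", "ham")}
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for f in self.featurize(text):
            p_spam = (self.counts["spam"][f] + 1) / (totals["spam"] + v)
            p_ham = (self.counts["ham"][f] + 1) / (totals["ham"] + v)
            score += math.log(p_spam / p_ham)
        return score


class CombinedFilter:
    """Word-based NBC plus word-pair NBC (hypothetical wrapper)."""

    def __init__(self, training):
        self.words = NaiveBayes(unigrams)
        self.pairs = NaiveBayes(bigrams)
        for text, label in training:
            self.words.train(text, label)
            self.pairs.train(text, label)

    def is_spam(self, text, threshold=0.0):
        # Assumption: combine the two models by summing their log-odds.
        return self.words.log_odds(text) + self.pairs.log_odds(text) > threshold

    def report_spam(self, text):
        # Only the word-pair classifier is retrained with user reports;
        # the single-word classifier keeps its original training.
        self.pairs.train(text, "spam")
```

Keeping counts rather than precomputed probabilities is what makes this kind of online retraining inexpensive: a user report just increments the word-pair counts, while the single-word model stays exactly as it was trained on the original corpus.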
Abstract: As the information society and social development progress,
their dysfunctions are growing as well: intrusion problems such as
malicious replies, spam mail, private information leakage, phishing,
and pharming, along with side effects such as the spread of harmful
information and privacy invasion, are becoming serious social
problems. Illegal access to information is also becoming a problem as
the expansion of communication networks increases the exchange and
sharing of information. Moreover, because communication networks have
been built as an international, global system, legal responses to
intrusions and cyber-attacks from abroad have reached their limits. In
addition, where critical infrastructures are managed and controlled
through information and communication networks, such problems pose a
threat to national security. Countermeasures against these threats are
developed and implemented every year to protect the major information
and communication infrastructures. As part of these measures, we have
developed a methodology for assessing the level of information
protection, which can be used to establish the quantitative
objective-setting method required to improve that level.
Abstract: In recent years, Unsolicited Bulk Email (UBE), commonly
known as spam, has grown at a tremendous rate. We present a
survey-based analysis of the UBE classifications proposed in various
research works. Many studies address classification between spam and
non-spam emails, but very few address the classification of spam
emails per se. This paper neither claims that any UBE classification
is better than the others nor proposes a new one; instead, it laments
the lack of agreement among researchers on the number and definitions
of categories. The paper also discusses factors such as the spammer's
intent, the content of UBE, and the ambiguity among the categories
proposed in related work on UBE classification.
Abstract: As the Internet continues to grow rapidly as the primary
medium for communication and commerce, and as telecommunication
networks and systems continue to expand their global reach, digital
information has become the most popular and important information
resource, and our dependence on the underlying cyber infrastructure
has increased significantly. Unfortunately, as our dependence has
grown, so has the threat to the cyber infrastructure from spammers,
attackers, and criminal enterprises. In this paper, we propose a new
machine-learning-based network intrusion detection framework for cyber
security. Its detection process consists of two stages: model
construction and intrusion detection. In the model construction stage,
a semi-supervised machine learning algorithm is applied to a collected
set of network audit data to generate a profile of normal network
behavior. In the intrusion detection stage, input network events are
analyzed and compared with the patterns in this profile, and events
that lie sufficiently far from the expected normal behavior are
flagged as anomalies. The proposed framework is particularly suited to
situations in which only a small amount of labeled network training
data is available, which is typical of real-world network
environments.
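The two-stage process can be illustrated with a deliberately simple Python sketch. It is not the semi-supervised algorithm used in the paper, which the abstract does not name. As an assumed stand-in, the profile of normal behavior is the per-feature median and median absolute deviation (MAD) of unlabeled audit records, and the few labeled normal records are used only to calibrate the distance threshold. Records are hypothetical numeric feature tuples (for example, connections per minute and kilobytes per connection), and `build_profile`, `distance`, `detect`, and `margin` are illustrative names.

```python
import math
from statistics import median


def build_profile(unlabeled, labeled_normal, margin=1.5):
    """Stage 1 (model construction), an assumed stand-in.

    Centre and spread come from the unlabeled audit records, using the
    median and MAD so that a few attacks hidden in the unlabeled data do
    not skew them. The small labeled set of known-normal records only
    calibrates the alarm threshold.
    """
    columns = list(zip(*unlabeled))
    center = [median(c) for c in columns]
    # Fall back to 1.0 when a feature's MAD is zero, to avoid division by zero.
    scale = [median([abs(x - m) for x in c]) or 1.0
             for c, m in zip(columns, center)]
    profile = {"center": center, "scale": scale}
    # Assumption: threshold = widest known-normal distance, widened by `margin`.
    profile["threshold"] = margin * max(distance(profile, x) for x in labeled_normal)
    return profile


def distance(profile, event):
    """Euclidean distance after scaling each feature by the profile's spread."""
    return math.sqrt(sum(((x - c) / s) ** 2
                         for x, c, s in zip(event, profile["center"], profile["scale"])))


def detect(profile, events):
    """Stage 2 (intrusion detection): flag events far from the normal profile."""
    return [distance(profile, e) > profile["threshold"] for e in events]
```

For example, with unlabeled records `[(10, 5), (12, 6), (11, 5), (9, 4), (10, 6), (300, 90)]` and labeled normal records `[(10, 5), (12, 6), (9, 4)]`, `detect(profile, [(11, 6), (250, 80)])` returns `[False, True]`. The outlier `(300, 90)` in the unlabeled data barely moves the median-based profile, which is why robust statistics are a reasonable choice when the unlabeled records cannot be assumed clean.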