Abstract: In many outlier detection tasks, only training data
belonging to one class, i.e., the positive class, is available. The
task is then to predict whether a new data point belongs to the
positive class or to the negative class; in the latter case, the
data point is considered an outlier. For this task, we propose a
novel corrupted Generative Adversarial Network (CorGAN). In the
adversarial process of training CorGAN, the Generator generates
outlier samples for the negative class, and the Discriminator is trained
to distinguish the positive training data from the generated negative
data. The proposed framework is evaluated using an image dataset
and a real-world network intrusion dataset. Our outlier-detection
method achieves state-of-the-art performance on both tasks.
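The abstract does not specify CorGAN's architecture, losses, or how the Generator is "corrupted". The sketch below therefore illustrates only the underlying one-class idea: train a discriminator to separate positive training data from synthetic negatives, then flag points it scores as unlikely to be positive. It is a heavily simplified stand-in, not CorGAN. The adversarial Generator is replaced by a fixed uniform sampler (`sample_negatives`), and the Discriminator is replaced by logistic regression on quadratic features. All names, the 2-D toy data, and the hyperparameters are hypothetical.

```python
import math
import random

def features(point):
    # Quadratic features let a linear model enclose a bounded "positive" region.
    x, y = point
    return [1.0, x, y, x * x, y * y]

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def score(w, point):
    # Discriminator output: estimated probability that `point` is positive.
    return sigmoid(sum(wi * fi for wi, fi in zip(w, features(point))))

def sample_negatives(n, rng, low=-4.0, high=4.0):
    # Hypothetical stand-in for the Generator: uniform samples over a box.
    # (CorGAN instead learns its negative samples adversarially.)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]

def train_discriminator(positives, negatives, lr=0.1, epochs=500):
    # Full-batch gradient descent on the logistic (cross-entropy) loss,
    # with label 1 for positive training data and 0 for generated negatives.
    data = [(features(p), 1.0) for p in positives]
    data += [(features(q), 0.0) for q in negatives]
    w = [0.0] * 5
    for _ in range(epochs):
        grad = [0.0] * 5
        for f, label in data:
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f))) - label
            for i, fi in enumerate(f):
                grad[i] += err * fi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

def is_outlier(w, point, threshold=0.5):
    # A point the discriminator scores as unlikely to be positive is an outlier.
    return score(w, point) < threshold

rng = random.Random(0)
positives = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(200)]
w = train_discriminator(positives, sample_negatives(200, rng))
print(is_outlier(w, (0.0, 0.0)), is_outlier(w, (3.5, 3.5)))
```

Only the positive class is ever observed. The negatives exist solely to give the discriminator something to push against. A point near the training cluster should be scored as positive, and a distant point should be flagged as an outlier.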

Abstract: As the Internet continues to grow rapidly as the primary
medium for communications and commerce, and as telecommunication
networks and systems continue to expand their global reach, digital
information has become the most popular and important information
resource, and our dependence on the underlying cyber infrastructure
has increased significantly. Unfortunately, as our dependence has
grown, so has the threat to the cyber infrastructure from spammers,
attackers and criminal
enterprises. In this paper, we propose a new machine-learning-based
network intrusion detection framework for cyber security. The
detection process of the framework consists of two stages: model
construction and intrusion detection. In the model construction stage,
a semi-supervised machine learning algorithm is applied to a
collected set of network audit data to generate a profile of normal
network behavior. In the intrusion detection stage, input network
events are analyzed and compared with the patterns in the profile,
and events that are sufficiently far from the expected normal
behavior are flagged as anomalies. The proposed framework is
particularly applicable to situations where only a small amount of
labeled network training data is available, which is typical in
real-world network environments.
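The abstract names neither the semi-supervised algorithm nor the distance measure. The sketch below illustrates only the two-stage structure, under the following assumptions. The model construction stage uses a simple self-training scheme: it starts from a few labeled normal records and then absorbs unlabeled records that already fall within the profile. The detection stage uses a root-mean-square z-score as the distance. The two-feature records and the names `construct_model`, `detect`, `threshold` and `rounds` are hypothetical.

```python
import math
from statistics import mean, pstdev

def build_profile(records):
    # Per-feature mean and standard deviation of (presumed) normal traffic;
    # a zero deviation is replaced by 1.0 to avoid division by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*records)]

def distance(profile, record):
    # Root-mean-square z-score of the record against the normal profile.
    z = [(v - m) / s for v, (m, s) in zip(record, profile)]
    return math.sqrt(sum(zi * zi for zi in z) / len(z))

def construct_model(labeled_normal, unlabeled, threshold=3.0, rounds=3):
    # Stage 1 (hypothetical self-training): grow the normal set from the few
    # labeled records with unlabeled records that already look normal.
    normal = list(labeled_normal)
    for _ in range(rounds):
        profile = build_profile(normal)
        normal = list(labeled_normal) + [
            r for r in unlabeled if distance(profile, r) <= threshold
        ]
    return build_profile(normal)

def detect(profile, events, threshold=3.0):
    # Stage 2: flag events that are sufficiently far from the normal profile.
    return [distance(profile, e) > threshold for e in events]

# Hypothetical records: (bytes transferred, connection duration in seconds).
labeled = [(100, 0.50), (120, 0.60), (110, 0.55), (105, 0.52)]
unlabeled = [(115, 0.58), (5000, 9.0)]
profile = construct_model(labeled, unlabeled)
print(detect(profile, [(112, 0.56), (4000, 7.5)]))
```

In this sketch, the unlabeled record `(115, 0.58)` is absorbed into the profile and `(5000, 9.0)` is not. This is how the scarce labeled data is supplemented, under the stated assumptions.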