Abstract: Early detection of anomalies in data centers is important to reduce downtime and the cost of periodic maintenance. However, there is little research on this topic, and even less on the fusion of sensor data for the detection of abnormal events. The goal of this paper is to propose a method for anomaly detection in data centers that combines sensor data (temperature, humidity, power) and deep learning models. The model described in the paper uses one autoencoder per sensor to reconstruct the inputs. The autoencoders contain Long Short-Term Memory (LSTM) layers and are trained using the normal samples of the relevant sensors selected by correlation analysis. The difference signal between the input and its reconstruction is then used to classify the samples via feature extraction and a random forest classifier. Data measured by the sensors of a data center between January 2019 and May 2020 are used to train the model, while data between June 2020 and May 2021 are used to assess it. The model's performance is assessed a posteriori through the F1-score by comparing detected anomalies with the data center's history. The proposed model outperforms the state-of-the-art reconstruction method, which uses a single autoencoder over multivariate sequences and detects anomalies by thresholding the reconstruction error, with an F1-score of 83.60% compared to 24.16%.
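A minimal sketch of the per-sensor pipeline this abstract describes, assuming windowed univariate sequences; layer sizes, window length, and the residual features are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: one LSTM autoencoder per sensor, residual features, random forest.
import numpy as np
from tensorflow import keras
from sklearn.ensemble import RandomForestClassifier

TIMESTEPS = 32  # assumed window length

def build_lstm_autoencoder():
    """LSTM encoder-decoder that reconstructs its input window."""
    inputs = keras.Input(shape=(TIMESTEPS, 1))
    encoded = keras.layers.LSTM(16)(inputs)                        # bottleneck
    repeated = keras.layers.RepeatVector(TIMESTEPS)(encoded)
    decoded = keras.layers.LSTM(16, return_sequences=True)(repeated)
    outputs = keras.layers.TimeDistributed(keras.layers.Dense(1))(decoded)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def residual_features(x, x_hat):
    """Simple statistics of the reconstruction-error signal per window."""
    err = (x - x_hat).squeeze(-1)                                  # (n, T)
    return np.column_stack([err.mean(1), err.std(1), np.abs(err).max(1)])

# Train on normal windows only, as the paper describes.
ae = build_lstm_autoencoder()
X_normal = np.random.randn(500, TIMESTEPS, 1)                     # placeholder data
ae.fit(X_normal, X_normal, epochs=5, verbose=0)

X_test = np.random.randn(100, TIMESTEPS, 1)
y_test = np.random.randint(0, 2, 100)                             # placeholder labels
feats = residual_features(X_test, ae.predict(X_test, verbose=0))
clf = RandomForestClassifier().fit(feats, y_test)                 # supervised stage
```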
Abstract: Autonomous structural health monitoring (SHM) of structures and bridges has become a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and anomaly detection in a bridge from vibrational data, and compares different feature extraction schemes to increase accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of its extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by time-domain filtering (tracking). The extracted fundamental frequencies are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer), which tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., entropy, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)), which combines the fundamental frequencies to extract new damage-sensitive features in a low-dimensional feature space. Finally, one-class classification (OCC) algorithms perform anomaly detection, trained with standard-condition points and tested with both normal and anomalous ones. In particular, principal component analysis (PCA), kernel principal component analysis (KPCA), and the auto-associative neural network (ANN) are presented and their performances are compared. It is also shown that, by evaluating the correct features, anomalies can be detected with an accuracy and F1-score greater than 95%.
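A minimal sketch of the PCA-based one-class step, assuming a matrix of tracked fundamental frequencies (four per observation); the component count and the percentile threshold are illustrative assumptions.

```python
# Sketch: PCA one-class classification on modal-frequency features.
import numpy as np
from sklearn.decomposition import PCA

freqs_train = np.random.randn(300, 4)        # placeholder: standard-condition features
freqs_test = np.random.randn(50, 4)

pca = PCA(n_components=2).fit(freqs_train)

def reconstruction_error(model, X):
    """Distance between each point and its projection on the principal subspace."""
    X_hat = model.inverse_transform(model.transform(X))
    return np.linalg.norm(X - X_hat, axis=1)

# Threshold derived from normal data only, e.g. the 99th percentile.
tau = np.percentile(reconstruction_error(pca, freqs_train), 99)
is_anomaly = reconstruction_error(pca, freqs_test) > tau
```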
Abstract: Kernelized Correlation Filter (KCF) based trackers have gained a lot of attention recently because of their accuracy and fast calculation speed. However, this algorithm is not robust in cases where the target is lost due to a sudden change of direction, occlusion, or leaving the field of view. In order to improve KCF performance in long-term tracking, this paper proposes an anomaly detection method for target-loss warning that analyzes the response map of each frame, and a classification algorithm for a reliable target re-locating mechanism using random ferns. Tested on the Visual Tracker Benchmark and Visual Object Tracking datasets, the experimental results indicate that the precision and success rate of the proposed algorithm are 2.92 and 2.61 times higher, respectively, than those of the original KCF algorithm. Moreover, the proposed tracker handles occlusion better than many state-of-the-art long-term tracking methods while running at 60 frames per second.
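One common way to flag target loss from a correlation response map is the peak-to-sidelobe ratio (PSR), a standard confidence measure for correlation filter trackers; the sketch below uses it as a stand-in, since the paper's exact response-map criterion and threshold are not given here.

```python
# Sketch: peak-to-sidelobe ratio as a target-loss warning signal.
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio of a 2-D correlation response map."""
    peak = response.max()
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False   # exclude the peak region
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-9)

response = np.random.rand(64, 64)   # placeholder KCF response map
if psr(response) < 6.0:             # low PSR suggests unreliable tracking
    print("Target possibly lost; trigger re-detection (e.g., random ferns).")
```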
Abstract: The emergence of digital twin technology, a digital replica of the physical world, has improved real-time access to sensor data about the performance of buildings. This digital transformation has opened up many opportunities to improve the management of buildings by using the collected data to help monitor consumption patterns and energy leakages. One example is the integration of predictive models for anomaly detection. In this paper, we use the GAM (Generalised Additive Model) for anomaly detection in Air Handling Unit (AHU) power consumption patterns. There is ample research on the use of GAM for the prediction of power consumption at the office building and nationwide level. However, there is limited illustration of its anomaly detection capabilities, prescriptive analytics case studies, and its integration with the latest developments in digital twin technology. In this paper, we applied the general GAM modelling framework to the historical data of the AHU power consumption and cooling load of a building from Jan 2018 to Aug 2019 at an education campus in Singapore to train prediction models that, in turn, yield predicted values and ranges. The historical data are seamlessly extracted from the digital twin for modelling purposes. We enhanced the utility of the GAM model by using it to power a real-time anomaly detection system based on the forward predicted ranges. The magnitude of deviation from the upper and lower bounds of the uncertainty intervals is used to inform and identify anomalous data points, all based on historical data, without explicit intervention from domain experts. Notwithstanding, the domain expert fits in through an optional feedback loop through which iterative data cleansing is performed. After an anomalously high or low level of power consumption is detected, a set of rule-based conditions is evaluated in real time to help determine the next course of action for the facilities manager. The performance of GAM is then compared with other approaches to evaluate its effectiveness. Lastly, we discuss the successful deployment of this approach for the detection of anomalous power consumption patterns and illustrate it with real-world use cases.
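A minimal sketch of interval-based anomaly flagging with a GAM, using the pygam package; the feature choice, smoothing terms, and interval width are illustrative assumptions, not the deployed model's configuration.

```python
# Sketch: flag points outside GAM prediction intervals.
import numpy as np
from pygam import LinearGAM, s

X_train = np.random.rand(1000, 2)   # placeholder: e.g. hour of day, cooling load
y_train = np.random.rand(1000)      # placeholder: AHU power consumption

gam = LinearGAM(s(0) + s(1)).fit(X_train, y_train)

X_new = np.random.rand(10, 2)
y_new = np.random.rand(10)
lo, hi = gam.prediction_intervals(X_new, width=0.95).T

# Points outside the predicted range are flagged; the magnitude of the
# deviation can rank their severity, as described above.
deviation = np.maximum(y_new - hi, lo - y_new)
anomalous = deviation > 0
```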
Abstract: Data asset protection is a crucial issue in the cybersecurity field. Companies use logical access control tools to vault their information assets and protect them against external threats, but they lack solutions to counter insider threats. Nowadays, insider threats are the most significant concern of security analysts. They are mainly individuals with legitimate access to company information systems who use their rights with malicious intent. In several fields, behavior anomaly detection is the method used by cyber specialists to effectively counter the threats of malicious user activity. In this paper, we present a step toward the construction of a user and entity behavior analysis framework by proposing a behavior anomaly detection model. This model combines machine learning classification techniques and graph-based methods, relying on linear algebra and parallel computing techniques. We show the utility of an ensemble learning approach in this context. We present test results of several detection methods on a representative access control dataset. Some of the explored classifiers achieve up to 99% accuracy.
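A minimal sketch of the ensemble-learning idea on tabular access-control features; the feature set and the base classifiers are assumptions for illustration, not the framework's actual components.

```python
# Sketch: soft-voting ensemble over behavioural access-control features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 8)          # placeholder behavioural features
y = np.random.randint(0, 2, 1000)    # 1 = anomalous access pattern

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier()),
                ("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier())],
    voting="soft",                   # average predicted probabilities
)
ensemble.fit(X, y)
scores = ensemble.predict_proba(X)[:, 1]   # anomaly likelihood per sample
```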
Abstract: In order to reduce the number of deaths due to heart problems, we propose the use of the Hierarchical Temporal Memory (HTM) algorithm, a real-time anomaly detection algorithm. HTM is a cortical learning algorithm, modeled on the neocortex, used for anomaly detection; in other words, it is based on a conceptual theory of how the human brain may work. It is powerful in predicting unusual patterns, anomaly detection, and classification. In this paper, HTM has been implemented and tested on ECG datasets in order to detect cardiac anomalies. Experiments showed good performance in terms of specificity, sensitivity, and execution time.
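HTM itself requires a dedicated library; as a plainly labeled stand-in, the sketch below shows the same streaming pattern (model the recent past, score the surprise of each new sample, threshold it) with a simple rolling-statistics model rather than HTM.

```python
# Sketch: streaming anomaly flagging on an ECG signal (stand-in for HTM).
import numpy as np

def streaming_anomaly_flags(ecg, window=50, k=4.0):
    """Flag samples deviating k sigmas from a rolling-window baseline."""
    flags = np.zeros(len(ecg), dtype=bool)
    for t in range(window, len(ecg)):
        past = ecg[t - window:t]
        mu, sigma = past.mean(), past.std() + 1e-9
        flags[t] = abs(ecg[t] - mu) > k * sigma   # surprise threshold
    return flags

ecg = np.sin(np.linspace(0, 60, 3000)) + 0.05 * np.random.randn(3000)
ecg[1500] += 3.0                                  # injected anomaly
print(np.nonzero(streaming_anomaly_flags(ecg))[0])
```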
Abstract: Intrusion detection systems (IDS) are the main components of network security. These systems analyze network events to detect intrusions. An IDS is designed by training on normal traffic data or attack data, and machine learning methods are among the best ways to design IDSs. In the method presented in this article, the pruning algorithm of the C5.0 decision tree is used to reduce the features of the traffic data, and the IDS is trained with the least squares support vector machine (LS-SVM) algorithm. The remaining features are then ranked according to the predictor importance criterion, and the least important features are eliminated in that order. The features remaining at this stage, which yield the highest accuracy in the LS-SVM, are selected as the final features. Compared to other similar articles that have examined selected features in a least squares support vector machine model, the obtained features achieve better accuracy, true positive rate, and false positive rate. The results are tested on the UNSW-NB15 dataset.
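A minimal sketch of the two-stage idea: rank features by an importance criterion, then drop the least important ones while a classifier retains its accuracy. A standard SVM stands in for the LS-SVM, and the data and ranking model are assumptions.

```python
# Sketch: importance-ordered backward feature elimination with an SVM.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X = np.random.rand(500, 20)          # placeholder traffic features
y = np.random.randint(0, 2, 500)

# Rank features by a predictor-importance criterion (ascending).
order = np.argsort(RandomForestClassifier().fit(X, y).feature_importances_)

best_acc, best_subset = 0.0, None
subset = list(range(X.shape[1]))
for least_important in order[:-1]:
    acc = cross_val_score(SVC(), X[:, subset], y, cv=3).mean()
    if acc >= best_acc:
        best_acc, best_subset = acc, list(subset)
    subset.remove(least_important)   # eliminate in importance order
print(best_acc, best_subset)         # final selected features
```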
Abstract: To assist individual departments within universities in their energy management tasks, this study explores the application of Building Information Modeling in establishing the 'BIM-based Energy Management Support System' (BIM-EMSS). The BIM-EMSS consists of six components: (1) sensors installed for each occupant and each piece of equipment, (2) electricity sub-meters (constantly logging the lighting, HVAC, and socket electricity consumption of each room), (3) BIM models of all rooms within individual departments' facilities, (4) a data warehouse (for storing occupancy status and logged electricity consumption data), (5) a building energy management system that provides energy managers with various energy management functions, and (6) an energy simulation tool (such as eQuest) that generates real-time 'standard energy consumption' data against which 'actual energy consumption' data are compared and energy efficiency is evaluated. Through the building energy management system, the energy manager is able to (a) view a 3D visualization (BIM model) of each room, in which the occupancy and equipment status detected by the sensors and the logged electricity consumption data are constantly displayed; (b) perform real-time energy consumption analysis to compare the actual and standard energy consumption profiles of a space; (c) obtain energy consumption anomaly detection warnings for certain rooms so that corrective energy management actions can be taken (a data mining technique is employed to analyze the relation between the space occupancy pattern and the current space equipment settings to indicate an anomaly, such as when appliances are on without occupancy); and (d) perform historical energy consumption analysis to review monthly and annual energy consumption profiles and compare them against historical energy profiles. The BIM-EMSS was further implemented in a research lab in the Department of Architecture of NTUST in Taiwan, and the implementation results are presented to illustrate how it can assist individual departments within universities in their energy management tasks.
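A minimal sketch of the occupancy-vs-equipment anomaly rule mentioned in (c), appliances drawing power without occupancy, assuming a tabular log of room states; the column names are illustrative assumptions.

```python
# Sketch: rule-based check for appliances on in unoccupied rooms.
import pandas as pd

log = pd.DataFrame({
    "room": ["A101", "A101", "A102"],
    "occupied": [False, True, False],      # from occupancy sensors
    "appliance_on": [True, True, True],    # from electricity sub-meters
})

# Flag rooms where equipment draws power while the room is unoccupied.
anomalies = log[log["appliance_on"] & ~log["occupied"]]
print(anomalies)
```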
Abstract: In recent years, a wide variety of applications have been developed with Support Vector Machine (SVM) and Artificial Neural Network (ANN) methods. In general, these methods depend on intrusion knowledge databases such as KDD99, ISCX, and CAIDA, among others. New classes of detectors are generated by machine learning techniques, trained and tested over network databases. Thereafter, the detectors are employed to detect anomalies in network communication scenarios according to users' connection behavior. The first detector, based on the training dataset, is deployed in different real-world networks with mobile and non-mobile devices to analyze the performance and accuracy of static detection. The vulnerabilities are based on previous work on telemedicine apps developed by the research group. This paper presents the differences in detection results between several network scenarios when applying traditional detectors built with artificial neural networks and support vector machines.
Abstract: The critical concern of satellite operations is to ensure the health and safety of satellites. The worst case in this perspective is probably the loss of a mission, but the more common interruption of satellite functionality can result in compromised mission objectives. All data acquired from the spacecraft are known as telemetry (TM), which contains a wealth of information related to the health of all its subsystems. Each single item of information is contained in a telemetry parameter, which represents a time-variant property (i.e., a status or a measurement) to be checked. As a consequence, TM monitoring systems are continuously improved to reduce the time required to respond to changes in a satellite's state of health. A fast assessment of the current state of the satellite is thus very important for responding to occurring failures. Statistical multivariate latent techniques are among the vital learning tools used to tackle the above problem coherently. Information extraction from such rich data sources using advanced statistical methodologies is a challenging task due to the massive volume of data. To solve this problem, this paper presents an unsupervised learning algorithm based on the Principal Component Analysis (PCA) technique. The algorithm is applied to an actual remote sensing spacecraft. Data from the Attitude Determination and Control System (ADCS) were acquired under two operating conditions: normal and faulty states. Models were built and tested under these conditions, and the results show that the algorithm could successfully differentiate between the two operating conditions. Furthermore, the algorithm provides competent information for prediction as well as adding more insight and physical interpretation to ADCS operation.
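A minimal sketch of PCA-based fault detection on telemetry, scoring each sample's squared prediction error (SPE) against a model fitted on normal ADCS data; the component count and percentile threshold are assumptions.

```python
# Sketch: PCA residual (SPE) monitoring of telemetry parameters.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_normal = np.random.randn(1000, 12)   # placeholder telemetry parameters
X_faulty = np.random.randn(100, 12) + 2.0

scaler = StandardScaler().fit(X_normal)
pca = PCA(n_components=4).fit(scaler.transform(X_normal))

def spe(X):
    """Squared prediction error against the principal subspace."""
    Z = scaler.transform(X)
    Z_hat = pca.inverse_transform(pca.transform(Z))
    return ((Z - Z_hat) ** 2).sum(axis=1)

tau = np.percentile(spe(X_normal), 99)   # threshold from normal data only
print((spe(X_faulty) > tau).mean())      # fraction of faulty samples flagged
```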
Abstract: One of the tasks of optical surveillance is to detect anomalies in large amounts of image data. However, if the size of the anomaly is very small, limited information is available to distinguish it from the surrounding environment. Spectral detection provides a useful source of additional information and may help to detect anomalies with a size of a few pixels or less. Unfortunately, spectral cameras are expensive because of the difficulty of separating two spatial dimensions in addition to one spectral dimension. We investigate the possibility of modifying a simple spectral line detector for outdoor detection. This may be especially useful if the area of interest forms a line, such as the horizon. We use a monochrome CCD that also enables detection into the near infrared. A simple camera is attached to the setup to determine which part of the environment is spectrally imaged. Our preliminary results indicate that sensitive detection of very small targets is indeed possible. Spectra could be taken from the various targets by averaging columns in the line image. By imaging a set of lines of various widths, we found narrow lines that could not be seen in the color image but remained visible in the spectral line image. A simultaneous analysis of the entire spectrum can produce better results than visual inspection of the line spectral image. We are presently developing calibration targets for spatial and spectral focusing and alignment with the spatial camera. This should yield improved results and broader use in outdoor applications.
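A minimal sketch of the column-averaging step described above: a line image has one spatial axis and one spectral axis, so averaging the columns covering a target yields its spectrum. The array shapes and column range are illustrative assumptions.

```python
# Sketch: extracting a target spectrum from a spectral line image.
import numpy as np

line_image = np.random.rand(256, 512)   # rows = wavelength, cols = position on line
target_cols = slice(200, 220)           # columns covering one target

spectrum = line_image[:, target_cols].mean(axis=1)   # average over position
background = np.delete(line_image, np.r_[200:220], axis=1).mean(axis=1)
contrast = spectrum - background        # spectral signature vs. surroundings
```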
Abstract: Anomaly detection techniques have focused on two main components: the extraction and selection of data, and the analysis performed over the obtained data. The goal of this paper is to analyze the influence each of these components has on system performance by evaluating detection over network scenarios with different setups. The independent variables are: the number of system inputs, the way the inputs are codified, and the complexity of the analysis techniques. For the analysis, several artificial neural network approaches are implemented with different numbers of layers. The obtained results show the influence each of these variables has on system performance.
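An illustrative sketch of one of the independent variables above, analysis complexity, comparing neural networks with different numbers of hidden layers on the same inputs; the data and layer sizes are assumptions.

```python
# Sketch: varying ANN depth and measuring detection accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X = np.random.rand(500, 10)          # placeholder codified inputs
y = np.random.randint(0, 2, 500)

for hidden in [(32,), (32, 32), (32, 32, 32)]:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500)
    acc = cross_val_score(clf, X, y, cv=3).mean()
    print(len(hidden), "hidden layer(s):", round(acc, 3))
```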
Abstract: One main drawback of intrusion detection systems is their inability to detect new attacks that do not have known signatures. In this paper, we discuss an intrusion detection method that proposes independent component analysis (ICA) based feature selection heuristics and uses rough-fuzzy clustering of the data. ICA separates independent components (ICs) from the monitored variables; rough sets decrease the amount of data and remove redundancy; and fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us not only to recognize known attacks but also to detect activity that may be the result of a new, unknown attack. Experimental results are reported on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset.
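A minimal sketch of the ICA front end followed by a soft-membership clustering stage; a Gaussian mixture stands in for rough-fuzzy clustering here (every point receives a degree of membership in each cluster), and all sizes are assumptions.

```python
# Sketch: ICA feature extraction + soft clustering for unknown-attack detection.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture

X = np.random.rand(1000, 20)             # placeholder traffic features

ics = FastICA(n_components=6).fit_transform(X)   # independent components
gmm = GaussianMixture(n_components=3).fit(ics)

memberships = gmm.predict_proba(ics)     # degree of membership per cluster
loglik = gmm.score_samples(ics)          # fit under the learned clusters

# Points fitting poorly under every cluster are candidate anomalies,
# including activity from previously unseen attacks.
tau = np.percentile(loglik, 1)
suspects = np.nonzero(loglik < tau)[0]
```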
Abstract: Nowadays, many organizations use systems that support business processes in whole or in part. However, in some application domains, such as software development and health care processes, a normative Process Aware System (PAS) is not suitable, because flexible support is needed to respond rapidly to new process models. On the other hand, a flexible Process Aware System may be vulnerable to undesirable and fraudulent executions, which imposes a tradeoff between flexibility and security. To manage this tradeoff, a genetic-based anomaly detection model for the logs of Process Aware Systems is presented in this paper. The detection of an anomalous trace is based on discovering an appropriate process model using genetic process mining and flagging traces that do not fit the appropriate model as anomalous; therefore, when used in a PAS, this model is an automated solution that can support the coexistence of flexibility and security.
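A toy conformance check in the spirit of the approach above: a trace is anomalous if it cannot be replayed on the discovered process model. The hand-written transition map below stands in for a model mined by genetic process mining.

```python
# Sketch: flag traces that do not fit the discovered process model.
allowed = {"start": {"review"}, "review": {"approve", "reject"},
           "approve": {"end"}, "reject": {"end"}}

def fits_model(trace):
    """Replay a trace; any disallowed step makes it anomalous."""
    for step, nxt in zip(trace, trace[1:]):
        if nxt not in allowed.get(step, set()):
            return False
    return True

traces = [["start", "review", "approve", "end"],
          ["start", "approve", "end"]]           # skips review: possibly fraudulent
print([fits_model(t) for t in traces])           # [True, False]
```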
Abstract: In this report, we present a rule-based approach to detect anomalous telephone calls. The method described here uses subscriber usage CDR (call detail record) data sampled over two observation periods: a study period and a test period. The study period contains call records of customers' non-anomalous behaviour. Customers are first grouped according to their similar usage behaviour (e.g., average number of local calls per week). For the customers in each group, we develop a probabilistic model to describe their usage. Next, we use maximum likelihood estimation (MLE) to estimate the parameters of the calling behaviour. Then we determine thresholds by calculating the acceptable change within a group. MLE is applied to the data in the test period to estimate the parameters of the calling behaviour, and these parameters are compared against the thresholds. Any deviation beyond a threshold is used to raise an alarm. This method has the advantage of identifying local anomalies, as compared to techniques that identify global anomalies. The method is tested on 90 days of study data and 10 days of test data from telecom customers. For medium to large deviations in the test window, the method is able to identify 90% of anomalous usage with less than a 1% false alarm rate.
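A minimal sketch of the per-group statistical test, assuming weekly call counts follow a Poisson model (one plausible choice; the report's exact model is not specified here), for which the MLE of the rate is simply the sample mean.

```python
# Sketch: group-level MLE of calling rate and threshold-based alarms.
import numpy as np

study = np.random.poisson(20, size=(100, 12))   # placeholder: 100 customers, 12 study weeks
test = np.random.poisson(20, size=(100, 2))     # test-period weekly counts

lam_group = study.mean()                # group-level MLE of the Poisson rate
sigma = np.sqrt(lam_group)              # Poisson standard deviation for the group

lam_test = test.mean(axis=1)            # per-customer test-period MLE
k = 3.0                                 # acceptable change within the group
alarms = np.abs(lam_test - lam_group) > k * sigma
print(np.nonzero(alarms)[0])            # customers whose usage raises an alarm
```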
Abstract: This paper presents four unsupervised clustering algorithms, namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer, that previous works have not used for network traffic classification. The methodology, the results, the produced clusters, and an evaluation of each algorithm's accuracy are presented. In addition, the efficiency of these algorithms is considered in terms of the time required to generate the clusters quickly and correctly. Our work studies and tests the algorithms by classifying traffic anomalies in network traffic with attributes that have not been used before. We analyze the algorithm with the best efficiency, or the best learning, and compare it to the previously used K-Means. This research can be used to develop anomaly detection systems that are more efficient and better meet future requirements.
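An illustrative sketch of the comparison methodology: each clustering algorithm is scored on both runtime and agreement with known traffic labels. sklearn algorithms stand in for the WEKA implementations named above, and the data are placeholders.

```python
# Sketch: comparing clustering algorithms on time and label agreement.
import time
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score

X = np.random.rand(1000, 15)         # placeholder traffic attributes
y = np.random.randint(0, 4, 1000)    # placeholder anomaly classes

for name, algo in [("KMeans", KMeans(n_clusters=4, n_init=10)),
                   ("Agglomerative", AgglomerativeClustering(n_clusters=4))]:
    start = time.perf_counter()
    labels = algo.fit_predict(X)
    elapsed = time.perf_counter() - start
    print(name, round(elapsed, 3), "s, ARI:",
          round(adjusted_rand_score(y, labels), 3))
```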
Abstract: As network-based technologies become omnipresent, the demand to secure networks and systems against threats increases. One of the effective ways to achieve higher security is through the use of intrusion detection systems (IDS), software tools that detect anomalous activity in a computer or network. In this paper, an IDS has been developed using an improved machine learning based algorithm, the Locally Linear Neuro-Fuzzy Model (LLNF), for classification, although this model was originally used for system identification. A key technical challenge in IDS and LLNF learning is the curse of high dimensionality; therefore, a feature selection phase applicable to any IDS is proposed. By investigating the use of three feature selection algorithms in this model, it is shown that adding a feature selection phase reduces the computational complexity of our model. Feature selection algorithms require the use of a feature goodness measure. The use of both a linear and a non-linear measure, the linear correlation coefficient and mutual information respectively, is investigated.
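A minimal sketch of the two feature-goodness measures named above, the linear correlation coefficient and mutual information, computed per feature against the class label; the data and the number of kept features are assumptions.

```python
# Sketch: scoring features with a linear and a non-linear goodness measure.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

X = np.random.rand(500, 10)          # placeholder traffic features
y = np.random.randint(0, 2, 500)

linear_scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(X.shape[1])])   # linear measure
mi_scores = mutual_info_classif(X, y)                    # non-linear measure

top_k = 5
selected = np.argsort(mi_scores)[::-1][:top_k]   # keep the k best features
```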
Abstract: A prototype of an anomaly detection system was developed to automate the process of recognizing anomalies in roentgen images by utilizing fuzzy histogram hyperbolization image enhancement and a back-propagation artificial neural network. The system consists of image acquisition, a pre-processor, a feature extractor, a response selector, and output. Fuzzy histogram hyperbolization is chosen to improve the quality of the roentgen image; its steps consist of fuzzification, modification of the values of the membership functions, and defuzzification. Image features are extracted after the quality of the image has been improved, and the extracted features are input to the artificial neural network for anomaly detection. The number of nodes in the proposed ANN layers was kept small. Experimental results indicate that the fuzzy histogram hyperbolization method can be used to improve the quality of the image, and the system is capable of detecting anomalies in roentgen images.
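A minimal sketch of the three enhancement steps on a greyscale image, following the commonly cited formulation of fuzzy histogram hyperbolization (fuzzification, membership modification with an exponent beta, defuzzification); the paper's exact parameterisation may differ.

```python
# Sketch: fuzzy histogram hyperbolization of a greyscale image.
import numpy as np

def fuzzy_histogram_hyperbolization(img, beta=0.8, levels=256):
    g_min, g_max = img.min(), img.max()
    mu = (img - g_min) / (g_max - g_min + 1e-9)        # fuzzification
    mu_mod = mu ** beta                                # membership modification
    out = (levels - 1) / (np.exp(-1) - 1) * (np.exp(-mu_mod) - 1)
    return out.astype(np.uint8)                        # defuzzification

xray = (np.random.rand(128, 128) * 255).astype(np.uint8)   # placeholder image
enhanced = fuzzy_histogram_hyperbolization(xray)
```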
Abstract: With increasing complexity in electronic systems, there is a need for system-level anomaly detection and fault isolation. In this paper, anomaly detection based on vector similarity to a training set is pursued through two approaches: one that preserves the original information, Mahalanobis Distance (MD), and one that compresses the data into its principal components, Projection Pursuit Analysis. These methods have been used to detect deviations in system performance from normal operation and for critical parameter isolation in multivariate environments. The study evaluates the detection capability of each approach on a set of test data with known faults against a baseline set of data representative of "healthy" systems.
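A minimal sketch of the Mahalanobis distance approach: distances of test vectors from the healthy training distribution, thresholded against a baseline computed on the training data; the percentile threshold is an assumption.

```python
# Sketch: Mahalanobis-distance anomaly detection against healthy data.
import numpy as np

X_healthy = np.random.randn(500, 8)      # placeholder healthy parameters
X_test = np.random.randn(50, 8) + 1.5    # test data with deviations

mu = X_healthy.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X_healthy, rowvar=False))

def mahalanobis(X):
    """Per-row Mahalanobis distance from the healthy distribution."""
    d = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, cov_inv, d))

tau = np.percentile(mahalanobis(X_healthy), 99)   # healthy baseline threshold
print((mahalanobis(X_test) > tau).mean())         # fraction flagged as deviating
```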
Abstract: The one-class support vector machine "support vector data description" (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, ease of use is crucial. The results of SVDD are largely determined by the choice of the regularisation parameter C and the kernel parameter of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD based solely on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.
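An illustrative sketch of tuning a one-class model from a single class only. sklearn's OneClassSVM (equivalent to SVDD with an RBF kernel) stands in for SVDD, and the selection heuristic below (high training acceptance plus a tight boundary with few support vectors) is an assumption, not the paper's criterion.

```python
# Sketch: one-class parameter search using training data from one class only.
import numpy as np
from sklearn.svm import OneClassSVM

X_train = np.random.randn(400, 5)    # placeholder one-class training data

best, best_score = None, -np.inf
for nu in [0.01, 0.05, 0.1]:
    for gamma in [0.01, 0.1, 1.0]:
        model = OneClassSVM(nu=nu, gamma=gamma).fit(X_train)
        accepted = (model.predict(X_train) == 1).mean()
        tightness = 1.0 - len(model.support_) / len(X_train)
        score = accepted + tightness     # trade off acceptance vs. complexity
        if score > best_score:
            best, best_score = (nu, gamma), score
print("selected parameters:", best)
```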