Space Telemetry Anomaly Detection Based on Statistical PCA Algorithm

The critical concern of satellite operations is to ensure the health and safety of satellites. The worst case in this perspective is probably the loss of a mission, but the more common interruption of satellite functionality can result in compromised mission objectives. All the data acquiring from the spacecraft are known as Telemetry (TM), which contains the wealth information related to the health of all its subsystems. Each single item of information is contained in a telemetry parameter, which represents a time-variant property (i.e. a status or a measurement) to be checked. As a consequence, there is a continuous improvement of TM monitoring systems to reduce the time required to respond to changes in a satellite's state of health. A fast conception of the current state of the satellite is thus very important to respond to occurring failures. Statistical multivariate latent techniques are one of the vital learning tools that are used to tackle the problem above coherently. Information extraction from such rich data sources using advanced statistical methodologies is a challenging task due to the massive volume of data. To solve this problem, in this paper, we present a proposed unsupervised learning algorithm based on Principle Component Analysis (PCA) technique. The algorithm is particularly applied on an actual remote sensing spacecraft. Data from the Attitude Determination and Control System (ADCS) was acquired under two operation conditions: normal and faulty states. The models were built and tested under these conditions, and the results show that the algorithm could successfully differentiate between these operations conditions. Furthermore, the algorithm provides competent information in prediction as well as adding more insight and physical interpretation to the ADCS operation.

Unsupervised Classification of DNA Barcodes Species Using Multi-Library Wavelet Networks

DNA Barcode provides good sources of needed information to classify living species. The classification problem has to be supported with reliable methods and algorithms. To analyze species regions or entire genomes, it becomes necessary to use the similarity sequence methods. A large set of sequences can be simultaneously compared using Multiple Sequence Alignment which is known to be NP-complete. However, all the used methods are still computationally very expensive and require significant computational infrastructure. Our goal is to build predictive models that are highly accurate and interpretable. In fact, our method permits to avoid the complex problem of form and structure in different classes of organisms. The empirical data and their classification performances are compared with other methods. Evenly, in this study, we present our system which is consisted of three phases. The first one, is called transformation, is composed of three sub steps; Electron-Ion Interaction Pseudopotential (EIIP) for the codification of DNA Barcodes, Fourier Transform and Power Spectrum Signal Processing. Moreover, the second phase step is an approximation; it is empowered by the use of Multi Library Wavelet Neural Networks (MLWNN). Finally, the third one, is called the classification of DNA Barcodes, is realized by applying the algorithm of hierarchical classification.

Customer Churn Prediction: A Cognitive Approach

Customer churn prediction is one of the most useful areas of study in customer analytics. Due to the enormous amount of data available for such predictions, machine learning and data mining have been heavily used in this domain. There exist many machine learning algorithms directly applicable for the problem of customer churn prediction, and here, we attempt to experiment on a novel approach by using a cognitive learning based technique in an attempt to improve the results obtained by using a combination of supervised learning methods, with cognitive unsupervised learning methods.

Unsupervised Segmentation Technique for Acute Leukemia Cells Using Clustering Algorithms

Leukaemia is a blood cancer disease that contributes to the increment of mortality rate in Malaysia each year. There are two main categories for leukaemia, which are acute and chronic leukaemia. The production and development of acute leukaemia cells occurs rapidly and uncontrollable. Therefore, if the identification of acute leukaemia cells could be done fast and effectively, proper treatment and medicine could be delivered. Due to the requirement of prompt and accurate diagnosis of leukaemia, the current study has proposed unsupervised pixel segmentation based on clustering algorithm in order to obtain a fully segmented abnormal white blood cell (blast) in acute leukaemia image. In order to obtain the segmented blast, the current study proposed three clustering algorithms which are k-means, fuzzy c-means and moving k-means algorithms have been applied on the saturation component image. Then, median filter and seeded region growing area extraction algorithms have been applied, to smooth the region of segmented blast and to remove the large unwanted regions from the image, respectively. Comparisons among the three clustering algorithms are made in order to measure the performance of each clustering algorithm on segmenting the blast area. Based on the good sensitivity value that has been obtained, the results indicate that moving kmeans clustering algorithm has successfully produced the fully segmented blast region in acute leukaemia image. Hence, indicating that the resultant images could be helpful to haematologists for further analysis of acute leukaemia.

A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.

Unsupervised Feature Learning by Pre-Route Simulation of Auto-Encoder Behavior Model

This paper describes a cycle accurate simulation results of weight values learned by an auto-encoder behavior model in terms of pre-route simulation. Given the results we visualized the first layer representations with natural images. Many common deep learning threads have focused on learning high-level abstraction of unlabeled raw data by unsupervised feature learning. However, in the process of handling such a huge amount of data, the learning method’s computation complexity and time limited advanced research. These limitations came from the fact these algorithms were computed by using only single core CPUs. For this reason, parallel-based hardware, FPGAs, was seen as a possible solution to overcome these limitations. We adopted and simulated the ready-made auto-encoder to design a behavior model in VerilogHDL before designing hardware. With the auto-encoder behavior model pre-route simulation, we obtained the cycle accurate results of the parameter of each hidden layer by using MODELSIM. The cycle accurate results are very important factor in designing a parallel-based digital hardware. Finally this paper shows an appropriate operation of behavior model based pre-route simulation. Moreover, we visualized learning latent representations of the first hidden layer with Kyoto natural image dataset.

Analysis of Diverse Clustering Tools in Data Mining

Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.

A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Automatic Moment-Based Texture Segmentation

An automatic moment-based texture segmentation approach is proposed in this paper. First, we describe the related work in this computer vision domain. Our texture feature extraction, the first part of the texture recognition process, produces a set of moment-based feature vectors. For each image pixel, a texture feature vector is computed as a sequence of area moments. Then, an automatic pixel classification approach is proposed. The feature vectors are clustered using an unsupervised classification algorithm, the optimal number of clusters being determined using a measure based on validation indexes. From the resulted pixel classes one determines easily the desired texture regions of the image.

An Adaptive Hand-Talking System for the Hearing Impaired

An adaptive Chinese hand-talking system is presented in this paper. By analyzing the 3 data collecting strategies for new users, the adaptation framework including supervised and unsupervised adaptation methods is proposed. For supervised adaptation, affinity propagation (AP) is used to extract exemplar subsets, and enhanced maximum a posteriori / vector field smoothing (eMAP/VFS) is proposed to pool the adaptation data among different models. For unsupervised adaptation, polynomial segment models (PSMs) are used to help hidden Markov models (HMMs) to accurately label the unlabeled data, then the "labeled" data together with signerindependent models are inputted to MAP algorithm to generate signer-adapted models. Experimental results show that the proposed framework can execute both supervised adaptation with small amount of labeled data and unsupervised adaptation with large amount of unlabeled data to tailor the original models, and both achieve improvements on the performance of recognition rate.

Feature Based Unsupervised Intrusion Detection

The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.

An Optimal Unsupervised Satellite image Segmentation Approach Based on Pearson System and k-Means Clustering Algorithm Initialization

This paper presents an optimal and unsupervised satellite image segmentation approach based on Pearson system and k-Means Clustering Algorithm Initialization. Such method could be considered as original by the fact that it utilised K-Means clustering algorithm for an optimal initialisation of image class number on one hand and it exploited Pearson system for an optimal statistical distributions- affectation of each considered class on the other hand. Satellite image exploitation requires the use of different approaches, especially those founded on the unsupervised statistical segmentation principle. Such approaches necessitate definition of several parameters like image class number, class variables- estimation and generalised mixture distributions. Use of statistical images- attributes assured convincing and promoting results under the condition of having an optimal initialisation step with appropriated statistical distributions- affectation. Pearson system associated with a k-means clustering algorithm and Stochastic Expectation-Maximization 'SEM' algorithm could be adapted to such problem. For each image-s class, Pearson system attributes one distribution type according to different parameters and especially the Skewness 'β1' and the kurtosis 'β2'. The different adapted algorithms, K-Means clustering algorithm, SEM algorithm and Pearson system algorithm, are then applied to satellite image segmentation problem. Efficiency of those combined algorithms was firstly validated with the Mean Quadratic Error 'MQE' evaluation, and secondly with visual inspection along several comparisons of these unsupervised images- segmentation.

Urban Land Cover Change of Olomouc City Using LANDSAT Images

This paper regards the phenomena of intensive suburbanization and urbanization in Olomouc city and in Olomouc region in general for the period of 1986–2009. A Remote Sensing approach that involves tracking of changes in Land Cover units is proposed to quantify the urbanization state and trends in temporal and spatial aspects. It actually consisted of two approaches, Experiment 1 and Experiment 2 which implied two different image classification solutions in order to provide Land Cover maps for each 1986–2009 time split available in the Landsat image set. Experiment 1 dealt with the unsupervised classification, while Experiment 2 involved semi- supervised classification, using a combination of object-based and pixel-based classifiers. The resulting Land Cover maps were subsequently quantified for the proportion of urban area unit and its trend through time, and also for the urban area unit stability, yielding the relation of spatial and temporal development of the urban area unit. Some outcomes seem promising but there is indisputably room for improvements of source data and also processing and filtering.

Fusion of Colour and Depth Information to Enhance Wound Tissue Classification

Patients with diabetes are susceptible to chronic foot wounds which may be difficult to manage and slow to heal. Diagnosis and treatment currently rely on the subjective judgement of experienced professionals. An objective method of tissue assessment is required. In this paper, a data fusion approach was taken to wound tissue classification. The supervised Maximum Likelihood and unsupervised Multi-Modal Expectation Maximisation algorithms were used to classify tissues within simulated wound models by weighting the contributions of both colour and 3D depth information. It was found that, at low weightings, depth information could show significant improvements in classification accuracy when compared to classification by colour alone, particularly when using the maximum likelihood method. However, larger weightings were found to have an entirely negative effect on accuracy.

Variance Based Component Analysis for Texture Segmentation

This paper presents a comparative analysis of a new unsupervised PCA-based technique for steel plates texture segmentation towards defect detection. The proposed scheme called Variance Based Component Analysis or VBCA employs PCA for feature extraction, applies a feature reduction algorithm based on variance of eigenpictures and classifies the pixels as defective and normal. While the classic PCA uses a clusterer like Kmeans for pixel clustering, VBCA employs thresholding and some post processing operations to label pixels as defective and normal. The experimental results show that proposed algorithm called VBCA is 12.46% more accurate and 78.85% faster than the classic PCA.

Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and also new concepts may keep evolving. Irrelevant attributes can be termed as noisy attributes and such attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. This scheme is based on clustering as clustering is an unsupervised data mining task and it does not require labeled data, both density based and partitioning clustering are combined for outlier detection. In this scheme partitioning clustering is also used to assign weights to attributes depending upon their respective relevance and weights are adaptive. Weighted attributes are helpful to reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real world data sets show that our proposed approach outperforms other existing approach (CORM) in terms of outlier detection rate, false alarm rate, and increasing percentages of outliers.

Evolutionary Computing Approach for the Solution of Initial value Problems in Ordinary Differential Equations

An evolutionary computing technique for solving initial value problems in Ordinary Differential Equations is proposed in this paper. Neural network is used as a universal approximator while the adaptive parameters of neural networks are optimized by genetic algorithm. The solution is achieved on the continuous grid of time instead of discrete as in other numerical techniques. The comparison is carried out with classical numerical techniques and the solution is found with a uniform accuracy of MSE ≈ 10-9 .

Topographic Arrangement of 3D Design Components on 2D Maps by Unsupervised Feature Extraction

As a result of the daily workflow in the design development departments of companies, databases containing huge numbers of 3D geometric models are generated. According to the given problem engineers create CAD drawings based on their design ideas and evaluate the performance of the resulting design, e.g. by computational simulations. Usually, new geometries are built either by utilizing and modifying sets of existing components or by adding single newly designed parts to a more complex design. The present paper addresses the two facets of acquiring components from large design databases automatically and providing a reasonable overview of the parts to the engineer. A unified framework based on the topographic non-negative matrix factorization (TNMF) is proposed which solves both aspects simultaneously. First, on a given database meaningful components are extracted into a parts-based representation in an unsupervised manner. Second, the extracted components are organized and visualized on square-lattice 2D maps. It is shown on the example of turbine-like geometries that these maps efficiently provide a wellstructured overview on the database content and, at the same time, define a measure for spatial similarity allowing an easy access and reuse of components in the process of design development.

Distributional Semantics Approach to Thai Word Sense Disambiguation

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely  /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.

Comparison of Different Neural Network Approaches for the Prediction of Kidney Dysfunction

This paper presents the prediction of kidney dysfunction using different neural network (NN) approaches. Self organization Maps (SOM), Probabilistic Neural Network (PNN) and Multi Layer Perceptron Neural Network (MLPNN) trained with Back Propagation Algorithm (BPA) are used in this study. Six hundred and sixty three sets of analytical laboratory tests have been collected from one of the private clinical laboratories in Baghdad. For each subject, Serum urea and Serum creatinin levels have been analyzed and tested by using clinical laboratory measurements. The collected urea and cretinine levels are then used as inputs to the three NN models in which the training process is done by different neural approaches. SOM which is a class of unsupervised network whereas PNN and BPNN are considered as class of supervised networks. These networks are used as a classifier to predict whether kidney is normal or it will have a dysfunction. The accuracy of prediction, sensitivity and specificity were found for each type of the proposed networks .We conclude that PNN gives faster and more accurate prediction of kidney dysfunction and it works as promising tool for predicting of routine kidney dysfunction from the clinical laboratory data.