An Intelligent Combined Method Based on Power Spectral Density, Decision Trees and Fuzzy Logic for Hydraulic Pumps Fault Diagnosis

Recently, the issue of machine condition monitoring and fault diagnosis as a part of maintenance system became global due to the potential advantages to be gained from reduced maintenance costs, improved productivity and increased machine availability. The aim of this work is to investigate the effectiveness of a new fault diagnosis method based on power spectral density (PSD) of vibration signals in combination with decision trees and fuzzy inference system (FIS). To this end, a series of studies was conducted on an external gear hydraulic pump. After a test under normal condition, a number of different machine defect conditions were introduced for three working levels of pump speed (1000, 1500, and 2000 rpm), corresponding to (i) Journal-bearing with inner face wear (BIFW), (ii) Gear with tooth face wear (GTFW), and (iii) Journal-bearing with inner face wear plus Gear with tooth face wear (B&GW). The features of PSD values of vibration signal were extracted using descriptive statistical parameters. J48 algorithm is used as a feature selection procedure to select pertinent features from data set. The output of J48 algorithm was employed to produce the crisp if-then rule and membership function sets. The structure of FIS classifier was then defined based on the crisp sets. In order to evaluate the proposed PSD-J48-FIS model, the data sets obtained from vibration signals of the pump were used. Results showed that the total classification accuracy for 1000, 1500, and 2000 rpm conditions were 96.42%, 100%, and 96.42% respectively. The results indicate that the combined PSD-J48-FIS model has the potential for fault diagnosis of hydraulic pumps.

Artificial Neural Network Model for a Low Cost Failure Sensor: Performance Assessment in Pipeline Distribution

This paper describes an automated event detection and location system for water distribution pipelines which is based upon low-cost sensor technology and signature analysis by an Artificial Neural Network (ANN). The development of a low cost failure sensor which measures the opacity or cloudiness of the local water flow has been designed, developed and validated, and an ANN based system is then described which uses time series data produced by sensors to construct an empirical model for time series prediction and classification of events. These two components have been installed, tested and verified in an experimental site in a UK water distribution system. Verification of the system has been achieved from a series of simulated burst trials which have provided real data sets. It is concluded that the system has potential in water distribution network management.

Random Projections for Dimensionality Reduction in ICA

In this paper we present a technique to speed up ICA based on the idea of reducing the dimensionality of the data set preserving the quality of the results. In particular we refer to FastICA algorithm which uses the Kurtosis as statistical property to be maximized. By performing a particular Johnson-Lindenstrauss like projection of the data set, we find the minimum dimensionality reduction rate ¤ü, defined as the ratio between the size k of the reduced space and the original one d, which guarantees a narrow confidence interval of such estimator with high confidence level. The derived dimensionality reduction rate depends on a system control parameter β easily computed a priori on the basis of the observations only. Extensive simulations have been done on different sets of real world signals. They show that actually the dimensionality reduction is very high, it preserves the quality of the decomposition and impressively speeds up FastICA. On the other hand, a set of signals, on which the estimated reduction rate is greater than 1, exhibits bad decomposition results if reduced, thus validating the reliability of the parameter β. We are confident that our method will lead to a better approach to real time applications.

Methodology Issues and Design Approach of VLE on Mathematical Concepts Acquisition within Secondary Education in England

This study used positivist quantitative approach to examine the mathematical concepts acquisition of- KS4 (14-16) Special Education Needs (SENs) students within the school sector education in England. The research is based on a pilot study and the design is completely holistic in its approach with mixing methodologies. The study combines the qualitative and quantitative methods of approach in gathering formative data for the design process. Although, the approach could best be described as a mix method, fundamentally with a strong positivist paradigm, hence my earlier understanding of the differentiation of the students, student – teacher body and the various elements of indicators that is being measured which will require an attenuated description of individual research subjects. The design process involves four phases with five key stages which are; literature review and document analysis, the survey, interview, and observation; then finally the analysis of data set. The research identified the need for triangulation with Reid-s phases of data management providing scaffold for the study. The study clearly identified the ideological and philosophical aspects of educational research design for the study of mathematics by the special education needs (SENs) students in England using the virtual learning environment (VLE) platform.

An Evaluation Model for Semantic Enablement of Virtual Research Environments

The Tropical Data Hub (TDH) is a virtual research environment that provides researchers with an e-research infrastructure to congregate significant tropical data sets for data reuse, integration, searching, and correlation. However, researchers often require data and metadata synthesis across disciplines for crossdomain analyses and knowledge discovery. A triplestore offers a semantic layer to achieve a more intelligent method of search to support the synthesis requirements by automating latent linkages in the data and metadata. Presently, the benchmarks to aid the decision of which triplestore is best suited for use in an application environment like the TDH are limited to performance. This paper describes a new evaluation tool developed to analyze both features and performance. The tool comprises a weighted decision matrix to evaluate the interoperability, functionality, performance, and support availability of a range of integrated and native triplestores to rank them according to requirements of the TDH.

Orthogonal Polynomial Density Estimates: Alternative Representation and Degree Selection

The density estimates considered in this paper comprise a base density and an adjustment component consisting of a linear combination of orthogonal polynomials. It is shown that, in the context of density approximation, the coefficients of the linear combination can be determined either from a moment-matching technique or a weighted least-squares approach. A kernel representation of the corresponding density estimates is obtained. Additionally, two refinements of the Kronmal-Tarter stopping criterion are proposed for determining the degree of the polynomial adjustment. By way of illustration, the density estimation methodology advocated herein is applied to two data sets.

An ensemble of Weighted Support Vector Machines for Ordinal Regression

Instead of traditional (nominal) classification we investigate the subject of ordinal classification or ranking. An enhanced method based on an ensemble of Support Vector Machines (SVM-s) is proposed. Each binary classifier is trained with specific weights for each object in the training data set. Experiments on benchmark datasets and synthetic data indicate that the performance of our approach is comparable to state of the art kernel methods for ordinal regression. The ensemble method, which is straightforward to implement, provides a very good sensitivity-specificity trade-off for the highest and lowest rank.

Data Mining Classification Methods Applied in Drug Design

Data mining incorporates a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with the great potential. They can help people to understand the patterns in certain chunk of information so it is obvious that the data mining tools have a wide area of applications. For example in the theoretical chemistry data mining tools can be used to predict moleculeproperties or improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of thecontribution is to create a classification model, which would be able to deal with a huge data set with high accuracy. For this purpose logistic regression, Bayesian logistic regression and random forest models were built using R software. TheBayesian logistic regression in Latent GOLD software was created as well. These classification methods belong to supervised learning methods. It was necessary to reduce data matrix dimension before construct models and thus the factor analysis (FA) was used. Those models were applied to predict the biological activity of molecules, potential new drug candidates.

A New Evolutionary Algorithm for Cluster Analysis

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.

Modelling the Occurrence of Defects and Change Requests during User Acceptance Testing

Software developed for a specific customer under contract typically undergoes a period of testing by the customer before acceptance. This is known as user acceptance testing and the process can reveal both defects in the system and requests for changes to the product. This paper uses nonhomogeneous Poisson processes to model a real user acceptance data set from a recently developed system. In particular a split Poisson process is shown to provide an excellent fit to the data. The paper explains how this model can be used to aid the allocation of resources through the accurate prediction of occurrences both during the acceptance testing phase and before this activity begins.

Improving the Performance of Proxy Server by Using Data Mining Technique

Currently, web usage make a huge data from a lot of user attention. In general, proxy server is a system to support web usage from user and can manage system by using hit rates. This research tries to improve hit rates in proxy system by applying data mining technique. The data set are collected from proxy servers in the university and are investigated relationship based on several features. The model is used to predict the future access websites. Association rule technique is applied to get the relation among Date, Time, Main Group web, Sub Group web, and Domain name for created model. The results showed that this technique can predict web content for the next day, moreover the future accesses of websites increased from 38.15% to 85.57 %. This model can predict web page access which tends to increase the efficient of proxy servers as a result. In additional, the performance of internet access will be improved and help to reduce traffic in networks.

Dynamic Models versus Frailty Models for Recurrent Event Data

Recurrent event data is a special type of multivariate survival data. Dynamic and frailty models are one of the approaches that dealt with this kind of data. A comparison between these two models is studied using the empirical standard deviation of the standardized martingale residual processes as a way of assessing the fit of the two models based on the Aalen additive regression model. Here we found both approaches took heterogeneity into account and produce residual standard deviations close to each other both in the simulation study and in the real data set.

Ensemble Learning with Decision Tree for Remote Sensing Classification

In recent years, a number of works proposing the combination of multiple classifiers to produce a single classification have been reported in remote sensing literature. The resulting classifier, referred to as an ensemble classifier, is generally found to be more accurate than any of the individual classifiers making up the ensemble. As accuracy is the primary concern, much of the research in the field of land cover classification is focused on improving classification accuracy. This study compares the performance of four ensemble approaches (boosting, bagging, DECORATE and random subspace) with a univariate decision tree as base classifier. Two training datasets, one without ant noise and other with 20 percent noise was used to judge the performance of different ensemble approaches. Results with noise free data set suggest an improvement of about 4% in classification accuracy with all ensemble approaches in comparison to the results provided by univariate decision tree classifier. Highest classification accuracy of 87.43% was achieved by boosted decision tree. A comparison of results with noisy data set suggests that bagging, DECORATE and random subspace approaches works well with this data whereas the performance of boosted decision tree degrades and a classification accuracy of 79.7% is achieved which is even lower than that is achieved (i.e. 80.02%) by using unboosted decision tree classifier.

Ensembling Adaptively Constructed Polynomial Regression Models

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Gene Selection Guided by Feature Interdependence

Cancers could normally be marked by a number of differentially expressed genes which show enormous potential as biomarkers for a certain disease. Recent years, cancer classification based on the investigation of gene expression profiles derived by high-throughput microarrays has widely been used. The selection of discriminative genes is, therefore, an essential preprocess step in carcinogenesis studies. In this paper, we have proposed a novel gene selector using information-theoretic measures for biological discovery. This multivariate filter is a four-stage framework through the analyses of feature relevance, feature interdependence, feature redundancy-dependence and subset rankings, and having been examined on the colon cancer data set. Our experimental result show that the proposed method outperformed other information theorem based filters in all aspect of classification errors and classification performance.

Automated Knowledge Engineering

This article outlines conceptualization and implementation of an intelligent system capable of extracting knowledge from databases. Use of hybridized features of both the Rough and Fuzzy Set theory render the developed system flexibility in dealing with discreet as well as continuous datasets. A raw data set provided to the system, is initially transformed in a computer legible format followed by pruning of the data set. The refined data set is then processed through various Rough Set operators which enable discovery of parameter relationships and interdependencies. The discovered knowledge is automatically transformed into a rule base expressed in Fuzzy terms. Two exemplary cancer repository datasets (for Breast and Lung Cancer) have been used to test and implement the proposed framework.

A CUSUM Control Chart to Monitor Wafer Quality

C-control chart assumes that process nonconformities follow a Poisson distribution. In actuality, however, this Poisson distribution does not always occur. A process control for semiconductor based on a Poisson distribution always underestimates the true average amount of nonconformities and the process variance. Quality is described more accurately if a compound Poisson process is used for process control at this time. A cumulative sum (CUSUM) control chart is much better than a C control chart when a small shift will be detected. This study calculates one-sided CUSUM ARLs using a Markov chain approach to construct a CUSUM control chart with an underlying Poisson-Gamma compound distribution for the failure mechanism. Moreover, an actual data set from a wafer plant is used to demonstrate the operation of the proposed model. The results show that a CUSUM control chart realizes significantly better performance than EWMA.

A Growing Natural Gas Approach for Evaluating Quality of Software Modules

The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.

Issues in Spectral Source Separation Techniques for Plant-wide Oscillation Detection and Diagnosis

In the last few years, three multivariate spectral analysis techniques namely, Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Non-negative Matrix Factorization (NMF) have emerged as effective tools for oscillation detection and isolation. While the first method is used in determining the number of oscillatory sources, the latter two methods are used to identify source signatures by formulating the detection problem as a source identification problem in the spectral domain. In this paper, we present a critical drawback of the underlying linear (mixing) model which strongly limits the ability of the associated source separation methods to determine the number of sources and/or identify the physical source signatures. It is shown that the assumed mixing model is only valid if each unit of the process gives equal weighting (all-pass filter) to all oscillatory components in its inputs. This is in contrast to the fact that each unit, in general, acts as a filter with non-uniform frequency response. Thus, the model can only facilitate correct identification of a source with a single frequency component, which is again unrealistic. To overcome this deficiency, an iterative post-processing algorithm that correctly identifies the physical source(s) is developed. An additional issue with the existing methods is that they lack a procedure to pre-screen non-oscillatory/noisy measurements which obscure the identification of oscillatory sources. In this regard, a pre-screening procedure is prescribed based on the notion of sparseness index to eliminate the noisy and non-oscillatory measurements from the data set used for analysis.