Selecting Negative Examples for Protein-Protein Interaction

Proteomics is one of the largest areas of research for bioinformatics and medical science. An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. Predicting Protein-Protein Interaction (PPI) is one of the crucial and decisive problems in current research. Genomic data offer a great opportunity and at the same time a lot of challenges for the identification of these interactions. Many methods have already been proposed in this regard. In case of in-silico identification, most of the methods require both positive and negative examples of protein interaction and the perfection of these examples are very much crucial for the final prediction accuracy. Positive examples are relatively easy to obtain from well known databases. But the generation of negative examples is not a trivial task. Current PPI identification methods generate negative examples based on some assumptions, which are likely to affect their prediction accuracy. Hence, if more reliable negative examples are used, the PPI prediction methods may achieve even more accuracy. Focusing on this issue, a graph based negative example generation method is proposed, which is simple and more accurate than the existing approaches. An interaction graph of the protein sequences is created. The basic assumption is that the longer the shortest path between two protein-sequences in the interaction graph, the less is the possibility of their interaction. A well established PPI detection algorithm is employed with our negative examples and in most cases it increases the accuracy more than 10% in comparison with the negative pair selection method in that paper.

Power System Security Assessment using Binary SVM Based Pattern Recognition

Power System Security is a major concern in real time operation. Conventional method of security evaluation consists of performing continuous load flow and transient stability studies by simulation program. This is highly time consuming and infeasible for on-line application. Pattern Recognition (PR) is a promising tool for on-line security evaluation. This paper proposes a Support Vector Machine (SVM) based binary classification for static and transient security evaluation. The proposed SVM based PR approach is implemented on New England 39 Bus and IEEE 57 Bus systems. The simulation results of SVM classifier is compared with the other classifier algorithms like Method of Least Squares (MLS), Multi- Layer Perceptron (MLP) and Linear Discriminant Analysis (LDA) classifiers.

Gas Detection via Machine Learning

We present an Electronic Nose (ENose), which is aimed at identifying the presence of one out of two gases, possibly detecting the presence of a mixture of the two. Estimation of the concentrations of the components is also performed for a volatile organic compound (VOC) constituted by methanol and acetone, for the ranges 40-400 and 22-220 ppm (parts-per-million), respectively. Our system contains 8 sensors, 5 of them being gas sensors (of the class TGS from FIGARO USA, INC., whose sensing element is a tin dioxide (SnO2) semiconductor), the remaining being a temperature sensor (LM35 from National Semiconductor Corporation), a humidity sensor (HIH–3610 from Honeywell), and a pressure sensor (XFAM from Fujikura Ltd.). Our integrated hardware–software system uses some machine learning principles and least square regression principle to identify at first a new gas sample, or a mixture, and then to estimate the concentrations. In particular we adopt a training model using the Support Vector Machine (SVM) approach with linear kernel to teach the system how discriminate among different gases. Then we apply another training model using the least square regression, to predict the concentrations. The experimental results demonstrate that the proposed multiclassification and regression scheme is effective in the identification of the tested VOCs of methanol and acetone with 96.61% correctness. The concentration prediction is obtained with 0.979 and 0.964 correlation coefficient for the predicted versus real concentrations of methanol and acetone, respectively.

Fuzzy Rules Generation and Extraction from Support Vector Machine Based on Kernel Function Firing Signals

Our study proposes an alternative method in building Fuzzy Rule-Based System (FRB) from Support Vector Machine (SVM). The first set of fuzzy IF-THEN rules is obtained through an equivalence of the SVM decision network and the zero-ordered Sugeno FRB type of the Adaptive Network Fuzzy Inference System (ANFIS). The second set of rules is generated by combining the first set based on strength of firing signals of support vectors using Gaussian kernel. The final set of rules is then obtained from the second set through input scatter partitioning. A distinctive advantage of our method is the guarantee that the number of final fuzzy IFTHEN rules is not more than the number of support vectors in the trained SVM. The final FRB system obtained is capable of performing classification with results comparable to its SVM counterpart, but it has an advantage over the black-boxed SVM in that it may reveal human comprehensible patterns.

Improving Protein-Protein Interaction Prediction by Using Encoding Strategies and Random Indices

A New features are extracted and compared to improve the prediction of protein-protein interactions. The basic idea is to select and use the best set of features from the Tensor matrices that are produced by the frequency vectors of the protein sequences. Three set of features are compared, the first set is based on the indices that are the most common in the interacting proteins, the second set is based on the indices that tend to be common in the interacting and non-interacting proteins, and the third set is constructed by using random indices. Moreover, three encoding strategies are compared; that are based on the amino asides polarity, structure, and chemical properties. The experimental results indicate that the highest accuracy can be obtained by using random indices with chemical properties encoding strategy and support vector machine.

Meta Model Based EA for Complex Optimization

Evolutionary Algorithms are population-based, stochastic search techniques, widely used as efficient global optimizers. However, many real life optimization problems often require finding optimal solution to complex high dimensional, multimodal problems involving computationally very expensive fitness function evaluations. Use of evolutionary algorithms in such problem domains is thus practically prohibitive. An attractive alternative is to build meta models or use an approximation of the actual fitness functions to be evaluated. These meta models are order of magnitude cheaper to evaluate compared to the actual function evaluation. Many regression and interpolation tools are available to build such meta models. This paper briefly discusses the architectures and use of such meta-modeling tools in an evolutionary optimization context. We further present two evolutionary algorithm frameworks which involve use of meta models for fitness function evaluation. The first framework, namely the Dynamic Approximate Fitness based Hybrid EA (DAFHEA) model [14] reduces computation time by controlled use of meta-models (in this case approximate model generated by Support Vector Machine regression) to partially replace the actual function evaluation by approximate function evaluation. However, the underlying assumption in DAFHEA is that the training samples for the metamodel are generated from a single uniform model. This does not take into account uncertain scenarios involving noisy fitness functions. The second model, DAFHEA-II, an enhanced version of the original DAFHEA framework, incorporates a multiple-model based learning approach for the support vector machine approximator to handle noisy functions [15]. Empirical results obtained by evaluating the frameworks using several benchmark functions demonstrate their efficiency