Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification

Tumor classification is a key area of research in bioinformatics. Microarray technology is commonly used to study disease diagnosis through gene expression levels. The main drawback of gene expression data is that it contains thousands of genes but very few samples. Feature selection methods are used to select the informative genes from the microarray, and they considerably improve classification accuracy. In the proposed method, a Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on T-statistics, Signal-to-Noise Ratio (SNR) and F-test values. The initial candidate solutions of the GA are obtained from the top-m informative genes, and the classification accuracy of the k-Nearest Neighbor (kNN) method is used as the fitness function of the GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed approach is suitable for effective feature selection. With the selected genes, the GA-kNN method achieves 100% accuracy on 4 datasets and the GA-SVM method on 5 of the 10 datasets. GA combined with kNN and SVM is shown to be accurate for microarray-based tumor classification.
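A minimal sketch of the GA-kNN loop, under assumptions the abstract does not fix: the columns of `X` are taken to be the top-m genes already ranked by T-statistic, SNR or F-test; a chromosome is a bit mask over those genes; fitness is leave-one-out 1-NN accuracy; and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are illustrative choices. The function names are hypothetical.

```python
import random

def knn_loocv_accuracy(X, y, mask, k=1):
    """Leave-one-out kNN accuracy using only the features whose mask bit is 1."""
    feats = [f for f, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0  # an empty gene subset gets the worst possible fitness
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in feats), y[j])
            for j in range(len(X)) if j != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote among k neighbours
        correct += pred == y[i]
    return correct / len(X)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve feature masks over the columns of X (needs at least 2 columns)."""
    rng = random.Random(seed)
    n_feats = len(X[0])

    def fitness(mask):
        return knn_loocv_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: always carry the best mask forward
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_feats)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best, fitness(best)
```

Because the best mask is copied into every new generation, the best fitness never decreases from one generation to the next.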

Towards an Integrated Proposal for Performance Measurement Indicators (Financial and Operational) in Advanced Production Practices

Starting with an analysis of the financial and operational indicators found in the specialised literature, this study aims to contribute to improving the performance measurement systems used when the unit of analysis is the manufacturing plant. To this end, a search was carried out in the highest-impact journals of Production and Operations Management and Management Accounting to determine the financial and operational indicators used to evaluate performance when Advanced Production Practices have been implemented, specifically Total Quality Management, JIT/Lean Manufacturing and Total Productive Maintenance. This has enabled us to classify both types of indicators according to how often each is used. For the financial indicators we have also prepared a proposal that can be adapted to the accounting features of manufacturing plants. In the near future we will propose a model that links the implementation of these practices with financial and operational indicators, and links the latter two with each other. We aim to test this model empirically with the data obtained in the High Performance Manufacturing Project.

Efficient Solution for a Class of Markov Chain Models of Tandem Queueing Networks

We present a new numerical method for the computation of the steady-state solution of Markov chains. Theoretical analyses show that the proposed method, with a contraction factor α, converges to the one-dimensional null space of singular linear systems of the form Ax = 0. Numerical experiments are used to illustrate the effectiveness of the proposed method, with applications to a class of interesting models in the domain of tandem queueing networks.
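The abstract does not specify the contraction-factor method itself, so as a point of reference here is the standard power-iteration baseline for the same problem: find π with πP = π and Σπ = 1 for a row-stochastic transition matrix P, which is the normalised null vector of A = (P − I)ᵀ. This is a swapped-in textbook technique, not the authors' algorithm, and it assumes an irreducible, aperiodic chain.

```python
def steady_state(P, tol=1e-12, max_iter=100_000):
    """Power iteration for the stationary distribution pi of a row-stochastic matrix P.

    Solves pi = pi P with sum(pi) = 1, i.e. the null vector of (P - I)^T.
    Assumes the chain is irreducible and aperiodic so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # uniform starting guess
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        s = sum(nxt)
        nxt = [v / s for v in nxt]  # renormalise against rounding drift
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")
```

For the two-state chain `[[0.9, 0.1], [0.5, 0.5]]` the balance condition 0.1·π₀ = 0.5·π₁ gives π = (5/6, 1/6). Power iteration converges at a rate set by the second-largest eigenvalue modulus (0.4 here), which is why faster specialised solvers matter for large tandem-queue chains.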

Influences of Thermal Relaxation Times on Generalized Thermoelastic Longitudinal Waves in Circular Cylinder

This paper is concerned with the propagation of thermoelastic longitudinal vibrations in an infinite circular cylinder, in the context of the linear theory of generalized thermoelasticity with two relaxation time parameters (Green and Lindsay theory). Three displacement potential functions are introduced to uncouple the equations of motion. Using traction-free boundary conditions, the frequency equation is obtained in the form of a determinant involving Bessel functions. The roots of the frequency equation give the characteristic circular frequency as a function of the wave number. These roots, which correspond to various modes, are computed numerically and presented graphically for different values of the thermal relaxation times. The thermal relaxation times are found to have a marked influence on the amplitudes of the elastic and thermal waves. The study also shows that the propagation of thermoelastic longitudinal vibrations predicted by generalized thermoelasticity can differ significantly from the results of the classical formulation. For the case with no thermal effects, the results agree well with some corresponding earlier results.

Strength Characteristics of Shallow Gassy Sand in the Hangzhou Bay

The formation of the shallow gas reservoir in the Hangzhou Bay (northern Zhejiang Province, eastern China) and the original occurrence characteristics of the gassy sand are analyzed in terms of their geological origin. In general, gassy sand in scale gas reservoirs is at residual moisture content, and its initial matric suction ranges from about 0 kPa to 100 kPa. Results from GDS triaxial tests show that the classical shear strength formulas for unsaturated soil cannot effectively describe the basic strength characteristics of gassy sand; the relationship between the apparent cohesion and the matric suction of gassy sand is well fitted by a power function, which can reasonably be used to describe the strength of gassy sand. Along the stress path of gas release, the shear strength of gassy sand increases, and the experimental results show that the formula proposed in this paper can effectively predict this strength increment. When saturated strength indexes of the sand are used in engineering design, a moderate reduction should be considered.
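The abstract reports a power-function relation between apparent cohesion c and matric suction s, but gives neither the coefficients nor the fitting procedure. A minimal sketch, assuming the form c = A·s^B fitted by least squares on ln c = ln A + B ln s. The function name is hypothetical, and the test data are synthetic, not the paper's measurements.

```python
import math

def fit_power_law(suction_kpa, cohesion_kpa):
    """Fit c = A * s**B by ordinary least squares on ln c = ln A + B ln s.

    Points with s <= 0 or c <= 0 are skipped because both axes are log-transformed.
    """
    pts = [(math.log(s), math.log(c))
           for s, c in zip(suction_kpa, cohesion_kpa) if s > 0 and c > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two positive (s, c) pairs")
    mx = sum(x for x, _ in pts) / n
    my = sum(v for _, v in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (v - my) for x, v in pts)
    B = sxy / sxx                # slope in log-log space = exponent
    A = math.exp(my - B * mx)    # intercept back-transformed to the coefficient
    return A, B
```

Note that the 0 kPa end of the reported suction range cannot be log-transformed; the paper's own formula may handle it differently, for example with an offset term.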

Integration of Support Vector Machine and Bayesian Neural Network for Data Mining and Classification

Several combinations of preprocessing algorithms, feature selection techniques and classifiers can be applied to data classification tasks. This study introduces a new, accurate classifier consisting of four components: Signal-to-Noise Ratio (SNR) as the feature selection technique, a support vector machine, a Bayesian neural network, and AdaBoost as the ensemble algorithm. To verify the effectiveness of the proposed classifier, seven well-known classifiers are applied to four datasets for comparison. The experiments show that the proposed classifier improves the classification rates on all datasets.
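The abstract names SNR as the feature selection step but not its formula. A sketch of the commonly used definition SNR = (μ₀ − μ₁)/(σ₀ + σ₁) per feature, with population standard deviations and ranking by |SNR|; the formula variant and the helper names are assumptions. The SVM, Bayesian network and AdaBoost stages are not shown.

```python
import math
import statistics

def snr_scores(X, y):
    """Per-feature signal-to-noise ratio (mu0 - mu1) / (sigma0 + sigma1) for labels 0/1."""
    scores = []
    for f in range(len(X[0])):
        a = [x[f] for x, label in zip(X, y) if label == 0]
        b = [x[f] for x, label in zip(X, y) if label == 1]
        diff = statistics.mean(a) - statistics.mean(b)
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        if spread == 0:
            # No within-class spread: infinite separation unless the means coincide.
            scores.append(0.0 if diff == 0 else math.copysign(math.inf, diff))
        else:
            scores.append(diff / spread)
    return scores

def top_k_features(X, y, k):
    """Indices of the k features with the largest |SNR|."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda f: abs(scores[f]), reverse=True)[:k]
```

A large |SNR| means the class means are far apart relative to the within-class spread, so those features are passed on to the classifiers.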

Protein Graph Partitioning by Mutually Maximization of cycle-distributions

The classification of protein structure is commonly performed not for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step in protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph-theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph whose nodes correspond to the secondary structure elements of the protein. This graph is called the protein graph. The domains are then identified as partitions of the graph corresponding to vertex sets obtained by maximizing an objective function that mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm uses no information other than the cycle distribution to find the partitions. If a partition is found, the algorithm is applied iteratively to each of the resulting subgraphs. As a stopping criterion, we numerically calculate a significance level that indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, the iterative application of our algorithm terminates automatically. We present results for one- and two-domain proteins, compare them with the domains manually assigned in the SCOP database, and discuss the differences.

Integrating Decision Tree and Spatial Cluster Analysis for Landslide Susceptibility Zonation

A landslide susceptibility map delineates the potential zones for landslide occurrence. Previous works have applied multivariate methods and neural networks to map landslide susceptibility. This study proposes a new approach that integrates a decision tree model and a spatial cluster statistic to assess landslide susceptibility spatially. A total of 2057 landslide cells were digitized to develop the landslide decision tree model. The relationships between landslides and instability factors were represented explicitly by tree graphs in the model. The local Getis-Ord statistics were used to cluster cells with high landslide probability, and the result was classified to create a map of landslide susceptibility zones. The map was validated using new landslide data comprising 482 cells. The validation shows an accuracy rate of 86.1% in predicting new landslide occurrences, indicating that the proposed approach is useful for improving landslide susceptibility mapping.
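The clustering step relies on the local Getis-Ord Gi* z-score, Gi* = (Σⱼ wᵢⱼxⱼ − X̄ Σⱼ wᵢⱼ) / (S·√[(n Σⱼ wᵢⱼ² − (Σⱼ wᵢⱼ)²)/(n − 1)]), where S is the population standard deviation of all values. A minimal sketch, assuming binary contiguity weights (the study's actual weights and neighbourhood definition are not given) and cells flattened to a list. The function name is hypothetical.

```python
import math

def getis_ord_gi_star(values, neighbors):
    """Local Getis-Ord Gi* z-scores with binary weights.

    values[i] is the attribute of cell i (e.g. landslide probability);
    neighbors[i] lists the cells j with w_ij = 1. Gi* counts the cell itself,
    so i is added to its own neighbourhood. Requires non-constant values.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = []
    for i in range(n):
        nbrs = set(neighbors[i]) | {i}
        w_sum = len(nbrs)  # sum of binary weights
        w_sq = len(nbrs)   # sum of squared binary weights (1**2 == 1)
        local = sum(values[j] for j in nbrs)
        num = local - mean * w_sum
        den = s * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        z.append(num / den)
    return z
```

On a 1-D strip of values `[1, 1, 1, 10, 10, 10]` with left/right neighbours, the cells inside the high-value block get positive z-scores (a hot spot, +√5 for the fifth cell) and the cells in the low block get negative ones. Classifying such z-scores into bands is one way to obtain susceptibility zones.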

Categorical Data Modeling: Logistic Regression Software

Matlab-based software for logistic regression has been developed to support the teaching of quantitative topics and to assist researchers in analyzing the wide range of applications that involve categorical data. The software offers stepwise logistic regression to select the most significant predictors. It includes a feature to detect influential observations in the data and to investigate the effect of dropping or misclassifying an observation on a predictor variable. The input data may consist either of individual responses (yes/no) with the predictor variables or of grouped records summarizing the categories for each unique set of predictor-variable values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence issues when they are present in the data and notifies the user accordingly.

Classification of the Bachet Elliptic Curves y² = x³ + a³ in Fp, where p ≡ 1 (mod 6) is Prime

In this work, we first determine for which fields Fp the cube roots of unity lie in F*p, in Qp and in K*p, where Qp and K*p denote the sets of quadratic and non-zero cubic residues modulo p. We then use these facts to obtain results on the classification of the Bachet elliptic curves y² ≡ x³ + a³ (mod p) for primes p ≡ 1 (mod 6).
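For small primes the objects in this abstract can be checked by brute force: the cube roots of unity in F*p, membership in Qp (Euler's criterion a^((p−1)/2) ≡ 1) and in K*p (a^((p−1)/3) ≡ 1 when p ≡ 1 mod 3), and the number of points on y² = x³ + a³ over Fp. This is an illustrative sketch with hypothetical helper names, not the paper's classification method.

```python
def cube_roots_of_unity(p):
    """All x in F_p* with x^3 = 1 (there are three when p = 1 mod 3)."""
    return [x for x in range(1, p) if pow(x, 3, p) == 1]

def is_quadratic_residue(a, p):
    """Euler's criterion: a is a nonzero square mod p iff a^((p-1)/2) = 1."""
    return pow(a, (p - 1) // 2, p) == 1

def is_cubic_residue(a, p):
    """For p = 1 mod 3: a is a nonzero cube mod p iff a^((p-1)/3) = 1."""
    return pow(a, (p - 1) // 3, p) == 1

def count_points(a, p):
    """Number of points on y^2 = x^3 + a^3 over F_p, including the point at infinity."""
    squares = {}  # value -> number of y with y^2 = value
    for y in range(p):
        squares[y * y % p] = squares.get(y * y % p, 0) + 1
    c = pow(a, 3, p)
    return 1 + sum(squares.get((x ** 3 + c) % p, 0) for x in range(p))
```

For p = 7 the cube roots of unity are 1, 2 and 4. All three are quadratic residues, since any cube root of unity ω satisfies ω = (ω²)². Only 1 is a cubic residue. The curve y² = x³ + 1 has 12 points over F7.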

Low Voltage High Gain Linear Class AB CMOS OTA with DC Level Input Stage

This paper presents a low-voltage, low-power differential linear transconductor with near rail-to-rail input swing. Based on the current-mirror OTA topology, the proposed transconductor combines the Flipped Voltage Follower (FVF) technique, which linearizes the transconductor behavior and leads to class-AB linear operation, with the virtual transistor technique, which lowers the effective threshold voltages of the transistors and thus reduces the required supply voltage. The design of the OTA is discussed. It operates at supply voltages of about ±0.8 V. Simulation results for 0.18 μm TSMC CMOS technology show an input range of 1 Vpp with a high DC gain of 81.53 dB and a total harmonic distortion of -40 dB at 1 MHz for a 1 Vpp input. The main aim of this paper is to present and compare a new high-transconductance OTA design that has the potential to be used in low-voltage applications.

Support Vector Machine Approach for Classification of Cancerous Prostate Regions

The objective of this paper is to apply a support vector machine (SVM) approach to the classification of cancerous and normal regions in prostate images. Three kinds of textural features are extracted and used for the analysis: parameters of the Gauss-Markov random field (GMRF), the correlation function, and relative entropy. Prostate images are acquired by a system consisting of a microscope, a video camera and a digitizing board. Cross-validated classification over a database of 46 images is used to evaluate performance. With SVM classification, a sensitivity of 96.2% and a specificity of 97.0% are achieved for data in 32×32-pixel blocks, with an overall accuracy of 96.6%. Classification performance is compared with artificial neural network and k-nearest neighbor classifiers. Experimental results show that the SVM approach gives the best performance.
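The three reported figures are related: overall accuracy is the class-prevalence-weighted average of sensitivity and specificity, which is why 96.6% lies between 96.2% and 97.0%. A small sketch computing all three from paired labels (1 = cancerous block). The function name and the toy labels are illustrative, not the study's data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and overall accuracy from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # fraction of cancerous blocks detected
    specificity = tn / (tn + fp)  # fraction of normal blocks correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy
```

With 4 positive and 6 negative blocks, a sensitivity of 0.75 and a specificity of 5/6 give an accuracy of (4·0.75 + 6·5/6)/10 = 0.8.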

Automating the Testing of Object Behaviour: A Statechart-Driven Approach

The evolution of current modeling specifications gives rise to the problem of generating automated test cases from a variety of application tools. Past endeavours on behavioural testing of UML statecharts have not systematically leveraged the potential of existing graph theory for testing objects. There is therefore a need for a simple, tool-independent and effective method for automatic test generation. An architecture, codenamed ACUTE-J (Automated stateChart Unit Testing Engine for Java), for automating the unit test generation process is presented. A sequential approach for converting UML statechart diagrams into JUnit test classes is described, applying existing graph theory. Research byproducts such as a universal XML Schema and an API for statechart-driven testing are also proposed. Results from a Java implementation of ACUTE-J are discussed briefly. The Chinese Postman algorithm is used to illustrate a run-through of the ACUTE-J architecture.
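On a directed transition graph, the Chinese Postman tour reduces to an Euler circuit once every state has equal in- and out-degree. In general, transitions must first be duplicated, usually via a min-cost flow, and this sketch omits that step. A minimal Python sketch of the balanced case using Hierholzer's algorithm, on a hypothetical statechart. It is not the ACUTE-J implementation, which the abstract says is in Java.

```python
from collections import defaultdict

def euler_circuit(transitions, start):
    """Hierholzer's algorithm: a closed walk using every directed transition exactly once.

    Assumes the balanced Chinese Postman case (in-degree == out-degree for every
    state), so no transition needs to be repeated.
    """
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    adj = defaultdict(list)
    for src, dst in transitions:
        adj[src].append(dst)
        out_deg[src] += 1
        in_deg[dst] += 1
    if any(out_deg[v] != in_deg[v] for v in set(out_deg) | set(in_deg)):
        raise ValueError("graph is not balanced; transitions must be duplicated first")
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:
            stack.append(adj[v].pop())  # follow an unused transition
        else:
            path.append(stack.pop())    # dead end: emit state while backtracking
    path.reverse()
    if len(path) != len(transitions) + 1:
        raise ValueError("some transitions are unreachable from start")
    return path
```

Each consecutive pair in the returned walk is one transition to exercise. A test generator could emit one test step per pair, so a single test sequence covers every transition.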

The Design of the HL7 RIM-based Sharing Components for Clinical Information Systems

The American Health Level Seven (HL7) Reference Information Model (RIM) consists of six backbone classes that have different specialized attributes. Furthermore, to enforce semantic expression, specific mandatory vocabulary domains have been defined for representing the content values of some attributes. Because of the variety of workflows, most hospitals duplicate effort, spending considerable time and human cost to develop and modify Clinical Information Systems (CIS). This study therefore designs and develops shared RIM-based CIS components for different business processes, so that the CIS contains data of a consistent format and type. Programmers can perform transactions with the RIM-based clinical repository through the shared RIM-based components, and the shared components can also be adopted when developing CIS functions. These components not only satisfy physicians' needs in using a CIS but also reduce the time needed to develop new system components. Overall, this study offers a new viewpoint: integrating data and functions with business processes is an easy and flexible approach to building a new CIS.

Equivalence Class Subset Algorithm

The equivalence class subset algorithm is a powerful tool for solving a wide variety of constraint satisfaction problems and is based on a decision function with very high but not perfect accuracy. Perfect accuracy is not required in the decision function, because even a suboptimal solution contains valuable information that can help find an optimal solution. In the hardest problems, the decision function can break down, leading to a suboptimal solution that has more equivalence classes than necessary and can be viewed as a mixture of good and bad decisions. By choosing a subset of the decisions made in reaching a suboptimal solution, an iterative technique can reach an optimal solution through a series of steadily improving suboptimal solutions. The goal is to reach an optimal solution as quickly as possible. Various techniques for choosing the decision subset are evaluated.

Combined Feature Based Hyperspectral Image Classification Technique Using Support Vector Machines

A spatial classification technique incorporating a state-of-the-art feature extraction algorithm is proposed in this paper for classifying the heterogeneous classes present in hyperspectral images. Classification accuracy can be improved only if both the feature extraction method and the classifier are chosen appropriately. As the classes in hyperspectral images are assumed to have different textures, textural classification is used. Run-length features are extracted along with principal components and independent components. A hyperspectral image of the Indiana site acquired by AVIRIS is used for the experiment. From the original 220 bands, a subset of 120 bands is selected. The Gray Level Run Length Matrix (GLRLM) is calculated for forty of the selected bands, and the run-length features of individual pixels are calculated from the GLRLMs. Principal components are calculated for another forty bands, and independent components for the remaining forty bands. As principal and independent components can represent the textural content of pixels, they are treated as features. The summation of the run-length features, principal components and independent components forms the combined features used for classification. An SVM with a binary hierarchical tree is used to classify the hyperspectral image. The results are validated against ground truth and the accuracies are calculated.
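A minimal sketch of the run-length step: building a GLRLM for horizontal (0°) runs in a quantised image patch and deriving two standard run-length features, Short Run Emphasis (SRE) and Long Run Emphasis (LRE). The run direction, the number of gray levels, the per-pixel window, and the choice of these two features are assumptions, since the abstract does not specify them. The function names are hypothetical.

```python
def glrlm_horizontal(image, levels):
    """Gray Level Run Length Matrix for horizontal runs.

    image holds integer gray levels in [0, levels); M[g][r - 1] counts runs
    of gray level g with length r.
    """
    width = len(image[0])
    M = [[0] * width for _ in range(levels)]
    for row in image:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                M[run_val][run_len - 1] += 1
                run_val, run_len = v, 1
        M[run_val][run_len - 1] += 1  # close the final run of the row
    return M

def short_run_emphasis(M):
    """SRE: runs weighted by 1/r^2, normalised by the total number of runs."""
    total = sum(sum(row) for row in M)
    return sum(c / (r + 1) ** 2 for row in M for r, c in enumerate(row)) / total

def long_run_emphasis(M):
    """LRE: runs weighted by r^2, normalised by the total number of runs."""
    total = sum(sum(row) for row in M)
    return sum(c * (r + 1) ** 2 for row in M for r, c in enumerate(row)) / total
```

For the 2×3 patch `[[0, 0, 1], [1, 1, 1]]` there are three runs: level 0 of length 2, level 1 of length 1, and level 1 of length 3. That gives LRE = (4 + 1 + 9)/3. Fine textures yield high SRE and coarse textures high LRE, which is what lets these values act as per-pixel texture features.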

Theory of Nanowire Radial p-n-Junction

We have developed an analytic model for the radial p-n junction in a nanowire (NW) core-shell structure, which is used as a new building block in various semiconductor devices. The potential distribution across the p-n junction is calculated, and analytical expressions are derived for the depletion region widths. We show that the widths of the space charge layers surrounding the core are functions of the core radius, which is a manifestation of the so-called classical size effect. In the limit of an infinitely large core radius, the relationship between the depletion layer width and the built-in potential reduces to the square-root dependence characteristic of conventional planar p-n junctions. An explicit equation is derived for the capacitance of the radial p-n junction. The current-voltage behavior is also carefully determined, taking into account “short base” effects.
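The core-radius dependence can be illustrated with a standard full-depletion calculation. The symbols and assumptions here are ours, not the paper's: an abrupt junction, an n-type core with donor density N_D, a p-type shell with acceptor density N_A, the metallurgical interface at radius a, permittivity ε, and depletion extending from r₁ to a in the core and from a to r₂ in the shell. Solving Poisson's equation in cylindrical coordinates with E(r₁) = E(r₂) = 0 gives

```latex
\begin{aligned}
&\text{Charge neutrality:} && N_D\,(a^2 - r_1^2) = N_A\,(r_2^2 - a^2),\\
&\text{Field in the core } (r_1<r<a): && E(r) = \frac{qN_D}{2\varepsilon}\left(r - \frac{r_1^2}{r}\right),\\
&\text{Field in the shell } (a<r<r_2): && E(r) = \frac{qN_A}{2\varepsilon}\left(\frac{r_2^2}{r} - r\right),\\
&\text{Built-in potential:} && V_{bi} = \frac{qN_D}{2\varepsilon}\left[\frac{a^2-r_1^2}{2} - r_1^2\ln\frac{a}{r_1}\right]
 + \frac{qN_A}{2\varepsilon}\left[r_2^2\ln\frac{r_2}{a} - \frac{r_2^2-a^2}{2}\right].
\end{aligned}
```

The two field expressions match at r = a precisely because of charge neutrality. Writing x_n = a − r₁ and x_p = r₂ − a and expanding the logarithms for x_n, x_p ≪ a, the two brackets tend to x_n² and x_p². The last line therefore reduces to V_bi = q(N_D x_n² + N_A x_p²)/(2ε), the planar result in which the depletion width grows as √V_bi.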

Wavelet - Based Classification of Outdoor Natural Scenes by Resilient Neural Network

Natural outdoor scene classification is an active and promising research area around the globe. In this study, classification is carried out in two phases. In the first phase, features are extracted from the images by wavelet decomposition and stored in a database as feature vectors. In the second phase, neural classifiers, namely a back-propagation neural network (BPNN) and a resilient back-propagation neural network (RPNN), are employed to classify the scenes. Four hundred color images of two classes, forest and street, are taken from the MIT database. The performance of the two neural classifiers, BPNN and RPNN, is compared as the number of test samples increases. RPNN gave better classification results than BPNN on the larger test sets.
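What distinguishes resilient back-propagation from plain back-propagation is its weight update. Each weight has its own step size, which grows while the gradient keeps its sign and shrinks when the sign flips. Only the sign of the gradient, not its magnitude, sets the direction of the update. A minimal sketch of the iRprop− variant with the common defaults η⁺ = 1.2 and η⁻ = 0.5, applied to a toy quadratic rather than a network. The abstract does not say which Rprop variant or settings were used.

```python
def irprop_minus(grad_fn, w, iters=500, step0=0.1, eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """Minimise a function given its gradient using iRprop- sign-based updates."""
    w = list(w)
    steps = [step0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(iters):
        g = grad_fn(w)
        for i in range(len(w)):
            prod = g[i] * prev[i]
            if prod > 0:    # same sign as last time: accelerate
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif prod < 0:  # sign flipped (overshoot): shrink and skip this update
                steps[i] = max(steps[i] * eta_minus, step_min)
                g[i] = 0.0
            if g[i] > 0:
                w[i] -= steps[i]
            elif g[i] < 0:
                w[i] += steps[i]
            prev[i] = g[i]
    return w

# Toy objective f(w) = (w0 - 3)^2 + 10 (w1 + 1)^2, minimum at (3, -1).
w = irprop_minus(lambda w: [2 * (w[0] - 3), 20 * (w[1] + 1)], [0.0, 0.0])
```

Because the step sizes adapt per weight and ignore gradient magnitude, the two badly scaled coordinates converge at a similar pace. A fixed-learning-rate gradient descent would have to pick a single rate that suits both.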

Emotion Recognition Using Neural Network: A Comparative Study

Emotion recognition is an important research field with many applications today. This work focuses on recognizing different emotions from the speech signal. The extracted features relate to statistics of pitch, formant and energy contours, as well as spectral, perceptual and temporal features, jitter, and shimmer. An artificial neural network (ANN) was chosen as the classifier. Our concern is to find a robust and fast ANN classifier suitable for different real-life applications. Several experiments were carried out on different ANNs to investigate the factors that affect the classification success rate. Using a database containing 7 different emotions, we show that with proper and careful adjustment of the feature format, training data sorting, number of selected features, and even the ANN type and architecture, a success rate of 85% or more can be achieved without increasing system complexity or computation time.
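The abstract lists jitter and shimmer among the features without defining them. A sketch of the widely used "local" definitions, as reported by tools such as Praat. Local jitter is the mean absolute difference between consecutive pitch periods divided by the mean period. Local shimmer applies the same ratio to the peak amplitudes of consecutive periods. Which variants the study used is an assumption, and the helper names are hypothetical; pitch periods and amplitudes are assumed to be extracted already.

```python
def local_jitter(periods):
    """Mean |T_i - T_(i+1)| over consecutive periods, divided by the mean period.

    Needs at least two periods.
    """
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Same relative measure applied to the peak amplitudes of consecutive periods."""
    return local_jitter(amplitudes)
```

For periods of 10, 12, 10 and 12 ms, every consecutive difference is 2 ms and the mean period is 11 ms, so local jitter is 2/11 ≈ 0.18.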

On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can run in serial as well as shared-memory parallel mode, we introduce a transformation of the training data that allows an expensive generalized kernel to be used without additional cost. We present experiments for the Gaussian kernel, but other kernel functions can be used as well. To further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set size on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and to improved overall support vector machine learning performance. Our method allows extensive parameter search methods to be used to optimize classification accuracy.
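The abstract does not describe its working set selection strategy. For orientation, here is the maximal-violating-pair rule used by LIBSVM-style SMO solvers, which corresponds to a working set of size 2, for the dual problem min ½αᵀQα − eᵀα subject to yᵀα = 0 and 0 ≤ αₜ ≤ C. Larger working sets, whose size the paper studies, can be formed by taking several top-ranked indices from each side instead of one. The function name is hypothetical, and both classes are assumed to be present.

```python
def select_working_pair(alpha, y, grad, C, eps=1e-3):
    """Maximal violating pair for the SVM dual with labels y in {+1, -1}.

    grad[t] is the gradient of the dual objective at alpha. Returns (i, j),
    or None when the KKT violation m - M drops below eps (the usual stopping rule).
    """
    n = len(y)
    up = [t for t in range(n)
          if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
    low = [t for t in range(n)
           if (y[t] == 1 and alpha[t] > 0) or (y[t] == -1 and alpha[t] < C)]
    i = max(up, key=lambda t: -y[t] * grad[t])   # most violating index in I_up
    j = min(low, key=lambda t: -y[t] * grad[t])  # most violating index in I_low
    if (-y[i] * grad[i]) - (-y[j] * grad[j]) < eps:
        return None
    return i, j
```

At the usual starting point α = 0 the gradient is −1 for every index, so the rule pairs a positive example with a negative one. The initial violation is 2, and training stops once the violation falls below `eps`.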