A Hybrid Gene Selection Technique Using Improved Mutual Information and Fisher Score for Cancer Classification Using Microarrays

Feature selection is essential for effective classification in cancer diagnosis. However, in microarray gene expression datasets, the large number of features relative to the number of samples makes classification computationally hard and error-prone. In this paper, we present an innovative method for selecting highly informative gene subsets from gene expression data that effectively classify cancer data as tumorous or non-tumorous. The hybrid gene selection technique combines Mutual Information and Fisher score to select informative genes. The gene selection is validated by classification using a Support Vector Machine (SVM), a supervised learning algorithm capable of solving complex classification problems. Improved Mutual Information and F-Score, with SVM as the classifier, produced efficient results.
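
As a rough illustration of the Fisher score half of such a hybrid filter, the sketch below ranks genes by the ratio of between-class to within-class scatter. The abstract does not give the exact formula or the "improved" Mutual Information step, so this uses the common textbook definition of the Fisher score; the function names `fisher_score` and `rank_genes` and the toy data are assumptions made for illustration only.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score of one gene: between-class scatter / within-class scatter.

    Textbook definition, assumed here; not taken from the paper.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    # A gene with zero within-class variance separates the classes perfectly.
    return between / within if within > 0 else float("inf")


def rank_genes(matrix, labels, k):
    """Return the indices of the k genes with the highest Fisher score.

    matrix: list of samples, each a list of expression values (one per gene).
    """
    n_genes = len(matrix[0])
    scores = [
        fisher_score([row[j] for row in matrix], labels) for j in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: gene 0 separates the classes, gene 1 does not.
toy_matrix = [[1.0, 5.0], [1.2, 3.0], [3.0, 5.1], [3.2, 2.9]]
toy_labels = [0, 0, 1, 1]
top_genes = rank_genes(toy_matrix, toy_labels, k=1)
```

In a full pipeline of the kind described, the selected gene indices would presumably be combined with a mutual-information ranking and then passed to an SVM for validation; neither step is shown here.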

Automatic Detection of Defects in Ornamental Limestone Using Wavelets

A methodology based on wavelets is proposed for the automatic location and delimitation of defects in limestone plates. Natural defects include dark colored spots, crystal zones trapped in the stone, areas of abnormal contrast colors, cracks or fracture lines, and fossil patterns. Although some of these may or may not be considered defects depending on the intended use of the plate, the goal is to pair each stone with a map of defects that can be overlaid on a computer display. These layers of defects constitute a database that allows the preliminary selection of matching tiles of a particular variety, with specific dimensions, for a requirement of N square meters, to be done on a desktop computer rather than by a two-hour search in the storage park, with human operators manipulating stone plates as large as 3 m x 2 m and weighing about one ton. Accident risks and work times are reduced, with a consequent increase in productivity. The basis of the algorithm is a wavelet decomposition executed on two instances of the original image, to detect both hypotheses: dark and clear defects. The existence and/or size of these defects is the gauge used to classify the quality grade of the stone products. The tuning of parameters that is possible in the wavelet framework corresponds to different levels of accuracy in drawing the contours and selecting the defect size, which allows the map of defects to be used to cut a selected stone into tiles with minimum waste, according to the dimension of defects allowed.

Selection of Designs in Ordinal Regression Models under Linear Predictor Misspecification

The purpose of this article is to find a method of comparing designs for ordinal regression models using quantile dispersion graphs in the presence of linear predictor misspecification. The true relationship between the response variable and the corresponding control variables is usually unknown. The experimenter assumes a certain form of the linear predictor of the ordinal regression model, and this assumed form may not always be correct. Thus, the maximum likelihood estimates (MLE) of the unknown parameters of the model may be biased due to misspecification of the linear predictor. In this article, the uncertainty in the linear predictor is represented by an unknown function. An algorithm is provided to estimate the unknown function at the design points where observations are available. The unknown function is then estimated at all points in the design region using multivariate parametric kriging. The comparison of the designs is based on a scalar-valued function of the mean squared error of prediction (MSEP) matrix, which incorporates both the variance and the bias of the prediction caused by the misspecification in the linear predictor. The designs are compared using the quantile dispersion graphs approach. The graphs also visually depict the robustness of the designs to changes in the parameter values. Numerical examples are presented to illustrate the proposed methodology.

A Spanning Tree for Enhanced Cluster Based Routing in Wireless Sensor Network

Wireless Sensor Network (WSN) clustering architecture enables features like network scalability, communication overhead reduction, and fault tolerance. After clustering, aggregated data is transferred to the data sink, which reduces unnecessary, redundant data transfer. This reduces the number of transmitting nodes and so saves energy. Clustering also allows scalability to many nodes, reduces communication overhead, and allows efficient use of WSN resources. Clustering-based routing methods manage network energy consumption efficiently. Building spanning trees for data collection rooted at a sink node is a fundamental data aggregation method in sensor networks. Determining the optimal number of Cluster Heads (CHs) is an NP-hard problem. In this paper, we combine cluster-based routing features for cluster formation and CH selection and use a Minimum Spanning Tree (MST) for intra-cluster communication. The proposed method is based on optimizing the MST using Simulated Annealing (SA). In this work, normalized values of mobility, delay, and remaining energy are considered for finding the optimal MST. Simulation results demonstrate the effectiveness of the proposed method in improving the packet delivery ratio and reducing the end-to-end delay.
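
The spanning-tree building block mentioned above can be sketched with Prim's algorithm rooted at a cluster head. This is only a baseline MST construction under assumed inputs: the abstract does not say how mobility, delay, and remaining energy are combined into a link cost, nor how Simulated Annealing modifies the tree, so the edge costs below are hypothetical placeholders and the SA step is not shown.

```python
import heapq


def prim_mst(n, edges, root=0):
    """Build a minimum spanning tree with Prim's algorithm.

    n: number of nodes, labelled 0..n-1 (root plays the cluster head).
    edges: list of undirected links (u, v, cost); cost is a hypothetical
           scalar that could blend normalized mobility, delay, and energy.
    Returns a list of tree edges (parent, child, cost).
    """
    adjacency = {i: [] for i in range(n)}
    for u, v, cost in edges:
        adjacency[u].append((cost, u, v))
        adjacency[v].append((cost, v, u))

    visited = {root}
    heap = list(adjacency[root])
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < n:
        cost, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this link would close a cycle
        visited.add(v)
        tree.append((u, v, cost))
        for link in adjacency[v]:
            if link[2] not in visited:
                heapq.heappush(heap, link)
    return tree


# Hypothetical four-node cluster; node 0 is the cluster head.
links = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.5), (2, 3, 0.5), (1, 3, 3.0)]
tree = prim_mst(4, links)
total_cost = sum(cost for _, _, cost in tree)
```

An SA-based optimizer would presumably start from a tree like this one and explore edge swaps under its own objective; that part depends on details the abstract does not provide.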

Morphological Parameters and Selection of Turkish Edible Seed Pumpkins (Cucurbita pepo L.) Germplasm

There is a need for registered edible seed pumpkin varieties in Turkey. A total of 81 genotypes, collected by the researchers in 2005 and originating from the Eskisehir, Konya, Nevsehir, Tekirdag, Sakarya, Kayseri and Kirsehir provinces, were used. The genetic material was brought to the S5 generation by the research groups between 2006 and 2010. In this research, some morphological features of the genotypes that reached the S5 stage are given, and promising genotypes were selected using a generated scale. Results showed that the A-1 (420), A-7 (410), A-8 (420), A-32 (420), B-17 (410), B-24 (410), B-25 (420), B-33 (400), C-24 (420), C-25 (410), C-26 (410) and C-30 (420) genotypes are expected to be promising varieties.

Measuring Enterprise Growth: Pitfalls and Implications

Enterprise growth is generally considered a key driver of competitiveness, employment, economic development and social inclusion. As such, it is perceived by scholars and decision makers to be a highly desirable outcome of entrepreneurship. The extensive academic debate has resulted in a multitude of theoretical frameworks focused on explaining growth stages, determinants and future prospects. It has been widely accepted that enterprise growth is most likely nonlinear, temporal and related to a variety of factors which reflect the individual, firm, organizational, industry or environmental determinants of growth. However, factors that affect growth are not easily captured, instruments to measure those factors are often arbitrary, and causality between variables and growth is elusive, indicating that growth is not easily modeled. Furthermore, in line with the heterogeneous nature of the growth phenomenon, there is a vast number of measurement constructs assessing growth which are used interchangeably. Differences among various growth measures, at the conceptual as well as the operationalization level, can hinder theory development, which emphasizes the need for more empirically robust studies. In line with these highlights, the main purpose of this paper is twofold: firstly, to compare the structure and performance of three growth prediction models based on the main growth measures (revenues, employment and assets growth); secondly, to explore the prospects of financial indicators, set as exact, visible, standardized and accessible variables, to serve as determinants of enterprise growth. A further aim is to contribute to the understanding of how different growth measures affect research results and recommendations for growth. The models include a range of financial indicators as lagged determinants of the enterprises' performance during 2008-2013, extracted from the national register of the financial statements of SMEs in Croatia.
The design and testing stages of the modeling used logistic regression procedures. Findings confirm that growth prediction models based on different measures of growth have different sets of predictors. Moreover, the relationship between particular predictors and the growth measure is inconsistent: the same predictor that is positively related to one growth measure may exert a negative effect on a different growth measure. Overall, financial indicators alone can serve as a good proxy of growth and yield adequate predictive power of the models. The paper sheds light on both the methodology and the conceptual framework of enterprise growth by using a range of variables which serve as a proxy for the multitude of internal and external determinants but, unlike them, are accessible, available, exact and free of perceptual nuances in building up the model. Selection of the growth measure seems to have a significant impact on the implications and recommendations related to growth. Furthermore, the paper points out potential pitfalls of measuring and predicting growth. Overall, the results and implications of the study are relevant for advancing academic debates on growth-related methodology and can contribute to evidence-based decisions of policy makers.

Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Hyperspectral images and remote sensing are important for many applications. A problem in the use of these images is the high volume of data to be processed, stored and transferred. Dimensionality reduction techniques can be used to reduce the volume of data. In this paper, an approach to band selection based on clustering algorithms is presented. The proposed structure is based on the Fuzzy C-Means (or K-Means) and NWHFC algorithms. Attributes that are new in relation to other studies in the literature, such as kurtosis and low correlation, are also considered. The results of the approach using Fuzzy C-Means and K-Means with different attributes are compared. Both algorithms show similarly good results, particularly when the attributes variance and kurtosis are used in the clustering process, and both are applicable to hyperspectral images.

Technology Identification, Evaluation and Selection Methodology for Industrial Process Water and Waste Water Treatment Plant of 3x150 MWe Tufanbeyli Lignite-Fired Power Plant

Most thermal power plants use steam as the working fluid in their power cycle. Therefore, in addition to fuel, water is the other main input for thermal plants. Water and steam must be highly pure in order to protect the systems from corrosion, scaling and biofouling. Pure process water is produced in water treatment plants that use several treatment methods. The treatment plant design is selected depending on the raw water source and the required water quality. Although the working principle of fossil-fuel-fired thermal power plants is the same, there is no standard design and equipment arrangement valid for all thermal power plant utility systems. Besides that, there are many other technology evaluation and selection criteria for designing the optimal water systems that meet the requirements, such as local conditions, environmental restrictions, availability and transport of electricity and other consumables, process water sources and scarcity, and land use constraints. The aim of this study is to explain the methodology adopted for technology selection for the process water preparation and industrial waste water treatment plant in a thermal power plant project located in Tufanbeyli, Adana Province, Turkey. The thermal power plant is fired with indigenous lignite coal extracted from adjacent lignite reserves. This paper addresses all the above-mentioned factors affecting the design of the thermal power plant water treatment facilities (demineralization + waste water treatment) and describes the ultimate design of the Tufanbeyli Thermal Power Plant Water Treatment Plant.

Cost Sensitive Feature Selection in Decision-Theoretic Rough Set Models for Customer Churn Prediction: The Case of Telecommunication Sector Customers

The telecommunications sector is undergoing change and ongoing development in the global market. In this sector, churn analysis techniques are commonly used to analyse why some customers terminate their service subscriptions prematurely. Customer churn is highly significant in this sector since it causes substantial business loss. Many companies conduct various studies in order to prevent losses while increasing customer loyalty. Although a large quantity of accumulated data is available in this sector, its usefulness is limited by data quality and relevance. In this paper, a cost-sensitive feature selection framework is developed with the aim of obtaining the feature reducts for predicting customer churn. The framework is a cost-based optional pre-processing stage that removes redundant features for churn management. In addition, this cost-based feature selection algorithm is applied in a telecommunication company in Turkey, and the results obtained with this algorithm are presented.

Material Selection for a Manual Winch Rope Drum

The selection of materials is an essential task in mechanical design processes. This paper sets out to demonstrate the application of analytical decision making during mechanical design and, particularly, in selecting a suitable material for a given application. Equations for the mechanical design of a manual winch rope drum are used to derive quantitative material performance indicators, which are then used in a multiple attribute decision making (MADM) model to rank the candidate materials. Thus, the processing of mechanical design considerations and material properties data into information that is suitable for use in a quantitative materials selection process is demonstrated for the case of a rope drum design. Moreover, Microsoft Excel®, a commonly available computer package, is used in the selection process. The results of the materials selection process are in agreement with current industry practice in rope drum design. The procedure that is demonstrated here should be adaptable to other design situations in which a need arises for the selection of engineering materials, and other engineering entities.

Material Selection for Footwear Insole Using Analytical Hierarchal Process

Product performance depends on the type and quality of its building material. A successful product must be made using high-quality material and the right methods. Many foot problems occur as a result of using poor insole material. Therefore, selecting a proper insole material is crucial to eliminating these problems. In this study, the analytical hierarchy process (AHP) is used to provide a systematic procedure for choosing the most suitable material for this application among three alternatives (polyurethane, poron, and plastazote). Several comparison criteria are used to build the AHP model, including density, stiffness, durability, energy absorption, and ease of fabrication. Poron was selected as the best choice. Inconsistency testing indicates that the model is reasonable and that the ranking of the material alternatives is effective.
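
To make the AHP step concrete, the sketch below derives priority weights from a pairwise comparison matrix using the geometric-mean approximation and computes Saaty's consistency ratio. The comparison values are hypothetical and are not the judgments used in the study; the function name `ahp_priorities` is likewise an assumption, and the random-index table holds the commonly cited Saaty values for small matrices.

```python
import math

# Commonly cited Saaty random consistency index for matrix sizes 1..5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_priorities(pairwise):
    """Return (weights, consistency_ratio) for a reciprocal comparison matrix.

    Weights use the geometric-mean (row product) approximation of the
    principal eigenvector.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]

    # Estimate the principal eigenvalue from A w = lambda_max * w.
    a_times_w = [
        sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)
    ]
    lambda_max = sum(a_times_w[i] / weights[i] for i in range(n)) / n

    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    ratio = consistency_index / random_index if random_index > 0 else 0.0
    return weights, ratio


# Hypothetical, perfectly consistent judgments for three alternatives
# under a single criterion (A is 2x B and 6x C; B is 3x C).
judgments = [
    [1.0, 2.0, 6.0],
    [1.0 / 2.0, 1.0, 3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]
weights, consistency_ratio = ahp_priorities(judgments)
```

A consistency ratio below roughly 0.1 is the usual rule of thumb for accepting a set of judgments; in a multi-criteria model the per-criterion weights would then be aggregated using the criteria weights.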

Multiclass Support Vector Machines with Simultaneous Multi-Factors Optimization for Corporate Credit Ratings

Corporate credit rating prediction is one of the most important topics studied by researchers in the last decade. Over this period, researchers have been pushing the limit to enhance the accuracy of corporate credit rating prediction models by applying several data-driven tools, including statistical and artificial intelligence methods. Among them, the multiclass support vector machine (MSVM) has been widely applied due to its good predictive ability. However, heuristics, for example the parameters of a kernel function and the appropriate feature and instance subsets, have become the main reason for criticism of MSVM, as they dictate the MSVM architectural variables. This study presents a hybrid MSVM model that is intended to optimize all of these factors: feature selection, instance selection, and kernel parameters. Our model adopts a genetic algorithm (GA) to simultaneously optimize multiple heterogeneous design factors of MSVM.

Selection of Relevant Servers in Distributed Information Retrieval System

Nowadays, the dissemination of information extends to the distributed world, where selecting the servers relevant to a user request is an important problem in distributed information retrieval. During the last decade, several research studies on this issue have been launched to find optimal solutions, and many collection selection approaches have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contain terms of the query and the weights of those terms in these documents. We tested our method, and our studies show that this technique can compete with the other state-of-the-art algorithms that we chose for testing the performance of our approach.

Breast Cancer Survivability Prediction via Classifier Ensemble

This paper presents a classifier ensemble approach for predicting the survivability of breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components: a feature selection component and a classifier ensemble component. The feature selection component divides the features in the SEER database into four groups. It then tries to find the most important features among the four groups that maximize the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models a different set of features from SEER obtained through the feature selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found uses the decision tree, Bayesian network, and Naïve Bayes algorithms for the underlying classifiers and Naïve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same SEER data (period of 1973-2002). It gives an 87.39% weighted average F-score compared to 85.82% and 81.34% for the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held-out unseen test set.
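
For readers unfamiliar with the metric, a weighted average F-score is commonly computed as the per-class F1 score averaged with weights proportional to each class's share of the true labels. The sketch below follows that common definition, which is assumed here rather than confirmed by the abstract; the function name `weighted_f1` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        pairs = list(zip(y_true, y_pred))
        true_pos = sum(1 for t, p in pairs if t == cls and p == cls)
        false_pos = sum(1 for t, p in pairs if t != cls and p == cls)
        false_neg = n_cls - true_pos
        denom = 2 * true_pos + false_pos + false_neg
        f1 = 2 * true_pos / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Hypothetical labels: "s" = survived, "d" = did not survive.
actual = ["s", "s", "s", "d"]
predicted = ["s", "s", "d", "d"]
example_score = weighted_f1(actual, predicted)
```

Weighting by support keeps the majority class from being ignored while still reflecting performance on the minority class, which matters when survival outcomes are imbalanced.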

The Impact of Modeling Method of Moisture Emission from the Swimming Pool on the Accuracy of Numerical Calculations of Air Parameters in Ventilated Natatorium

The aim of the presented research was to improve numerical predictions of the distribution of air parameters in an actual natatorium by selecting the formula used to calculate the mass flux of moisture emitted from the pool. The selected correlation should ensure the best agreement between the numerical results and the measurements of these parameters in the facility. A numerical model of the natatorium was developed, for which boundary conditions were prepared on the basis of measurements carried out in the actual facility. Numerical calculations were carried out using ANSYS CFX software, with six formulas being implemented, which in various ways made the moisture emission dependent on the water surface temperature and the air parameters in the natatorium. The results of the calculations using these formulas were compared for the distributions of the air parameters in the facility: specific humidity, velocity and temperature. To select the best formula, the numerical results for these parameters in the occupied zone were validated by comparison with measurements carried out at selected points of this zone.

Applying Theory of Inventive Problem Solving to Develop Innovative Solutions: A Case Study

Good service design can increase organization revenue and consumer satisfaction while reducing labor and time costs. The problems facing consumers in the original service model of the eyewear and optical industry include the following: (1) insufficient information on eyewear products; (2) passive dependence on recommendations and insufficient selection; (3) incomplete records on the progression of vision conditions; (4) lack of complete customer records. This study investigates the case of Kobayashi Optical, applying the Theory of Inventive Problem Solving (TRIZ) to develop innovative solutions for the eyewear and optical industry. The analysis yields the following conclusions and management implications. In order to provide customers with improved professional information and recommendations, Kobayashi Optical is advised to establish customer purchasing records. Overall service efficiency can be enhanced by applying data mining techniques to analyze past consumer preferences and purchase histories. Furthermore, Kobayashi Optical should continue to develop a 3D virtual trial service that allows customers to browse different frame styles and colors easily. This 3D virtual trial service will reduce customer waiting times during peak service times at stores.

Issues in Organizational Assessment: The Case of Frustration Tolerance Measurement in Mexico

The psychological profile has become one of the most important sources of information for individual selection and the hiring process in any organization. Psychological instruments are used to collect data about variables that are considered critically important for performance at work. However, because of conceptual chaos in organizational psychology, most of the information provided by psychological testing is not directly useful to Mexican human resources professionals in making hiring decisions. The aims of this paper are 1) to underline the lack of conceptual precision in the theoretical foundations of testing in Mexico and 2) to present a reliability and validity analysis of a frustration tolerance instrument created as an alternative to heuristically conducted individual assessment in organizations. First, a description of assessment conditions in Mexico is given. Second, an instrument and a theoretical framework are presented as an alternative to the assessment practices in the country. A total of 65 Psychology students from the Iztacala Superior Studies Faculty were assessed. Cronbach's alpha coefficient was calculated and an exploratory factor analysis was carried out to test the unidimensionality of the scale. Reliability analysis revealed good internal consistency of the scale (Cronbach's α = 0.825). Factor analysis produced 4 factors for the scale. However, the factor loadings and explained variation support the unidimensionality of the scale. It is concluded that the instrument has good psychometric properties that will allow human resources professionals to collect useful data. Different possibilities for conducting psychological assessment are suggested for future development.
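
The internal-consistency statistic reported above follows the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below implements that formula on made-up responses; it does not reproduce the study's data or its value of 0.825, and the function name `cronbach_alpha` is an assumption.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    scores: list of respondents, each a list of k item scores (k >= 2).
    Uses sample variances for both the items and the total scores.
    """
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four respondents to a three-item scale;
# the items move together perfectly, so alpha reaches its maximum of 1.
responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(responses)
```

Real questionnaire data would give a value below 1; values around 0.8, like the one reported, are conventionally read as good internal consistency.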

A Large Ion Collider Experiment (ALICE) Diffractive Detector Control System for RUN-II at the Large Hadron Collider

The selection of diffractive events in the ALICE experiment during the first data-taking period (RUN-I) of the Large Hadron Collider (LHC) was limited by the range over which rapidity gaps occur. It would be possible to achieve better measurements by expanding the range in which the production of particles can be detected. For this purpose, the ALICE Diffractive (AD0) detector has been installed and commissioned for the second phase (RUN-II). Any new detector should be able to take data synchronously with all other detectors and be operated through the ALICE central systems. One of the key elements that must be developed for the AD0 detector is the Detector Control System (DCS). The DCS must be designed to operate this detector safely and correctly. Furthermore, the DCS must also provide optimum operating conditions for the acquisition and storage of physics data and ensure these are of the highest quality. The operation of AD0 involves the configuration of about 200 parameters, from electronics settings and power supply levels to the archiving of operating conditions data and the generation of safety alerts. It also includes the automation of procedures to get the AD0 detector ready for taking data in the appropriate conditions for the different run types in ALICE. The performance of the AD0 detector depends on a number of parameters, such as the nominal voltages for each photomultiplier tube (PMT), their threshold levels for accepting or rejecting incoming pulses, the definition of triggers, etc. All these parameters define the efficiency of AD0, and they have to be monitored and controlled through the AD0 DCS. Finally, the AD0 DCS provides the operator with multiple interfaces to execute these tasks. They are realized as operating panels and scripts running in the background. These features are implemented on a SCADA software platform as a distributed control system which is integrated into the global control system of the ALICE experiment.

Solution of Logistics Center Selection Problem Using the Axiomatic Design Method

Logistics centers are areas in which all national and international logistics and logistics-related activities can be carried out by various businesses. Logistics centers are of key importance in joining the transport stream and the transport system operations. Therefore, where these centers are positioned matters if they are to be effective and efficient and to deliver the expected performance. In this study, the location selection problem for positioning a logistics center is discussed. Alternative centers are evaluated according to certain criteria. The most appropriate center is identified using the axiomatic design method.

Selection of Wind Farms to Add Virtual Inertia Control to Assist the Power System Frequency Regulation

Due to the randomness and uncertainty of wind energy, modern power systems integrating large-scale wind generation will be significantly impacted in terms of system performance and technical challenges. System inertia decreases under high wind penetration as conventional thermal generators are gradually replaced by wind turbines, which do not naturally contribute to inertia response. The power imbalance caused by wind power or demand fluctuations leads to instability of the system frequency. Accordingly, there is a strong need to attach supplementary virtual inertia control to wind farms (WFs). When multiple wind farms are connected to the grid simultaneously, selecting the critical WFs in which to install the virtual inertia control is of great importance for enhancing the stability of the system frequency. By building a small-signal model of wind power systems that considers frequency regulation, the installation locations are identified by the geometric measures of the mode observability of the WFs. In addition, this paper takes the impacts of grid topology and the selection of feedback control signals into consideration. Finally, simulations are conducted on a power system with multiple wind farms, and the results demonstrate that the designed virtual inertia control method can effectively assist the frequency regulation.