Investigations of Protein Aggregation Using Sequence and Structure Based Features

The main cause of several neurodegenerative diseases such as Alzhemier, Parkinson and spongiform encephalopathies is formation of amyloid fibrils and plaques in proteins. We have analyzed different sets of proteins and peptides to understand the influence of sequence based features on protein aggregation process. The comparison of 373 pairs of homologous mesophilic and thermophilic proteins showed that aggregation prone regions (APRs) are present in both. But, the thermophilic protein monomers show greater ability to ‘stow away’ the APRs in their hydrophobic cores and protect them from solvent exposure. The comparison of amyloid forming and amorphous b-aggregating hexapeptides suggested distinct preferences for specific residues at the six positions as well as all possible combinations of nine residue pairs. The compositions of residues at different positions and residue pairs have been converted into energy potentials and utilized for distinguishing between amyloid forming and amorphous b-aggregating peptides. Our method could correctly identify the amyloid forming peptides at an accuracy of 95-100% in different datasets of peptides.

Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System

In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.

Effect of Personality Traits on Classification of Political Orientation

Today, there is a large number of political transcripts available on the Web to be mined and used for statistical analysis, and product recommendations. As the online political resources are used for various purposes, automatically determining the political orientation on these transcripts becomes crucial. The methodologies used by machine learning algorithms to do an automatic classification are based on different features that are classified under categories such as Linguistic, Personality etc. Considering the ideological differences between Liberals and Conservatives, in this paper, the effect of Personality traits on political orientation classification is studied. The experiments in this study were based on the correlation between LIWC features and the BIG Five Personality traits. Several experiments were conducted using Convote U.S. Congressional- Speech dataset with seven benchmark classification algorithms. The different methodologies were applied on several LIWC feature sets that constituted by 8 to 64 varying number of features that are correlated to five personality traits. As results of experiments, Neuroticism trait was obtained to be the most differentiating personality trait for classification of political orientation. At the same time, it was observed that the personality trait based classification methodology gives better and comparable results with the related work.

Educase – Intelligent System for Pedagogical Advising Using Case-Based Reasoning

This paper introduces a proposal scheme for an Intelligent System applied to Pedagogical Advising using Case-Based Reasoning, to find consolidated solutions before used for the new problems, making easier the task of advising students to the pedagogical staff. We do intend, through this work, introduce the motivation behind the choices for this system structure, justifying the development of an incremental and smart web system who learns bests solutions for new cases when it’s used, showing technics and technology.

Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

What the Future Holds for Social Media Data Analysis

The dramatic rise in the use of Social Media (SM) platforms such as Facebook and Twitter provide access to an unprecedented amount of user data. Users may post reviews on products and services they bought, write about their interests, share ideas or give their opinions and views on political issues. There is a growing interest in the analysis of SM data from organisations for detecting new trends, obtaining user opinions on their products and services or finding out about their online reputations. A recent research trend in SM analysis is making predictions based on sentiment analysis of SM. Often indicators of historic SM data are represented as time series and correlated with a variety of real world phenomena like the outcome of elections, the development of financial indicators, box office revenue and disease outbreaks. This paper examines the current state of research in the area of SM mining and predictive analysis and gives an overview of the analysis methods using opinion mining and machine learning techniques.

Resident-Aware Green Home

The amount of energy the world uses doubles every 20 years. Green homes play an important role in reducing the residential energy demand. This paper presents a platform that is intended to learn the behavior of home residents and build a profile about their habits and actions. The proposed resident aware home controller intervenes in the operation of home appliances in order to save energy without compromising the convenience of the residents. The presented platform can be used to simulate the actions and movements happening inside a home. The paper includes several optimization techniques that are meant to save energy in the home. In addition, several test scenarios are presented that show how the controller works. Moreover, this paper shows the computed actual savings when each of the presented techniques is implemented in a typical home. The test scenarios have validated that the techniques developed are capable of effectively saving energy at homes.

An Educational Data Mining System for Advising Higher Education Students

Educational  data mining  is  a  specific  data   mining field applied to data originating from educational environments, it relies on different  approaches to discover hidden knowledge  from  the  available   data. Among these approaches are   machine   learning techniques which are used to build a system that acquires learning from previous data. Machine learning can be applied to solve different regression, classification, clustering and optimization problems. In  our  research, we propose  a “Student  Advisory  Framework” that  utilizes  classification  and  clustering  to  build  an  intelligent system. This system can be used to provide pieces of consultations to a first year  university  student to  pursue a  certain   education   track   where  he/she  will  likely  succeed  in, aiming  to  decrease   the  high  rate   of  academic  failure   among these  students.  A real case study  in Cairo  Higher  Institute  for Engineering, Computer  Science  and  Management  is  presented using  real  dataset   collected  from  2000−2012.The dataset has two main components: pre-higher education dataset and first year courses results dataset. Results have proved the efficiency of the suggested framework.

Modeling Language for Constructing Solvers in Machine Learning: Reductionist Perspectives

For a given specific problem an efficient algorithm has been the matter of study. However, an alternative approach orthogonal to this approach comes out, which is called a reduction. In general for a given specific problem this reduction approach studies how to convert an original problem into subproblems. This paper proposes a formal modeling language to support this reduction approach in order to make a solver quickly. We show three examples from the wide area of learning problems. The benefit is a fast prototyping of algorithms for a given new problem. It is noted that our formal modeling language is not intend for providing an efficient notation for data mining application, but for facilitating a designer who develops solvers in machine learning.

Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values

Estimates of temperature values at a specific time of day, from daytime and daily profiles, are needed for a number of environmental, ecological, agricultural and technical applications, ranging from natural hazards assessments, crop growth forecasting to design of solar energy systems. The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. For this reason, a number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators, such as Correlation Coefficient, Root Mean Squared Error, etc.

MTSSM - A Framework for Multi-Track Segmentation of Symbolic Music

Music segmentation is a key issue in music information retrieval (MIR) as it provides an insight into the internal structure of a composition. Structural information about a composition can improve several tasks related to MIR such as searching and browsing large music collections, visualizing musical structure, lyric alignment, and music summarization. The authors of this paper present the MTSSM framework, a twolayer framework for the multi-track segmentation of symbolic music. The strength of this framework lies in the combination of existing methods for local track segmentation and the application of global structure information spanning via multiple tracks. The first layer of the MTSSM uses various string matching techniques to detect the best candidate segmentations for each track of a multi-track composition independently. The second layer combines all single track results and determines the best segmentation for each track in respect to the global structure of the composition.

Integrating Artificial Neural Network and Taguchi Method on Constructing the Real Estate Appraisal Model

In recent years, real estate prediction or valuation has been a topic of discussion in many developed countries. Improper hype created by investors leads to fluctuating prices of real estate, affecting many consumers to purchase their own homes. Therefore, scholars from various countries have conducted research in real estate valuation and prediction. With the back-propagation neural network that has been popular in recent years and the orthogonal array in the Taguchi method, this study aimed to find the optimal parameter combination at different levels of orthogonal array after the system presented different parameter combinations, so that the artificial neural network obtained the most accurate results. The experimental results also demonstrated that the method presented in the study had a better result than traditional machine learning. Finally, it also showed that the model proposed in this study had the optimal predictive effect, and could significantly reduce the cost of time in simulation operation. The best predictive results could be found with a fewer number of experiments more efficiently. Thus users could predict a real estate transaction price that is not far from the current actual prices.

A Cognitive Model of Character Recognition Using Support Vector Machines

In the present study, a support vector machine (SVM) learning approach to character recognition is proposed. Simple feature detectors, similar to those found in the human visual system, were used in the SVM classifier. Alphabetic characters were rotated to 8 different angles and using the proposed cognitive model, all characters were recognized with 100% accuracy and specificity. These same results were found in psychiatric studies of human character recognition.

Combining Diverse Neural Classifiers for Complex Problem Solving: An ECOC Approach

Combining classifiers is a useful method for solving complex problems in machine learning. The ECOC (Error Correcting Output Codes) method has been widely used for designing combining classifiers with an emphasis on the diversity of classifiers. In this paper, in contrast to the standard ECOC approach in which individual classifiers are chosen homogeneously, classifiers are selected according to the complexity of the corresponding binary problem. We use SATIMAGE database (containing 6 classes) for our experiments. The recognition error rate in our proposed method is %10.37 which indicates a considerable improvement in comparison with the conventional ECOC and stack generalization methods.

Using Interval Trees for Approximate Indexing of Instances

This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.

Anomaly Detection and Characterization to Classify Traffic Anomalies Case Study: TOT Public Company Limited Network

This paper represents four unsupervised clustering algorithms namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer that previously works have not been used for network traffic classification. The methodology, the result, the products of the cluster and evaluation of these algorithms with efficiency of each algorithm from accuracy are shown. Otherwise, the efficiency of these algorithms considering form the time that it use to generate the cluster quickly and correctly. Our work study and test the best algorithm by using classify traffic anomaly in network traffic with different attribute that have not been used before. We analyses the algorithm that have the best efficiency or the best learning and compare it to the previously used (K-Means). Our research will be use to develop anomaly detection system to more efficiency and more require in the future.

Hybrid Machine Learning Approach for Text Categorization

Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.

An Anomaly Detection Approach to Detect Unexpected Faults in Recordings from Test Drives

In the automotive industry test drives are being conducted during the development of new vehicle models or as a part of quality assurance of series-production vehicles. The communication on the in-vehicle network, data from external sensors, or internal data from the electronic control units is recorded by automotive data loggers during the test drives. The recordings are used for fault analysis. Since the resulting data volume is tremendous, manually analysing each recording in great detail is not feasible. This paper proposes to use machine learning to support domainexperts by preventing them from contemplating irrelevant data and rather pointing them to the relevant parts in the recordings. The underlying idea is to learn the normal behaviour from available recordings, i.e. a training set, and then to autonomously detect unexpected deviations and report them as anomalies. The one-class support vector machine “support vector data description” is utilised to calculate distances of feature vectors. SVDDSUBSEQ is proposed as a novel approach, allowing to classify subsequences in multivariate time series data. The approach allows to detect unexpected faults without modelling effort as is shown with experimental results on recordings from test drives.

The Influence of Preprocessing Parameters on Text Categorization

Text categorization (the assignment of texts in natural language into predefined categories) is an important and extensively studied problem in Machine Learning. Currently, popular techniques developed to deal with this task include many preprocessing and learning algorithms, many of which in turn require tuning nontrivial internal parameters. Although partial studies are available, many authors fail to report values of the parameters they use in their experiments, or reasons why these values were used instead of others. The goal of this work then is to create a more thorough comparison of preprocessing parameters and their mutual influence, and report interesting observations and results.

Pruning Method of Belief Decision Trees

The belief decision tree (BDT) approach is a decision tree in an uncertain environment where the uncertainty is represented through the Transferable Belief Model (TBM), one interpretation of the belief function theory. The uncertainty can appear either in the actual class of training objects or attribute values of objects to classify. In this paper, we develop a post-pruning method of belief decision trees in order to reduce size and improve classification accuracy on unseen cases. The pruning of decision tree has a considerable intention in the areas of machine learning.