Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.

Interactive Garments: Flexible Technologies for Textile Integration

Upon reviewing the literature and the pragmatic work done in the field of E- textiles, it is observed that the applications of wearable technologies have found a steady growth in the field of military, medical, industrial, sports; whereas fashion is at a loss to know how to treat this technology and bring it to market. The purpose of this paper is to understand the practical issues of integration of electronics in garments; cutting patterns for mass production, maintaining the basic properties of textiles and daily maintenance of garments that hinder the wide adoption of interactive fabric technology within Fashion and leisure wear. To understand the practical hindrances an experimental and laboratory approach is taken. “Techno Meets Fashion” has been an interactive fashion project where sensor technologies have been embedded with textiles that result in set of ensembles that are light emitting garments, sound sensing garments, proximity garments, shape memory garments etc. Smart textiles, especially in the form of textile interfaces, are drastically underused in fashion and other lifestyle product design. Clothing and some other textile products must be washable, which subjects to the interactive elements to water and chemical immersion, physical stress, and extreme temperature. The current state of the art tends to be too fragile for this treatment. The process for mass producing traditional textiles becomes difficult in interactive textiles. As cutting patterns from larger rolls of cloth and sewing them together to make garments breaks and reforms electronic connections in an uncontrolled manner. Because of this, interactive fabric elements are integrated by hand into textiles produced by standard methods. The Arduino has surely made embedding electronics into textiles much easier than before; even then electronics are not integral to the daily wear garments. Soft and flexible interfaces of MEMS (micro sensors and Micro actuators) can be an option to make this possible by blending electronics within E-textiles in a way that’s seamless and still retains functions of the circuits as well as the garment. Smart clothes, which offer simultaneously a challenging design and utility value, can be only mass produced if the demands of the body are taken care of i.e. protection, anthropometry, ergonomics of human movement, thermo- physiological regulation.

Analysis of Diverse Cluster Ensemble Techniques

Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.

Multi-Channel Information Fusion in C-OTDR Monitoring Systems: Various Approaches to Classify of Targeted Events

The paper presents new results concerning selection of optimal information fusion formula for ensembles of C-OTDR channels. The goal of information fusion is to create an integral classificator designed for effective classification of seismoacoustic target events. The LPBoost (LP-β and LP-B variants), the Multiple Kernel Learning, and Weighing of Inversely as Lipschitz Constants (WILC) approaches were compared. The WILC is a brand new approach to optimal fusion of Lipschitz Classifiers Ensembles. Results of practical usage are presented.

A Comparative Study of Malware Detection Techniques Using Machine Learning Methods

In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through (semi)-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.

Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems

This paper introduces an original method for guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers is relevant in multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.

FEM Models of Glued Laminated Timber Beams Enhanced by Bayesian Updating of Elastic Moduli

Two finite element (FEM) models are presented in this paper to address the random nature of the response of glued timber structures made of wood segments with variable elastic moduli evaluated from 3600 indentation measurements. This total database served to create the same number of ensembles as was the number of segments in the tested beam. Statistics of these ensembles were then assigned to given segments of beams and the Latin Hypercube Sampling (LHS) method was called to perform 100 simulations resulting into the ensemble of 100 deflections subjected to statistical evaluation. Here, a detailed geometrical arrangement of individual segments in the laminated beam was considered in the construction of two-dimensional FEM model subjected to in fourpoint bending to comply with the laboratory tests. Since laboratory measurements of local elastic moduli may in general suffer from a significant experimental error, it appears advantageous to exploit the full scale measurements of timber beams, i.e. deflections, to improve their prior distributions with the help of the Bayesian statistical method. This, however, requires an efficient computational model when simulating the laboratory tests numerically. To this end, a simplified model based on Mindlin’s beam theory was established. The improved posterior distributions show that the most significant change of the Young’s modulus distribution takes place in laminae in the most strained zones, i.e. in the top and bottom layers within the beam center region. Posterior distributions of moduli of elasticity were subsequently utilized in the 2D FEM model and compared with the original simulations.

An Experimental Study of a Self-Supervised Classifier Ensemble

Learning using labeled and unlabelled data has received considerable amount of attention in the machine learning community due its potential in reducing the need for expensive labeled data. In this work we present a new method for combining labeled and unlabeled data based on classifier ensembles. The model we propose assumes each classifier in the ensemble observes the input using different set of features. Classifiers are initially trained using some labeled samples. The trained classifiers learn further through labeling the unknown patterns using a teaching signals that is generated using the decision of the classifier ensemble, i.e. the classifiers self-supervise each other. Experiments on a set of object images are presented. Our experiments investigate different classifier models, different fusing techniques, different training sizes and different input features. Experimental results reveal that the proposed self-supervised ensemble learning approach reduces classification error over the single classifier and the traditional ensemble classifier approachs.

Improving Academic Performance Prediction using Voting Technique in Data Mining

In this paper we compare the accuracy of data mining methods to classifying students in order to predicting student-s class grade. These predictions are more useful for identifying weak students and assisting management to take remedial measures at early stages to produce excellent graduate that will graduate at least with second class upper. Firstly we examine single classifiers accuracy on our data set and choose the best one and then ensembles it with a weak classifier to produce simple voting method. We present results show that combining different classifiers outperformed other single classifiers for predicting student performance.

Ensembling Classifiers – An Application toImage Data Classification from Cherenkov Telescope Experiment

Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques with classifiers such as random forests, neural networks and support vector machines. The data sets are from MAGIC, a Cherenkov telescope experiment. The task is to classify gamma signals from overwhelmingly hadron and muon signals representing a rare class classification problem. We compare the individual classifiers with their ensemble counterparts and discuss the results. WEKA a wonderful tool for machine learning has been used for making the experiments.

Combining Bagging and Additive Regression

Bagging and boosting are among the most popular re-sampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as base-learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using an averaging methodology of bagging and boosting ensembles with 10 sub-learners in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-learners on standard benchmark datasets and the proposed ensemble gave better accuracy.

Combining Bagging and Boosting

Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base-classifiers. Boosting algorithms are considered stronger than bagging on noisefree data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using a voting methodology of bagging and boosting ensembles with 10 subclassifiers in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-classifiers, as well as other well known combining methods, on standard benchmark datasets and the proposed technique was the most accurate.

Meta Random Forests

Leo Breimans Random Forests (RF) is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques to the random forests. We experiment the working of the ensembles of random forests on the standard data sets available in UCI data sets. We compare the original random forest algorithm with their ensemble counterparts and discuss the results.

A Survey: Clustering Ensembles Techniques

The clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. Clustering ensembles have emerged as a prominent method for improving robustness, stability and accuracy of unsupervised classification solutions. So far, many contributions have been done to find consensus clustering. One of the major problems in clustering ensembles is the consensus function. In this paper, firstly, we introduce clustering ensembles, representation of multiple partitions, its challenges and present taxonomy of combination algorithms. Secondly, we describe consensus functions in clustering ensembles including Hypergraph partitioning, Voting approach, Mutual information, Co-association based functions and Finite mixture model, and next explain their advantages, disadvantages and computational complexity. Finally, we compare the characteristics of clustering ensembles algorithms such as computational complexity, robustness, simplicity and accuracy on different datasets in previous techniques.

Ensembling Adaptively Constructed Polynomial Regression Models

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.