Abstract: Planning the order picking lists for warehouses to achieve some operational performances is a significant challenge when the costs associated with logistics are relatively high, and it is especially important in e-commerce era. Nowadays, many order planning techniques employ supervised machine learning algorithms. However, to define features for supervised machine learning algorithms is not a simple task. Against this background, we consider whether unsupervised algorithms can enhance the planning of order-picking lists. A double zone picking approach, which is based on using clustering algorithms twice, is developed. A simplified example is given to demonstrate the merit of our approach.
Abstract: For security purposes, it is important to detect passwords entered by unauthorized users. With traditional alphanumeric passwords, if the content of a password is acquired and correctly entered by an intruder, it is impossible to differentiate the password entered by the intruder from those entered by the authorized user because the password entries contain precisely the same character set. However, no two entries for the gesture-based passwords, even those entered by the person who created the password, will be identical. There are always variations between entries, such as the shape and length of each stroke, the location of each stroke, and the speed of drawing. It is possible that passwords entered by the unauthorized user contain higher levels of variations when compared with those entered by the authorized user (the creator). The difference in the levels of variations may provide cues to detect unauthorized entries. To test this hypothesis, we designed an empirical study, collected and analyzed the data with the help of machine-learning algorithms. The results of the study are significant.
Abstract: Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
Abstract: A major challenge in medical studies, especially those that are longitudinal, is the problem of missing measurements which hinders the effective application of many machine learning algorithms. Furthermore, recent Alzheimer's Disease studies have focused on the delineation of Early Mild Cognitive Impairment (EMCI) and Late Mild Cognitive Impairment (LMCI) from cognitively normal controls (CN) which is essential for developing effective and early treatment methods. To address the aforementioned challenges, this paper explores the potential of using the eXtreme Gradient Boosting (XGBoost) algorithm in handling missing values in multiclass classification. We seek a generalized classification scheme where all prodromal stages of the disease are considered simultaneously in the classification and decision-making processes. Given the large number of subjects (1631) included in this study and in the presence of almost 28% missing values, we investigated the performance of XGBoost on the classification of the four classes of AD, NC, EMCI, and LMCI. Using 10-fold cross validation technique, XGBoost is shown to outperform other state-of-the-art classification algorithms by 3% in terms of accuracy and F-score. Our model achieved an accuracy of 80.52%, a precision of 80.62% and recall of 80.51%, supporting the more natural and promising multiclass classification.
Abstract: This aims of this paper is to forecast the electricity spot prices. First, we focus on modeling the conditional mean of the series so we adopt a generalized fractional -factor Gegenbauer process (k-factor GARMA). Secondly, the residual from the -factor GARMA model has used as a proxy for the conditional variance; these residuals were predicted using two different approaches. In the first approach, a local linear wavelet neural network model (LLWNN) has developed to predict the conditional variance using the Back Propagation learning algorithms. In the second approach, the Gegenbauer generalized autoregressive conditional heteroscedasticity process (G-GARCH) has adopted, and the parameters of the k-factor GARMA-G-GARCH model has estimated using the wavelet methodology based on the discrete wavelet packet transform (DWPT) approach. The empirical results have shown that the k-factor GARMA-G-GARCH model outperform the hybrid k-factor GARMA-LLWNN model, and find it is more appropriate for forecasts.
Abstract: This paper illustrates an application of granular computing approach, namely rough set theory in data mining. The paper outlines the formalism of granular computing and elucidates the mathematical underpinning of rough set theory, which has been widely used by the data mining and the machine learning community. A real-world application is illustrated, and the classification performance is compared with other contending machine learning algorithms. The predictive performance of the rough set rule induction model shows comparative success with respect to other contending algorithms.
Abstract: Human beings have the ability to make logical
decisions. Although human decision - making is often optimal, it is
insufficient when huge amount of data is to be classified. Medical
dataset is a vital ingredient used in predicting patient’s health
condition. In other to have the best prediction, there calls for most
suitable machine learning algorithms. This work compared the
performance of Artificial Neural Network (ANN) and Decision Tree
Algorithms (DTA) as regards to some performance metrics using
diabetes data. WEKA software was used for the implementation of
the algorithms. Multilayer Perceptron (MLP) and Radial Basis
Function (RBF) were the two algorithms used for ANN, while
RegTree and LADTree algorithms were the DTA models used. From
the results obtained, DTA performed better than ANN. The Root
Mean Squared Error (RMSE) of MLP is 0.3913 that of RBF is
0.3625, that of RepTree is 0.3174 and that of LADTree is 0.3206
respectively.
Abstract: As internet continues to expand its usage with an
enormous number of applications, cyber-threats have significantly
increased accordingly. Thus, accurate detection of malicious traffic in
a timely manner is a critical concern in today’s Internet for security.
One approach for intrusion detection is to use Machine Learning (ML)
techniques. Several methods based on ML algorithms have been
introduced over the past years, but they are largely limited in terms of
detection accuracy and/or time and space complexity to run. In this
work, we present a novel method for intrusion detection that
incorporates a set of supervised learning algorithms. The proposed
technique provides high accuracy and outperforms existing techniques
that simply utilizes a single learning method. In addition, our
technique relies on partial flow information (rather than full
information) for detection, and thus, it is light-weight and desirable for
online operations with the property of early identification. With the
mid-Atlantic CCDC intrusion dataset publicly available, we show that
our proposed technique yields a high degree of detection rate over 99%
with a very low false alarm rate (0.4%).
Abstract: Mobile agents are a powerful approach to develop distributed systems since they migrate to hosts on which they have the resources to execute individual tasks. In a dynamic environment like a peer-to-peer network, Agents have to be generated frequently and dispatched to the network. Thus they will certainly consume a certain amount of bandwidth of each link in the network if there are too many agents migration through one or several links at the same time, they will introduce too much transferring overhead to the links eventually, these links will be busy and indirectly block the network traffic, therefore, there is a need of developing routing algorithms that consider about traffic load. In this paper we seek to create cooperation between a probabilistic manner according to the quality measure of the network traffic situation and the agent's migration decision making to the next hop based on decision tree learning algorithms.
Abstract: Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
Abstract: This paper summarizes the results of some experiments for finding the effective features for disambiguation of Turkish verbs. Word sense disambiguation is a current area of investigation in which verbs have the dominant role. Generally verbs have more senses than the other types of words in the average and detecting these features for verbs may lead to some improvements for other word types. In this paper we have considered only the syntactical features that can be obtained from the corpus and tested by using some famous machine learning algorithms.
Abstract: In this paper, a comparative study of application of
supervised and unsupervised learning algorithms on illumination
invariant face recognition has been carried out. The supervised
learning has been carried out with the help of using a bi-layered
artificial neural network having one input, two hidden and one output
layer. The gradient descent with momentum and adaptive learning
rate back propagation learning algorithm has been used to implement
the supervised learning in a way that both the inputs and
corresponding outputs are provided at the time of training the
network, thus here is an inherent clustering and optimized learning of
weights which provide us with efficient results.. The unsupervised
learning has been implemented with the help of a modified
Counterpropagation network. The Counterpropagation network
involves the process of clustering followed by application of Outstar
rule to obtain the recognized face. The face recognition system has
been developed for recognizing faces which have varying
illumination intensities, where the database images vary in lighting
with respect to angle of illumination with horizontal and vertical
planes. The supervised and unsupervised learning algorithms have
been implemented and have been tested exhaustively, with and
without application of histogram equalization to get efficient results.
Abstract: This paper presents an ESN-based Arabic phoneme
recognition system trained with supervised, forced and combined
supervised/forced supervised learning algorithms. Mel-Frequency
Cepstrum Coefficients (MFCCs) and Linear Predictive Code (LPC)
techniques are used and compared as the input feature extraction
technique. The system is evaluated using 6 speakers from the King
Abdulaziz Arabic Phonetics Database (KAPD) for Saudi Arabia
dialectic and 34 speakers from the Center for Spoken Language
Understanding (CSLU2002) database of speakers with different
dialectics from 12 Arabic countries. Results for the KAPD and
CSLU2002 Arabic databases show phoneme recognition
performances of 72.31% and 38.20% respectively.
Abstract: On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.
Abstract: Data mining (DM) is the process of finding and extracting frequent patterns that can describe the data, or predict unknown or future values. These goals are achieved by using various learning algorithms. Each algorithm may produce a mining result completely different from the others. Some algorithms may find millions of patterns. It is thus the difficult job for data analysts to select appropriate models and interpret the discovered knowledge. In this paper, we describe a framework of an intelligent and complete data mining system called SUT-Miner. Our system is comprised of a full complement of major DM algorithms, pre-DM and post-DM functionalities. It is the post-DM packages that ease the DM deployment for business intelligence applications.
Abstract: this paper presents a multi-context recurrent network for time series analysis. While simple recurrent network (SRN) are very popular among recurrent neural networks, they still have some shortcomings in terms of learning speed and accuracy that need to be addressed. To solve these problems, we proposed a multi-context recurrent network (MCRN) with three different learning algorithms. The performance of this network is evaluated on some real-world application such as handwriting recognition and energy load forecasting. We study the performance of this network and we compared it to a very well established SRN. The experimental results showed that MCRN is very efficient and very well suited to time series analysis and its applications.