Abstract: This paper deals with the random access procedure in next-generation networks and presents a solution to reduce the total service time (TST), one of the most important performance metrics in current and future Internet of Things (IoT) networks. The proposed solution focuses on calculating the optimal transmission probability, which maximizes the success probability and reduces the TST. It uses the number of idle preambles observed in every time slot to estimate the number of backlogged IoT devices using naïve Bayes estimation, a type of supervised learning in the machine learning domain. This estimation is necessary because the optimal transmission probability depends on the number of backlogged devices, which the eNodeB does not know. Simulations carried out in MATLAB verify that the proposed solution performs very well.
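As a rough illustration of the optimality condition this abstract alludes to, the sketch below assumes a simplified multichannel slotted-ALOHA model (not necessarily the paper's exact formulation): each of n backlogged devices transmits with probability p on one of m preambles chosen uniformly at random, so the expected number of successes is n*p*(1 - p/m)^(n-1), maximized at roughly p* = min(1, m/n). Function names and numbers are illustrative.

```python
import numpy as np

def expected_successes(p, n, m):
    """Expected number of successful devices when each of n backlogged
    devices transmits with probability p on one of m preambles chosen
    uniformly at random (multichannel slotted-ALOHA assumption)."""
    return n * p * (1.0 - p / m) ** (n - 1)

def optimal_tx_probability(n, m):
    """Grid-search the transmission probability that maximizes the
    expected number of successes; analytically p* = min(1, m/n)."""
    grid = np.linspace(0.001, 1.0, 2000)
    return grid[np.argmax(expected_successes(grid, n, m))]

# With 100 backlogged devices and 54 preambles, the search recovers
# a value close to the analytical optimum 54/100.
p_star = optimal_tx_probability(n=100, m=54)
```

This is why estimating the backlog matters: p* is a function of n, which the eNodeB cannot observe directly.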
Abstract: Fake news detection research is still at an early stage, as fake news is a relatively new phenomenon that has attracted considerable societal interest. Machine learning helps to solve complex problems and to build AI systems, especially in cases where the knowledge is tacit or not explicitly known. For the identification of fake news, we applied three machine learning classifiers: Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification alone is not fully adequate for fake news detection because generic classification methods are not specialized for fake news. By integrating machine learning with text-based processing, we can detect fake news and build classifiers that classify news data. Text classification mainly focuses on extracting various features of text and then incorporating those features into classification. A major challenge in this area is the lack of an efficient way to differentiate between fake and genuine news due to the unavailability of corpora. We applied the three classifiers to two publicly available datasets. Experimental analysis on these datasets indicates very encouraging and improved performance.
Abstract: Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detecting student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. To achieve this, a new type of continuous performance test, the Seek-X type, is introduced. Nine features were extracted, including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches was evaluated. Overall, the random forest classification approach achieved the best results: 93.3% classification accuracy for engagement and 42.9% accuracy for disengagement. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that the multisensor approach achieved higher accuracy than features from any reduced set of sensors, and that high-level handpicked features improve the classification accuracy in every sensor mode. Our approach is robust to both sensor dropout and occlusions. Eye gaze was shown to be the single most important sensor feature for classifying engagement and distraction.
It has been shown that we can accurately predict the level of engagement of students with learning disabilities in real time, with an approach that is not subject to inter-rater reliability, does not rely on human observation, and is not dependent on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students whose individual needs they cannot possibly attend to one by one. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
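The leave-one-out evaluation scheme named in the abstract can be sketched generically as below. The toy 1-nearest-neighbour classifier and the example feature vectors are hypothetical stand-ins for the paper's random forest and its nine extracted features.

```python
def loocv_accuracy(features, labels, classify):
    """Leave-one-out cross-validation: each sample is held out once,
    the classifier is fit on the rest, and the held-out label predicted."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

def one_nn(train_x, train_y, query):
    """Toy 1-nearest-neighbour stand-in for the paper's random forest."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]

# Hypothetical 2-feature sessions labeled engaged ('eng') / disengaged ('dis')
acc = loocv_accuracy(
    [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)],
    ["dis", "dis", "eng", "eng"],
    one_nn)
```

LOOCV suits small-n studies like this one (four students, 59 sessions), since every sample contributes to both training and evaluation.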
Abstract: This paper presents an approach for the easy creation and
classification of institutional risk profiles supporting endangerment
analysis of file formats. The main contribution of this work is the
employment of data mining techniques to support the identification of
the most important risk factors. Subsequently, risk profiles employ a
risk-factor classifier and associated configurations to support
digital preservation experts with a semi-automatic estimation of the
endangerment group for file format risk profiles. Our goal is to make
use of an expert knowledge base, acquired through a digital
preservation survey, in order to detect preservation risks for a
particular institution. Another contribution is support for the
visualisation of risk factors for a required dimension of analysis.
Using the naive Bayes method, the decision support system recommends
to an expert the matching risk profile group for the previously
selected institutional risk profile. The proposed methods improve the
visibility of risk factor values and the quality of the digital
preservation process. The presented approach is designed to
facilitate decision making for the preservation of digital content in
libraries and archives using domain expert knowledge and the values
of file format risk profiles. To facilitate decision-making, the
aggregated information about the risk factors is presented as a
multidimensional vector. The goal is to visualise particular
dimensions of this vector for analysis by an expert and to define its
profile group. A sample risk profile calculation and the
visualisation of some risk factor dimensions are presented in the
evaluation section.
Abstract: Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of whom remain undiagnosed despite the fact that all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to delayed treatment and to the misconceptions and general lack of awareness surrounding these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement can be utilized as a tool in the evaluation of patients who may have ADHD, epilepsy, or autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. The proposed classifiers, applied so far to epilepsy as the first step of this work, provided an accuracy of approximately 97% using SVM, Naïve Bayes, and decision tree, and 98% using KNN, which is encouraging for the work yet to be conducted.
Abstract: Image retrieval is one of the most interesting techniques in use in today's digital world. CBIR, commonly expanded as Content-Based Image Retrieval, is an image processing technique which identifies relevant images and retrieves them based on patterns extracted from digital images. In this paper, two research works using CBIR are presented. The first provides an automated and interactive approach to the analysis of CBIR techniques. CBIR works on the principle of supervised machine learning, which involves feature selection followed by a training and testing phase applied to a classifier in order to perform prediction. For feature extraction, image transforms such as the Contourlet, Ridgelet and Shearlet transforms can be utilized to retrieve texture features from the images. The extracted features are used to train and build a classifier using classification algorithms such as Naïve Bayes, K-Nearest Neighbour and multi-class Support Vector Machine. The testing phase then predicts the class of a new input image using the trained classifier, labelling it as one of four classes: 1 - normal brain, 2 - benign tumour, 3 - malignant tumour, or 4 - severe tumour. The second research work develops a tool for tumour stage identification using the best feature extraction method and classifier identified in the first work. Finally, the tool is used to predict the tumour stage and provide suggestions based on the stage identified by the system. These two approaches are a contribution to the medical field, giving better retrieval performance and tumour stage identification.
Abstract: Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships, or to assign unseen objects to one of the predefined groups. In this paper, we investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan: Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed the other algorithms in terms of classification accuracy, precision and F1 evaluation measures on the datasets of groundwater areas collected from the Jordanian Ministry of Water and Irrigation.
Abstract: In this paper, we analyze the major components of activity recognition (AR) in wearable devices with 9-axis sensors and sensor fusion filters. 9-axis sensors commonly comprise a 3-axis accelerometer, a 3-axis gyroscope and a 3-axis magnetometer. We chose the Kalman filter and the Direction Cosine Matrix (DCM) filter as sensor fusion filters. We construct sensor fusion data from the sensor data of each activity and evaluate AR classification accuracy using Naïve Bayes and SVM. According to the classification results, the DCM filter and a specific combination of the sensing axes are more effective for AR in wearable devices when classifying walking, running, ascending and descending.
Abstract: Classification is an important data mining technique
and can be used for data filtering in artificial intelligence. The
broad applicability of classification to all kinds of data has led
to its use in nearly every field of modern life. Classification
helps us group different items according to the features judged
interesting and useful. In this paper, we compare two
classification methods, Naïve Bayes and ADTree, used to detect
spam e-mail. This choice is motivated by the fact that the Naïve
Bayes algorithm is based on probability calculus while the ADTree
algorithm is based on decision trees. The parameters of these
classifiers are set to maximize the true positive rate and
minimize the false positive rate. The experimental results present
classification accuracy and a cost analysis with a view to
choosing the optimal classifier for spam detection. We also point
out the number of attributes needed to obtain a trade-off between
the number of attributes and the classification accuracy.
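The two rates being tuned here are standard confusion-matrix quantities; a minimal helper, with hypothetical spam/ham labels, looks like this:

```python
def rates(y_true, y_pred, positive="spam"):
    """True-positive rate TP/(TP+FN) and false-positive rate FP/(FP+TN)
    computed from paired lists of gold and predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)

tpr, fpr = rates(["spam", "spam", "ham", "ham"],
                 ["spam", "ham", "spam", "ham"])
```

Maximizing TPR while minimizing FPR is exactly the trade-off a point on the ROC curve represents.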
Abstract: We assume an IoT-based smart-home environment where the on-off status of each of the electrical appliances, including the room lights, can be recognized in real time by monitoring and analyzing the smart meter data. At any moment in such an environment, we can recognize what the household or the user is doing by referring to the status data of the appliances. In this paper, we focus on a smart-home service that activates a robot vacuum cleaner at the right time by recognizing the user situation, which requires a situation-aware model that can distinguish the situations that allow vacuum cleaning (Yes) from those that do not (No). As our candidate models we learn a few classifiers, such as naïve Bayes, decision tree, and logistic regression, that map the appliance-status data into Yes and No situations. Our training and test data are obtained from simulations of user behaviors, in which a sequence of user situations such as cooking, eating, dish washing, and so on is generated with the status of the relevant appliances changed in accordance with the situation changes. During the simulation, both the situation transition and the resulting appliance status are determined stochastically. To compare the performances of the aforementioned classifiers, we obtain their learning curves for different types of users through simulations. The result of our empirical study reveals that naïve Bayes achieves a slightly better classification accuracy than the other compared classifiers.
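Since the appliance statuses are binary on/off vectors, a Bernoulli naïve Bayes model is a natural fit. The sketch below is a minimal from-scratch version with Laplace smoothing; the appliance layout (tv, stove, washer) and the Yes/No labels are hypothetical, not the paper's data.

```python
import math

def train_bernoulli_nb(X, y):
    """Fit a Bernoulli naive Bayes model on binary appliance-status
    vectors X (1 = on, 0 = off) with Yes/No situation labels y.
    Laplace smoothing keeps unseen on/off combinations non-zero."""
    classes = sorted(set(y))
    prior = {c: math.log(y.count(c) / len(y)) for c in classes}
    theta = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        theta[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                    for j in range(len(X[0]))]
    return classes, prior, theta

def predict(model, x):
    """Pick the class with the highest log-posterior for status vector x."""
    classes, prior, theta = model
    def loglik(c):
        return prior[c] + sum(
            math.log(t if xi else 1 - t) for xi, t in zip(x, theta[c]))
    return max(classes, key=loglik)

# Hypothetical statuses for [tv, stove, washer]:
# cleaning allowed while watching TV, not while cooking.
model = train_bernoulli_nb(
    [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 1]],
    ["yes", "yes", "no", "no"])
```

Per-feature conditional independence is exactly the naïve Bayes assumption; for a handful of appliances it makes both training and prediction trivially cheap, which suits a real-time smart-home service.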
Abstract: Texture is an important characteristic of real and
synthetic scenes. Texture analysis plays a critical role in
inspecting surfaces and provides important techniques in a variety
of applications. Although several descriptors have been presented
to extract texture features, object recognition remains a difficult
task due to the complex aspects of texture. Recently, many robust
and scale-invariant image features such as SIFT, SURF and ORB have
been successfully used in image retrieval and object recognition.
In this paper, we compare the performance of these feature
descriptors for texture classification using k-means clustering.
Different classifiers, including K-NN, Naive Bayes, Back
Propagation Neural Network, Decision Tree and KStar, were applied
to three texture image sets: UIUCTex, KTH-TIPS and Brodatz.
Experimental results reveal that SIFT achieves the best average
accuracy rate on UIUCTex and KTH-TIPS, while SURF is advantageous
on the Brodatz texture set. The BP neural network works best on the
test sets among all the classifiers used.
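Pairing local descriptors with k-means clustering, as this abstract does, typically means building a bag-of-visual-words representation: cluster the descriptors into a vocabulary, then describe each image by a histogram over that vocabulary. A minimal numpy sketch (Lloyd's k-means on synthetic descriptors, not the paper's SIFT/SURF/ORB pipeline):

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means over local feature descriptors; the
    resulting centroids act as the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # skip empty clusters
                centers[j] = descriptors[assign == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantize one image's descriptors against the vocabulary and
    return a normalized bag-of-visual-words histogram."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    counts = np.bincount(d.argmin(axis=1), minlength=len(centers))
    return counts / counts.sum()
```

The fixed-length histograms are what classifiers like K-NN or naive Bayes then consume, regardless of how many keypoints each image produced.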
Abstract: MicroRNAs are small non-coding RNAs found in many
different species. They play crucial roles in cancer, in biological
processes such as apoptosis and proliferation. The identification
of microRNA-target genes can be an essential first step toward
revealing the role of microRNAs in various cancer types. In this
paper, we predict miRNA-target genes for lung cancer by integrating
prediction scores from the miRanda and PITA algorithms, used as a
feature vector of miRNA-target interaction. Machine-learning
algorithms were then applied to make the final prediction. The
approach developed in this study should be of value for future
studies of the role of miRNAs in the molecular mechanisms enabling
lung cancer formation.
Abstract: Phonocardiography is important in the appraisal of
congenital heart disease and pulmonary hypertension, as it reflects
the duration of right ventricular systole. The systolic murmur in
patients with an intra-cardiac shunt decreases as pulmonary
hypertension develops and may eventually disappear completely as
the pulmonary pressure reaches the systemic level.
Phonocardiography and auscultation are non-invasive, low-cost, and
accurate methods to assess heart disease. In this work, an
objective signal processing tool based on wavelets is proposed to
extract information from the phonocardiography signal and classify
the murmur as normal or abnormal. Since the feature vector is
large, Binary Particle Swarm Optimization (PSO) with mutation is
proposed for feature selection. The selected features improve the
classification accuracy and were tested across various classifiers
including Naïve Bayes, kNN, C4.5, and SVM.
Abstract: This paper presents an approach for the classification of
unstructured format descriptions for the identification of file
formats. The main contribution of this work is the employment of
data mining techniques to support file format selection using just
an unstructured text description that comprises the most important
format features for a particular organisation. Subsequently, the
file format identification method employs a file format classifier
and associated configurations to support digital preservation
experts with an estimation of the required file format. Our goal is
to make use of a format specification knowledge base aggregated
from different Web sources in order to select a file format for a
particular institution. Using the naive Bayes method, the decision
support system recommends a file format to an expert for their
institution. The proposed methods facilitate the selection of file
formats and improve the quality of the digital preservation
process. The presented approach is meant to facilitate decision
making for the preservation of digital content in libraries and
archives using domain expert knowledge and specifications of file
formats. To facilitate decision-making, the aggregated information
about the file formats is presented as a file format vocabulary
that comprises the most common terms characteristic of all the
researched formats. The goal is to suggest a particular file format
based on this vocabulary for analysis by an expert. A sample file
format calculation and the calculation results, including
probabilities, are presented in the evaluation section.
Abstract: Software fault prediction models are created using the
source code, metrics computed from the same or a previous version
of the code, and related fault data. Some companies do not store
and keep track of all the artifacts required for software fault
prediction. To construct a fault prediction model for such
companies, training data from other projects is one potential
solution. The earlier a fault is predicted, the less it costs to
correct. The training data consist of metrics data and related
fault data at the function/module level. This paper investigates
fault prediction at an early stage using cross-project data,
focusing on design metrics. In this study, an empirical analysis is
carried out to validate design metrics for cross-project fault
prediction. The machine learning technique used for evaluation is
Naïve Bayes. The design-phase metrics of other projects can be used
as an initial guideline for projects where no previous fault data
are available. We analyze seven datasets from the NASA Metrics Data
Program which offer design as well as code metrics. Overall, the
results of cross-project prediction are comparable to learning from
within-company data.
Abstract: A Distributed Denial of Service (DDoS) attack is a major
threat to cyber security. It originates from the network layer or
the application layer of compromised/attacker systems connected to
the network. The impact of this attack ranges from the simple
inconvenience of using a particular service to major failures at
the targeted server. When there is heavy traffic flow to a target
server, it is necessary to distinguish legitimate accesses from
attacks. In this paper, a novel method is proposed to detect DDoS
attacks from traces of traffic flow. An access matrix is created
from the traces. As the access matrix is multidimensional,
Principal Component Analysis (PCA) is used to reduce the attributes
used for detection. Two classifiers, Naive Bayes and K-Nearest
Neighbor, are used to classify the traffic as normal or abnormal.
The performance of the classifiers with the PCA-selected attributes
and with the actual attributes of the access matrix is compared in
terms of the detection rate and False Positive Rate (FPR).
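The PCA reduction step can be sketched directly from the eigendecomposition of the covariance matrix; this is a generic textbook version, with the access-matrix interpretation (rows = flows, columns = traffic attributes) assumed from the abstract.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the access matrix X (rows = traffic flows, columns =
    attributes) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center each attribute
    cov = np.cov(Xc, rowvar=False)             # attribute covariance
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k eigenvectors
    return Xc @ top
```

The reduced matrix feeds the naive Bayes and K-Nearest Neighbor classifiers in place of the raw attributes, which is what makes the PCA-vs-actual-attributes comparison possible.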
Abstract: Microarray technology is widely used in the study of
disease diagnosis using gene expression levels. The main
shortcoming of gene expression data is that it includes thousands
of genes and a small number of samples. Abundant methods and
techniques have been proposed for tumor classification using
microarray gene expression data. Feature or gene selection methods
can be used to mine the genes that are directly involved in the
classification and to eliminate irrelevant genes. In this paper,
statistical measures such as the T-statistic, Signal-to-Noise Ratio
(SNR) and F-statistic are used to rank the genes. The ranked genes
are used for further classification. The Particle Swarm
Optimization (PSO) algorithm and the Shuffled Frog Leaping (SFL)
algorithm are used to find the significant genes among the top-m
ranked genes. The Naïve Bayes Classifier (NBC) is used to classify
the samples based on the significant genes. The proposed work is
applied to Lung and Ovarian cancer datasets. The experimental
results show that the proposed method achieves 100% accuracy on
these datasets, and the results are compared with previous works.
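The SNR ranking step can be sketched as below. The formula |mu1 - mu2| / (s1 + s2) is the common two-class definition; ranking by absolute value (rather than signed SNR) is an assumption here, and the toy expression matrix is fabricated for illustration only.

```python
import numpy as np

def snr_rank(X, y, top_m):
    """Rank genes by signal-to-noise ratio |mu1 - mu2| / (s1 + s2)
    between the two classes (y in {0, 1}) and return the column
    indices of the top-m genes. X: samples x genes."""
    a, b = X[y == 0], X[y == 1]
    snr = (np.abs(a.mean(axis=0) - b.mean(axis=0))
           / (a.std(axis=0) + b.std(axis=0) + 1e-12))  # guard /0
    return np.argsort(snr)[::-1][:top_m]
```

The top-m indices returned here are what the PSO/SFL search would then refine before handing the surviving genes to the naïve Bayes classifier.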
Abstract: Steganalysis is one of the challenging and attractive research interests that has grown with the development of information hiding techniques. It is the procedure of detecting hidden information in a stego-object created by a known steganographic algorithm. In this paper, a novel feature-based image steganalysis technique is proposed. Various statistical moments are used along with some similarity metrics. The proposed steganalysis technique is designed on the basis of transformation in four wavelet domains: Haar, Daubechies, Symlets and Biorthogonal. Each domain is subjected to various classifiers, namely K-nearest-neighbor, K* classifier, locally weighted learning, Naive Bayes classifier, neural networks, decision trees and support vector machines. The experiments are performed on a large set of freely available images from an image database. The system also predicts different embedded-message lengths.
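A one-level Haar decomposition plus sub-band moments, the simplest instance of the wavelet-domain features described above, can be sketched as follows; the choice of four moments per sub-band is illustrative, not the paper's exact feature set.

```python
import numpy as np

def haar_level1(img):
    """One level of the 2-D Haar transform: returns the approximation
    sub-band and the three detail sub-bands (LH, HL, HH). The image
    must have even height and width."""
    a = (img[0::2] + img[1::2]) / 2      # rows: average
    d = (img[0::2] - img[1::2]) / 2      # rows: difference
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

def moment_features(band):
    """Mean, variance, skewness and kurtosis of one sub-band."""
    x = band.ravel()
    m, s = x.mean(), x.std() + 1e-12     # guard against zero std
    z = (x - m) / s
    return [m, s ** 2, (z ** 3).mean(), (z ** 4).mean()]
```

Stego embedding perturbs the detail sub-bands more than the approximation, which is why moments of LH/HL/HH are informative features for the downstream classifiers.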
Abstract: Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if the disease is treated in its early stages. In this paper, we compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated against expert ophthalmologists' hand-drawn ground truths. The sensitivity, specificity, precision and accuracy of each method are also compared.
Abstract: This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm while dispensing with the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system that analyzes the congruence between annotations. This ensured satisfactory results even without using annotators specialized in the context of the research, avoiding the generation of biased training data for the classifiers. To this end, a case study was conducted on an entrepreneurship blog. The experimental results were consistent with the literature on annotation using a formalized process with experts.