Abstract: Improving resources for medical autonomy of astronauts in prolonged space missions, such as a Mars mission, requires not only technology development, but also decision-making support systems. The Advanced Crew Medical System - Medical Condition Requirements study, funded by the Canadian Space Agency, aimed to create knowledge content and a scenario-based query capability to support medical autonomy of astronauts. The key objective of this study was to create a prototype tool for identifying medical infrastructure requirements in terms of medical knowledge, skills and materials. A multicriteria decision-making method was used to prioritize the highest risk medical events anticipated in a long-term space mission. Starting with those medical conditions, event sequence diagrams (ESDs) were created in the form of decision trees where the entry point is the diagnosis and the end points are the predicted outcomes (full recovery, partial recovery, or death/severe incapacitation). The ESD formalism was adapted to characterize and compare possible outcomes of medical conditions as a function of available medical knowledge, skills, and supplies in a given mission scenario. An extensive literature review was performed and summarized in a medical condition database. A PostgreSQL relational database was created to allow query-based evaluation of health outcome metrics with different medical infrastructure scenarios. Critical decision points, skill and medical supply requirements, and probable health outcomes were compared across chosen scenarios. The three medical conditions with the highest risk rank were acute coronary syndrome, sepsis, and stroke. Our efforts demonstrate the utility of this approach and provide insight into the effort required to develop appropriate content for the range of medical conditions that may arise.
Abstract: Telemedicine services use a large amount of data, most of which are diagnostic images in Digital Imaging and Communications in Medicine (DICOM) and Health Level Seven (HL7) formats. Metadata is generated from each related image to support their identification. This study presents the use of decision trees for the optimization of information search processes for diagnostic images, hosted on the cloud server. To analyze the performance in the server, the following quality of service (QoS) metrics are evaluated: delay, bandwidth, jitter, latency and throughput in five test scenarios for a total of 26 experiments during the loading and downloading of DICOM images, hosted by the telemedicine group server of the Universidad Militar Nueva Granada, Bogotá, Colombia. By applying decision trees as a data mining technique and comparing it with the sequential search, it was possible to evaluate the search times of diagnostic images in the server. The results show that by using the metadata in decision trees, the search times are substantially improved, the computational resources are optimized and the request management of the telemedicine image service is improved. Based on the experiments carried out, search efficiency increased by 45% in relation to the sequential search, given that, when downloading a diagnostic image, false positives are avoided in management and acquisition processes of said information. It is concluded that, for the diagnostic images services in telemedicine, the technique of decision trees guarantees the accessibility and robustness in the acquisition and manipulation of medical images, in improvement of the diagnoses and medical procedures in patients.
Abstract: With the widespread adoption of the Internet-connected
devices, and with the prevalence of the Internet of Things (IoT)
applications, there is an increased interest in machine learning
techniques that can provide useful and interesting services in the
smart home domain. The areas that machine learning techniques
can help advance are varied and ever-evolving. Classifying smart
home inhabitants’ Activities of Daily Living (ADLs), is one
prominent example. The ability of machine learning technique to find
meaningful spatio-temporal relations of high-dimensional data is an
important requirement as well. This paper presents a comparative
evaluation of state-of-the-art machine learning techniques to classify
ADLs in the smart home domain. Forty-two synthetic datasets and
two real-world datasets with multiple inhabitants are used to evaluate
and compare the performance of the identified machine learning
techniques. Our results show significant performance differences
between the evaluated techniques. Such as AdaBoost, Cortical
Learning Algorithm (CLA), Decision Trees, Hidden Markov Model
(HMM), Multi-layer Perceptron (MLP), Structured Perceptron and
Support Vector Machines (SVM). Overall, neural network based
techniques have shown superiority over the other tested techniques.
Abstract: This paper addresses the issues and technique for textile industry using data mining techniques. Data mining has been applied to the stitching of garments products that were obtained from a textile company. Data mining techniques were applied to the data obtained from the CHAID algorithm, CART algorithm, Regression Analysis and, Artificial Neural Networks. Classification technique based analyses were used while data mining and decision model about the production per person and variables affecting about production were found by this method. In the study, the results show that as the daily working time increases, the production per person also decreases. In addition, the relationship between total daily working and production per person shows a negative result and the production per person show the highest and negative relationship.
Abstract: The success or failure of students is a concern for every academic institution, college, university, governments and students themselves. Several approaches have been researched to address this concern. In this paper, a view is held that when a student enters a university or college or an academic institution, he or she enters an academic environment. The academic environment is unique concept used to develop the solution for making predictions effectively. This paper presents a model to determine the propensity of a student to succeed or fail in the French South African Schneider Electric Education Center (FSASEC) at the Vaal University of Technology (VUT). The Decision Tree algorithm is used to implement the model at FSASEC.
Abstract: Cancer affects people globally with breast cancer being a leading killer. Breast cancer is due to the uncontrollable multiplication of cells resulting in a tumour or neoplasm. Tumours are called ‘benign’ when cancerous cells do not ravage other body tissues and ‘malignant’ if they do so. As mammography is an effective breast cancer detection tool at an early stage which is the most treatable stage it is the primary imaging modality for screening and diagnosis of this cancer type. This paper presents an automatic mammogram classification technique using wavelet and Gabor filter. Correlation feature selection is used to reduce the feature set and selected features are classified using different decision trees.
Abstract: In the past few years, the amount of malicious software
increased exponentially and, therefore, machine learning algorithms
became instrumental in identifying clean and malware files through
(semi)-automated classification. When working with very large
datasets, the major challenge is to reach both a very high malware
detection rate and a very low false positive rate. Another challenge
is to minimize the time needed for the machine learning algorithm to
do so. This paper presents a comparative study between different
machine learning techniques such as linear classifiers, ensembles,
decision trees or various hybrids thereof. The training dataset consists
of approximately 2 million clean files and 200.000 infected files,
which is a realistic quantitative mixture. The paper investigates the
above mentioned methods with respect to both their performance
(detection rate and false positive rate) and their practicability.
Abstract: In this paper, we used data mining to extract
biomedical knowledge. In general, complex biomedical data
collected in studies of populations are treated by statistical methods,
although they are robust, they are not sufficient in themselves to
harness the potential wealth of data. For that you used in step two
learning algorithms: the Decision Trees and Support Vector Machine
(SVM). These supervised classification methods are used to make the
diagnosis of thyroid disease. In this context, we propose to promote
the study and use of symbolic data mining techniques.
Abstract: Existing methods of data mining cannot be applied on
spatial data because they require spatial specificity consideration, as
spatial relationships.
This paper focuses on the classification with decision trees, which
are one of the data mining techniques. We propose an extension of
the C4.5 algorithm for spatial data, based on two different approaches
Join materialization and Querying on the fly the different tables.
Similar works have been done on these two main approaches, the
first - Join materialization - favors the processing time in spite of
memory space, whereas the second - Querying on the fly different
tables- promotes memory space despite of the processing time.
The modified C4.5 algorithm requires three entries tables: a target
table, a neighbor table, and a spatial index join that contains the
possible spatial relationship among the objects in the target table and
those in the neighbor table. Thus, the proposed algorithms are applied
to a spatial data pattern in the accidentology domain.
A comparative study of our approach with other works of
classification by spatial decision trees will be detailed.
Abstract: ‘Steganalysis’ is one of the challenging and attractive interests for the researchers with the development of information hiding techniques. It is the procedure to detect the hidden information from the stego created by known steganographic algorithm. In this paper, a novel feature based image steganalysis technique is proposed. Various statistical moments have been used along with some similarity metric. The proposed steganalysis technique has been designed based on transformation in four wavelet domains, which include Haar, Daubechies, Symlets and Biorthogonal. Each domain is being subjected to various classifiers, namely K-nearest-neighbor, K* Classifier, Locally weighted learning, Naive Bayes classifier, Neural networks, Decision trees and Support vector machines. The experiments are performed on a large set of pictures which are available freely in image database. The system also predicts the different message length definitions.
Abstract: A brief review of the empirical studies on the methodology of the stock market decision support would indicate that they are at a threshold of validating the accuracy of the traditional and the fuzzy, artificial neural network and the decision trees. Many researchers have been attempting to compare these models using various data sets worldwide. However, the research community is on the way to the conclusive confidence in the emerged models. This paper attempts to use the automotive sector stock prices from National Stock Exchange (NSE), India and analyze them for the intra-sectorial support for stock market decisions. The study identifies the significant variables and their lags which affect the price of the stocks using OLS analysis and decision tree classifiers.
Abstract: This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.
Abstract: Data mining has been used very frequently to extract
hidden information from large databases. This paper suggests the use
of decision trees for continuously extracting the clinical reasoning in
the form of medical expert-s actions that is inherent in large number
of EMRs (Electronic Medical records). In this way the extracted data
could be used to teach students of oral medicine a number of orderly
processes for dealing with patients who represent with different
problems within the practice context over time.
Abstract: Until recently, researchers have developed various
tools and methodologies for effective clinical decision-making.
Among those decisions, chest pain diseases have been one of
important diagnostic issues especially in an emergency department. To
improve the ability of physicians in diagnosis, many researchers have
developed diagnosis intelligence by using machine learning and data
mining. However, most of the conventional methodologies have been
generally based on a single classifier for disease classification and
prediction, which shows moderate performance. This study utilizes an
ensemble strategy to combine multiple different classifiers to help
physicians diagnose chest pain diseases more accurately than ever.
Specifically the ensemble strategy is applied by using the integration
of decision trees, neural networks, and support vector machines. The
ensemble models are applied to real-world emergency data. This study
shows that the performance of the ensemble models is superior to each
of single classifiers.
Abstract: In this paper a new method is suggested for risk
management by the numerical patterns in data-mining. These patterns
are designed using probability rules in decision trees and are cared to
be valid, novel, useful and understandable. Considering a set of
functions, the system reaches to a good pattern or better objectives.
The patterns are analyzed through the produced matrices and some
results are pointed out. By using the suggested method the direction
of the functionality route in the systems can be controlled and best
planning for special objectives be done.
Abstract: Globalization, supported by information and
communication technologies, changes the rules of competitiveness
and increases the significance of information, knowledge and
network cooperation. In line with this trend, the need for efficient
trust-building tools has emerged. The absence of trust building
mechanisms and strategies was identified within several studies.
Through trust development, participation on e-business network and
usage of network services will increase and provide to SMEs new
economic benefits. This work is focused on effective trust building
strategies development for electronic business network platforms.
Based on trust building mechanism identification, the questionnairebased
analysis of its significance and minimum level of requirements
was conducted. In the paper, we are confirming the trust dependency
on e-Skills which play crucial role in higher level of trust into the
more sophisticated and complex trust building ICT solutions.
Abstract: Competing risks survival data that comprises of more
than one type of event has been used in many applications, and one
of these is in clinical study (e.g. in breast cancer study). The
decision tree method can be extended to competing risks survival
data by modifying the split function so as to accommodate two or
more risks which might be dependent on each other. Recently,
researchers have constructed some decision trees for recurrent
survival time data using frailty and marginal modelling. We further
extended the method for the case of competing risks. In this paper,
we developed the decision tree method for competing risks survival
time data based on proportional hazards for subdistribution of
competing risks. In particular, we grow a tree by using deviance
statistic. The application of breast cancer data is presented. Finally,
to investigate the performance of the proposed method, simulation
studies on identification of true group of observations were executed.
Abstract: In this study, a high accuracy protein-protein interaction
prediction method is developed. The importance of the proposed
method is that it only uses sequence information of proteins while
predicting interaction. The method extracts phylogenetic profiles of
proteins by using their sequence information. Combining the phylogenetic
profiles of two proteins by checking existence of homologs
in different species and fitting this combined profile into a statistical
model, it is possible to make predictions about the interaction status
of two proteins.
For this purpose, we apply a collection of pattern recognition
techniques on the dataset of combined phylogenetic profiles of protein
pairs. Support Vector Machines, Feature Extraction using ReliefF,
Naive Bayes Classification, K-Nearest Neighborhood Classification,
Decision Trees, and Random Forest Classification are the methods
we applied for finding the classification method that best predicts
the interaction status of protein pairs. Random Forest Classification
outperformed all other methods with a prediction accuracy of 76.93%
Abstract: Scene interpretation systems need to match (often ambiguous)
low-level input data to concepts from a high-level ontology.
In many domains, these decisions are uncertain and benefit greatly
from proper context. This paper demonstrates the use of decision
trees for estimating class probabilities for regions described by feature
vectors, and shows how context can be introduced in order to improve
the matching performance.
Abstract: Text categorization - the assignment of natural language documents to one or more predefined categories based on their semantic content - is an important component in many information organization and management tasks. Performance of neural networks learning is known to be sensitive to the initial weights and architecture. This paper discusses the use multilayer neural network initialization with decision tree classifier for improving text categorization accuracy. An adaptation of the algorithm is proposed in which a decision tree from root node until a final leave is used for initialization of multilayer neural network. The experimental evaluation demonstrates this approach provides better classification accuracy with Reuters-21578 corpus, one of the standard benchmarks for text categorization tasks. We present results comparing the accuracy of this approach with multilayer neural network initialized with traditional random method and decision tree classifiers.