Abstract: The goal of this research is to discover the
determinants of the success or failure of external cooperation in small
and medium enterprises (SMEs). To this end, a survey was administered to
190 SMEs that had experienced external cooperation within the last 3
years. A logistic regression model was used to identify organizational
or strategic characteristics that significantly influence whether
external collaboration of domestic SMEs is successful. Results suggest
that, among the general characteristics, research and development (R&D)
features (both idea creation and discovering market opportunities) and,
among the innovative strategic characteristics, strategies that focus on
and emphasize indirect-market stakeholders (such as complementary
companies and affiliates) raise the probability of successful external
cooperation. These findings can be used to build policies or strategies
that induce successful external cooperation, or to understand
innovation in SMEs.
Abstract: Data mining comprises a group of statistical
methods used to analyze a set of information, or a data set. It operates
with models and algorithms, which are powerful tools with great
potential. They can help people understand the patterns in a given
body of information, so data mining tools have
a wide range of applications. For example, in theoretical chemistry
data mining tools can be used to predict molecule properties or
improve computer-assisted drug design. Classification analysis is one
of the major data mining methodologies. The aim of this contribution
is to create a classification model that can handle a
huge data set with high accuracy. For this purpose, logistic regression,
Bayesian logistic regression and random forest models were built
using R software. A Bayesian logistic regression model was also
created in Latent GOLD software. These classification methods belong to
supervised learning methods.
It was necessary to reduce the dimension of the data matrix before
constructing the models, and thus factor analysis (FA) was used. The
models were applied to predict the biological activity of molecules
that are potential new drug candidates.
Abstract: The aim of the study was to identify factors associated
with seat belt wearing among road users in Malaysia. An evidence-based
approach through in-depth crash investigation was used to meet this
objective. The scope was limited to crashes
investigated by the Malaysian Institute of Road Safety Research
(MIROS) involving passenger vehicles between 2007 and 2010. Crash
information on a total of 99 crash cases involving 240 vehicles and
864 occupants was obtained during the study period. Statistical tests
and logistic regression analysis were performed. Results of the
analysis revealed that gender, seat position and age were associated
with seat belt wearing compliance in Malaysia. Males were 97.6%
more likely to wear a seat belt than females (95% CI 1.317 to
2.964). By seat position, the findings indicate that front-seat
occupants were 82 times more likely to be wearing a seat belt (95% CI
30.199 to 225.342) than rear occupants. It is also important to note
that the odds of seat belt wearing increased by about 2.64% (95% CI
1.0176 to 1.0353) for every one-year increase in age. This study is
essential for understanding seat belt use among vehicle occupants in
Malaysia. The factors highlighted in this
study should be emphasized in road safety education in order to
increase the seat belt wearing rate in this country and ultimately to
prevent deaths due to road crashes.
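The percentages and intervals quoted above can be related through the
odds ratio. The following minimal sketch assumes (the abstract does not
say so) that each interval is a Wald interval, symmetric on the log-odds
scale, so that the point estimate is the geometric mean of its bounds.
The function names are hypothetical and chosen for illustration only.

```python
import math

def or_from_ci(lower, upper):
    """Odds ratio implied by a CI that is symmetric on the log scale
    (assumption: Wald interval from a logistic regression)."""
    return math.sqrt(lower * upper)

def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0

# Interval bounds copied from the abstract; the labels are descriptive only.
intervals = {
    "male vs. female": (1.317, 2.964),
    "front vs. rear seat": (30.199, 225.342),
    "per year of age": (1.0176, 1.0353),
}

for label, (lower, upper) in intervals.items():
    odds_ratio = or_from_ci(lower, upper)
    change = percent_change_in_odds(odds_ratio)
    print(f"{label}: odds ratio ~ {odds_ratio:.3f}, odds change ~ {change:.2f}%")
```

Under this assumption the three intervals imply odds ratios of roughly
1.98, 82 and 1.026, which agree with the 97.6%, 82-fold and 2.64%
figures. Strictly, these describe changes in the odds of wearing a seat
belt, not in the probability.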
Abstract: To evaluate the ability to predict xerostomia after
radiotherapy, we constructed and compared neural network and
logistic regression models. In this study, 61 patients who completed a
questionnaire about their quality of life (QoL) before and after a full
course of radiation therapy were included. Based on this questionnaire,
some statistical data about the condition of the patients’ salivary
glands were obtained, and these data were used as the inputs of
the neural network and logistic regression models in order to predict
the probability of xerostomia. Seven variables were then selected from
the statistical data according to Cramer’s V and point-biserial
correlation values and were used to train each model. The
respective outputs were 0.88 and 0.89 for AUC, 9.20 and 7.65
for SSE, and 13.7% and 19.0% for MAPE, respectively. These
results demonstrate that both neural network and logistic
regression methods are effective for predicting the condition of the
parotid glands.
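For readers unfamiliar with the three reported measures, the sketch
below gives their textbook definitions. The abstract does not state how
the authors computed them, so these formulas, the function names and the
toy numbers are assumptions made for illustration and are not the
study's data.

```python
def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive case gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sse(targets, preds):
    """Sum of squared errors between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds))

def mape(targets, preds):
    """Mean absolute percentage error, in percent.
    Every target must be non-zero for the ratio to be defined."""
    ratios = [abs((t - p) / t) for t, p in zip(targets, preds)]
    return 100.0 * sum(ratios) / len(ratios)

# Made-up toy values, only to exercise the functions.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print("AUC :", auc(toy_labels, toy_scores))
print("SSE :", sse(toy_labels, toy_scores))
print("MAPE:", mape([2.0, 4.0], [1.0, 5.0]))
```

AUC measures ranking quality (higher is better), while SSE and MAPE
measure prediction error (lower is better), so the two models can rank
differently depending on the measure, as they do in the figures above.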
Abstract: Logistic Regression is widely regarded as the gold
standard method for predicting clinical outcomes, especially
the risk of mortality. In this paper, the Decision Tree method
is proposed for specific problems that are commonly solved with
Logistic Regression. The Biochemistry and
Haematology Outcome Model (BHOM) dataset obtained from
Portsmouth NHS Hospital from 1 January to 31 December 2001 was
divided into four subsets. One subset of training data was used to
generate a model, and the model obtained was then applied to three
testing datasets. The performance of each model from both methods
was then compared using calibration (the χ2 test, or chi-square test)
and discrimination (area under the ROC curve, or c-index). The
experiments showed that both methods give reasonable results in terms
of the c-index. However, in some cases the calibration value (χ2) was
quite high. After conducting experiments and investigating
the advantages and disadvantages of each method, we conclude
that Decision Trees can be seen as a worthy alternative to Logistic
Regression in the area of Data Mining.
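The abstract does not say which χ2 calibration test was used. One common
choice for risk models is a Hosmer-Lemeshow style statistic, sketched
below as an assumption: cases are sorted by predicted risk, split into
groups, and the observed number of events in each group is compared with
the expected number. The function name and the toy inputs are
hypothetical.

```python
def calibration_chi2(outcomes, probs, n_groups=10):
    """Hosmer-Lemeshow style chi-square statistic (illustrative sketch).

    outcomes: 0/1 observed events; probs: predicted event probabilities.
    Larger values indicate worse agreement between predicted and
    observed event counts.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    chi2 = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(p for p, _ in group)   # expected number of events
        observed = sum(y for _, y in group)   # observed number of events
        # Binomial variance of the group: size * pbar * (1 - pbar),
        # written with pbar = expected / size.
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2

# Made-up toy inputs: same predictions, well and badly matched outcomes.
print(calibration_chi2([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5], n_groups=1))
print(calibration_chi2([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5], n_groups=1))
```

A model can discriminate well (high c-index) and still produce a high
χ2 value, because the c-index depends only on the ranking of the
predicted risks while calibration depends on their absolute values.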
Abstract: The availability of high-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and to separate predefined classes, such as patients with a specific disease versus healthy controls. However, most of the existing research focuses on only one specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, depend strongly on the ratio of discriminators and perform better with a higher number of discriminators.
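As an illustration of what such a mock dataset might look like, the sketch below generates two balanced classes in which only a chosen fraction of the features discriminates between them. The abstract does not describe the simulation design, so the Gaussian noise model, the mean-shift definition of effect size and all names here are assumptions.

```python
import random

def make_mock_dataset(n_samples, n_features, frac_discriminating,
                      effect_size, seed=0):
    """Return (X, y, n_disc): feature rows, 0/1 labels, and the number of
    discriminating features (always the first n_disc columns).

    Assumption: every feature is unit-variance Gaussian noise, and a
    discriminating feature has its mean shifted by effect_size in class 1.
    """
    rng = random.Random(seed)
    n_disc = round(n_features * frac_discriminating)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels to keep the classes balanced
        row = [rng.gauss(effect_size * label if j < n_disc else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y, n_disc

def mean_difference(X, y, j):
    """Difference in the mean of feature j between class 1 and class 0."""
    ones = [row[j] for row, label in zip(X, y) if label == 1]
    zeros = [row[j] for row, label in zip(X, y) if label == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

X, y, n_disc = make_mock_dataset(200, 20, frac_discriminating=0.1,
                                 effect_size=2.0)
print("discriminating features:", n_disc)
print("mean shift, feature 0 :", round(mean_difference(X, y, 0), 2))
print("mean shift, feature 10:", round(mean_difference(X, y, 10), 2))
```

Varying `frac_discriminating` and `effect_size` over a grid and scoring each classifier on the resulting datasets would give the kind of comparison described above.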