Exploring the Determinants for Successful Collaboration of SMEs

The goal of this research is to discover the determinants of the success or failure of external cooperation in small and medium enterprises (SMEs). To this end, a survey was administered to 190 SMEs that had engaged in external cooperation within the last 3 years. A logistic regression model was used to identify the organizational and strategic characteristics that significantly influence whether the external collaboration of domestic SMEs succeeds. The results suggest that, among general characteristics, research and development (R&D) features (both idea creation and discovering market opportunities) and, among innovative strategic characteristics, strategies that focus on and emphasize indirect-market stakeholders (such as complementary companies and affiliates) raise the probability of successful external cooperation. These findings can inform policies or strategies for inducing successful external cooperation and help in understanding innovation in SMEs.

Data Mining Classification Methods Applied in Drug Design

Data mining incorporates a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with great potential. They can help people understand the patterns in a given body of information, so data mining tools have a wide range of applications. For example, in theoretical chemistry, data mining tools can be used to predict molecular properties or to improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of this contribution is to create a classification model that can handle a huge data set with high accuracy. For this purpose, logistic regression, Bayesian logistic regression and random forest models were built using R software. A Bayesian logistic regression model was also created in Latent GOLD software. These classification methods are supervised learning methods. It was necessary to reduce the dimension of the data matrix before constructing the models, and factor analysis (FA) was used for this. The models were applied to predict the biological activity of molecules that are potential new drug candidates.

Identification of Seat Belt Wearing Compliance Associate Factors in Malaysia: Evidence-based Approach

The aim of the study was to identify factors associated with seat belt wearing among road users in Malaysia. An evidence-based approach using in-depth crash investigation was adopted to meet this objective. The study was limited to crashes investigated by the Malaysian Institute of Road Safety Research (MIROS) involving passenger vehicles between 2007 and 2010. Crash information for a total of 99 crash cases involving 240 vehicles and 864 occupants was obtained during the study period. Statistical tests and logistic regression analysis were performed. The results revealed that gender, seat position and age were associated with seat belt wearing compliance in Malaysia. The odds of wearing a seat belt were 97.6% higher for males than for females (95% CI 1.317 to 2.964). By seat position, the odds of wearing a seat belt were about 82 times higher for front occupants than for rear occupants (95% CI 30.199 to 225.342). It is also important to note that the odds of seat belt wearing increased by about 2.64% (95% CI 1.0176 to 1.0353) for every one-year increase in age. This study is essential for understanding the tendency of Malaysians to belt up while travelling in a vehicle. The factors highlighted in this study should be emphasized in road safety education in order to increase the seat belt wearing rate in this country and ultimately to prevent deaths due to road crashes.
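The percentages quoted above can be read as odds ratios. As a rough arithmetic check, and assuming the reported intervals are symmetric on the log-odds scale (an assumption; the abstract does not say how they were computed), the point estimate can be approximated by the geometric mean of the interval bounds. The function names in this sketch are illustrative only.

```python
import math


def odds_ratio_from_ci(lower: float, upper: float) -> float:
    """Geometric mean of the CI bounds.

    This equals the point estimate only when the interval is symmetric
    on the log-odds scale (assumed here, not stated in the abstract).
    """
    return math.sqrt(lower * upper)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Express an odds ratio as a percentage change in the odds."""
    return (odds_ratio - 1.0) * 100.0


# Confidence intervals exactly as reported in the abstract.
reported_intervals = {
    "male vs female": (1.317, 2.964),
    "front vs rear seat": (30.199, 225.342),
    "per one-year increase in age": (1.0176, 1.0353),
}

for label, (lower, upper) in reported_intervals.items():
    or_hat = odds_ratio_from_ci(lower, upper)
    change = percent_change_in_odds(or_hat)
    print(f"{label}: odds ratio ~ {or_hat:.4f}, change in odds ~ {change:.2f}%")
```

Under that assumption the recovered odds ratios are close to 1.976, 82 and 1.0264, in line with the quoted 97.6%, 82 times and 2.64%. These are changes in odds, not in probability.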

Comparison of Neural Network and Logistic Regression Methods to Predict Xerostomia after Radiotherapy

To evaluate the ability to predict xerostomia after radiotherapy, we constructed and compared neural network and logistic regression models. This study included 61 patients who completed a questionnaire about their quality of life (QoL) before and after a full course of radiation therapy. Statistical data on the condition of the patients' salivary glands were derived from this questionnaire and used as inputs to the neural network and logistic regression models in order to predict the probability of xerostomia. Seven variables were selected from the statistical data according to their Cramér's V and point-biserial correlation values and were used to train each model. The resulting outputs for the neural network and logistic regression models were, respectively, 0.88 and 0.89 for AUC, 9.20 and 7.65 for SSE, and 13.7% and 19.0% for MAPE. These results demonstrate that both neural network and logistic regression methods are effective for predicting the condition of the parotid glands.
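The three evaluation measures named above can be sketched as follows. This is a minimal illustration of the standard definitions of SSE, MAPE and AUC, not the authors' code. The abstract does not state which target values SSE and MAPE were computed against, so the inputs here are hypothetical.

```python
from typing import Sequence


def sse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared errors between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities, for illustration only.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(f"AUC = {auc(example_labels, example_scores):.3f}")
```

Higher AUC is better, while lower SSE and MAPE are better, which is why the two models in the abstract can rank differently depending on the measure.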

Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

It is well known that Logistic Regression is the gold standard method for predicting clinical outcomes, especially the risk of mortality. In this paper, the Decision Tree method is proposed for specific problems that are commonly solved with Logistic Regression. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital, covering 1 January to 31 December 2001, was divided into four subsets. One subset was used as training data to generate a model, and the resulting model was then applied to the three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under the ROC curve or c-index). The experiments showed that both methods give reasonable results in terms of the c-index. However, in some cases the calibration value (χ2) was quite high. After conducting experiments and investigating the advantages and disadvantages of each method, we conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.
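One common way to compute a χ2 calibration value for a risk model is a Hosmer-Lemeshow-style statistic. The sketch below assumes that form, with cases grouped by predicted risk. The abstract does not specify the grouping or the exact test used, so the function name, the default of ten groups and the example data are assumptions.

```python
from typing import Sequence


def calibration_chi2(
    outcomes: Sequence[int], risks: Sequence[float], n_groups: int = 10
) -> float:
    """Hosmer-Lemeshow-style chi-square statistic.

    Cases are sorted by predicted risk and split into roughly equal
    groups. In each group the observed number of events is compared
    with the expected number (the sum of predicted risks).
    """
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    chi2 = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        expected = sum(risk for risk, _ in group)
        observed = sum(outcome for _, outcome in group)
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:  # skip groups with predicted risk of exactly 0 or 1
            chi2 += (observed - expected) ** 2 / variance
    return chi2


# Hypothetical outcomes (1 = died) and predicted risks, for illustration only.
example_outcomes = [0, 0, 1, 1]
example_risks = [0.2, 0.2, 0.8, 0.8]
print(calibration_chi2(example_outcomes, example_risks, n_groups=2))
```

A small value means predicted and observed event counts agree within each risk band, while a large value signals poor calibration even when discrimination (the c-index) looks reasonable.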

Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications

The availability of high-dimensional biological datasets, such as those from gene expression, proteomic, and metabolic experiments, can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and to separate predefined classes, such as patients with a specific disease versus healthy controls. However, most existing research focuses on a single dataset. There is a lack of general comparisons between classifiers, which could guide biologists or bioinformaticians in selecting the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, namely Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest, on mock datasets. We mimic common biological scenarios by simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The results show that SVM performs quite stably and reaches a higher AUC than the other methods. This may be explained by the ability of SVM to minimize the probability of error. Moreover, Decision Tree, with its good applicability for diagnosis and prognosis, shows good performance in our experimental setup. Logistic Regression and Random Forest, however, depend strongly on the ratio of discriminators and perform better with a higher number of discriminators.
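A mock dataset of the kind described, with a chosen proportion of discriminating features and a chosen effect size, could be generated along the following lines. This is a minimal sketch and not the authors' simulation. The Gaussian noise model, the unit variance, the mean-shift form of the effect and all names are assumptions.

```python
import random
from typing import List, Tuple


def make_mock_dataset(
    n_per_class: int,
    n_features: int,
    frac_discriminating: float,
    effect_size: float,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int], int]:
    """Simulate two classes of samples with standard-normal features.

    The first `n_disc` features are discriminating: their mean is
    shifted by `effect_size` in class 1. The rest are pure noise.
    """
    rng = random.Random(seed)
    n_disc = round(n_features * frac_discriminating)
    features: List[List[float]] = []
    labels: List[int] = []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [
                rng.gauss(effect_size * label if j < n_disc else 0.0, 1.0)
                for j in range(n_features)
            ]
            features.append(row)
            labels.append(label)
    return features, labels, n_disc


# Example: 10 features, 20% of them discriminating, effect size 2.0.
X, y, n_disc = make_mock_dataset(200, 10, 0.2, 2.0, seed=1)
print(len(X), "samples,", n_disc, "discriminating features")
```

Varying `frac_discriminating` and `effect_size` over a grid gives the kind of scenarios on which classifiers can then be compared by AUC.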