Injury Prediction for Soccer Players Using Machine Learning

Injuries in professional sports occur on a regular basis. Some are minor, while others can have a huge impact on a player’s career and earning potential. In soccer, there is a high risk of players picking up injuries during game time. This research seeks to help soccer players reduce the risk of injury by predicting the likelihood of injury in the near future and then providing recommendations for intervention. The injury prediction tool will use a soccer player’s number of minutes played on the field, number of appearances, distance covered and performance data for the current and previous seasons as variables to conduct statistical analysis and produce injury predictions using a machine learning linear regression model.

The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Edgeworth approximation is one of the most important statistical methods and makes a considerable contribution to reducing the sum of the standard deviations of the independent variables’ coefficients in a quantile regression model. This model estimates the conditional median or other quantiles. In this paper, we apply approximating statistical methods to an economic problem. We build a quantile regression model to see how the profit gained is connected with the realized sales of cosmetic products, using real data taken from a local business. The linear regression of the generated profit on the realized sales was not free of autocorrelation and heteroscedasticity, which is why we used the quantile regression model instead of linear regression. Our aim is to analyze in more detail the relation between the variables under study, the profit and the finalized sales, and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods applied in our work are the Edgeworth approximation for independent and identically distributed (IID) cases, the bootstrap version of the model, and the Edgeworth approximation for the bootstrap quantile regression model. The graphics and results presented here identify the best approximating model for our study.

An Internet of Things-Based Weight Monitoring System for Honey

Bees play a vital role in pollination. This paper focuses on the weighing process of honey. Honey is usually stored in the comb in a hive. Bee farmers brush bees away from the comb and then collect honey, and the collected honey is weighed afterward. However, such a process has a strong negative influence on bees and can even lead to their death. This paper therefore presents an Internet of Things-based weight monitoring system which uses weight sensors to measure the weight of honey and simplifies the whole weighing procedure. To verify the system, the weight measured by the system is compared to the weight of standard weights used for calibration by employing a linear regression model. The R² of the regression model is 0.9788, which suggests that the weighing system is highly reliable and can be used to obtain the actual weight of honey. In the future, the weight data of honey can be used to find the relationship between honey production and different ecological parameters, such as bees’ foraging behavior and weather conditions. It is expected that the findings can serve as critical information for honey production improvement.
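
The calibration check described in this abstract can be illustrated with a minimal sketch: fit a least-squares line of measured weight against reference weight and report R². The function names and the calibration values below are hypothetical and are not taken from the paper; they only show the kind of computation involved.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    slope, intercept = fit_line(x, y)
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical calibration data in grams (assumed values, not from the paper).
reference_g = [100, 200, 500, 1000, 2000]
measured_g = [103, 196, 508, 991, 2010]

slope, intercept = fit_line(reference_g, measured_g)
print(f"slope={slope:.4f}, intercept={intercept:.2f}, "
      f"R^2={r_squared(reference_g, measured_g):.4f}")
```

A slope close to 1 and an intercept close to 0 would indicate that the sensor reading tracks the reference weight without systematic scaling or offset; R² alone does not show this, so both are printed.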

Evaluating Factors Influencing Information Quality in Large Firms

Information quality is a major performance measure for the Enterprise Resource Planning (ERP) system of any firm. This study identifies various critical success factors of information quality. The effect of critical success factors such as project management, reengineering efforts and interdepartmental communications on information quality is analyzed using a multiple regression model. Quantitative data are collected from respondents at various firms through a structured questionnaire to assess information quality, project management, reengineering efforts and interdepartmental communications. The validity and reliability of the data are ensured using techniques such as factor analysis and the computation of Cronbach’s alpha. This study gives the relative importance of each of the critical success factors. The findings suggest that, among the various factors influencing information quality, careful reengineering efforts are the most influential. This paper gives managers and practitioners clear insight into the relative importance of the critical success factors influencing information quality, so that they can formulate a strategy at the beginning of ERP system implementation.
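
The reliability check mentioned here, Cronbach’s alpha, follows a standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of questionnaire items. The sketch below is a generic illustration of that formula; the function name and the Likert-style scores are hypothetical and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists; each inner list holds one questionnaire item's
    scores, in the same respondent order across items.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 5-point Likert responses: 3 items, 6 respondents (assumed data).
responses = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 5, 2, 4, 3, 5],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Values of alpha closer to 1 indicate that the items vary together, which is usually read as higher internal consistency of the scale.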

Consumption Insurance against the Chronic Illness: Evidence from Thailand

This paper studies consumption insurance against chronic illness in Thailand. The study estimates the impact of chronic illness in a household on consumption growth. Chronic illness involves long-term health care costs and treatment decisions for a person or a household, and it affects the household’s ability to smooth consumption. Chronic illness is measured through health status: a household is counted as affected when at least one member faces a chronic illness. The data used are from the Household Social Economic Panel Survey conducted during 2007 and 2012. The survey collected data from approximately 6,000 households from every province, both inside and outside municipal areas in Thailand. The study estimates the change in household consumption using an ordinary least squares (OLS) regression model. The results show that households with a member facing chronic illness reduce consumption by around 4%. This indicates that consumption insurance in Thailand is quite sufficient against chronic illness.

The Effect of User Comments on Traffic Application Usage

With the unprecedented rate of technological improvement, people have started to solve their problems with the help of technological tools. According to application stores and websites on which people evaluate and comment on traffic apps, there are more than 100 traffic applications with different features depending on their purpose, ranging from apps for public transit modes to apps for private cars. This study focuses on the top 30 traffic applications, chosen by download count. All data about the traffic applications were obtained from related websites. The purpose of this study is to analyze traffic applications in terms of their categorical attributes by developing a regression model. The analysis results suggest that negative interpretations (e.g., being deficient) do not lead to lower star ratings of the applications. However, those negative interpretations result in a smaller increase in star rating. In addition, women give higher star ratings than men when evaluating traffic applications.

Analysis of Attention to the Confucius Institute from Domestic and Foreign Mainstream Media

The rapid development of the Confucius Institute is attracting more and more attention from mainstream media around the world. Mainstream media plays a large role in public information dissemination and public opinion. This study analyzes the correlation and functional relationship between domestic and foreign mainstream media by analyzing the number of reports on the Confucius Institute. Three correlation measures, the Pearson correlation coefficient (PCC), the Spearman correlation coefficient (SCC), and the Kendall rank correlation coefficient (KCC), were applied to analyze the correlations among mainstream media from three regions: mainland China; Hong Kong and Macao (the two special administrative regions of China, denoted as SARs); and overseas countries excluding China, such as the United States, England, and Canada. Further, the paper measures the functional relationships among the regions using a regression model. The experimental analyses found high correlations among mainstream media from the different regions. Additionally, we found a linear relationship between the mainstream media of overseas countries and those of the SARs, based on the number of reports on the Confucius Institute in a data set obtained by crawling the websites of 106 mainstream media outlets for the years 2004 to 2014.
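
To make the three correlation measures named in this abstract concrete, the sketch below computes Pearson, Spearman (Pearson on average ranks) and Kendall (the tau-b variant, which adjusts for ties) in plain Python. The choice of tau-b and the yearly report counts are assumptions for illustration; the abstract does not say which Kendall variant was used, and the data are not from the study.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank correlation, tau-b variant (O(n^2) pair counting)."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((pairs - tied_x) * (pairs - tied_y))


# Hypothetical yearly report counts for two regions (assumed data).
region_a = [3, 8, 15, 21, 40, 62, 90]
region_b = [1, 2, 6, 5, 14, 20, 33]
print(pearson(region_a, region_b), spearman(region_a, region_b),
      kendall_tau_b(region_a, region_b))
```

Pearson measures linear association, while the two rank-based coefficients only depend on the ordering of the values, so a monotone but non-linear relationship gives rank correlations of 1 with a Pearson value below 1.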

A Statistical Model for the Geotechnical Parameters of Cement-Stabilised Hightown’s Soft Soil: A Case Study of Liverpool, UK

This study investigates the effect of two important parameters (length of curing period and percentage of added binder) on the strength of soil treated with OPC. An intermediate plasticity silty clayey soil with medium organic content was used in this study. This soft soil was treated with different percentages of a commercially available cement, type 32.5-N. Laboratory experiments were carried out on the soil treated with 0, 1.5, 3, 6, 9, and 12% OPC by dry weight to determine the effect of OPC on the compaction parameters, consistency limits, and compressive strength. The unconfined compressive strength (UCS) test was carried out on cement-treated specimens after exposing them to different curing periods (1, 3, 7, 14, 28, and 90 days). The results of the UCS test were used to develop a non-linear multi-regression model to find the relationship between the predicted and the measured maximum compressive strength of the treated soil (qu). The results indicated a significant improvement in the index of plasticity (IP) from treatment with OPC: IP decreased from 20.2 to 14.1 with 12% OPC, and this percentage was enough to increase the UCS of the treated soil up to 1362 kPa after 90 days of curing. With respect to the statistical model of the predicted qu, the results showed that the regression coefficient (R²) was equal to 0.8534, which indicates good reproducibility for the constructed model.

Segmentation of Piecewise Polynomial Regression Model by Using Reversible Jump MCMC Algorithm

The piecewise polynomial regression model is a very flexible model for modeling data. When the piecewise polynomial regression model is fitted to data, its parameters are generally not known. This paper studies the parameter estimation problem for the piecewise polynomial regression model. The parameters of the model are estimated using the Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically. The reversible jump MCMC algorithm is proposed to solve this problem. The reversible jump MCMC algorithm generates a Markov chain whose limit distribution is the posterior distribution of the piecewise polynomial regression model parameters. The resulting Markov chain is used to calculate the Bayes estimator for the parameters of the piecewise polynomial regression model.

The Relationship of Private Savings and Economic Growth: Case of Croatia

The main objective of the research in this paper is to empirically assess the causal relationship between private savings and economic growth in the Republic of Croatia. Households’ savings are approximated by household deposits in banks, while domestic income is approximated by industrial production volume indices. A vector autoregression (VAR) model and Granger causality tests are used to analyse the relationship between private savings and economic growth. Since ADF unit root tests have shown that both series are non-stationary in levels, the series are first differenced in order to become stationary. Therefore, the VAR model is estimated with the percentage change in private savings and the percentage change in domestic income, which can be interpreted as economic growth in the case of a positive percentage change in domestic income. The Granger causality test has shown that there is no causal relationship between private savings and economic growth in Croatia. The impulse response functions have shown that the impact of a shock in domestic income on the change in private savings is stronger than the impact of private savings on growth. Variance decompositions show that economic growth and the change in private savings each explain the largest part of their own forecast variance. The research has shown that the link between private savings and economic growth in Croatia is weak, which is in line with relevant empirical research on small open economies.

Development of Regression Equation for Surface Finish and Analysis of Surface Integrity in EDM

Electrical discharge machining (EDM) is a relatively modern machining process that has distinct advantages over other machining processes and can machine Ti-alloys effectively. The present study describes the development of a regression equation based on response surface methodology (RSM) for correlating the interactive and higher-order influences of machining parameters on the surface finish of titanium alloy Ti-6Al-4V. The process parameters selected in this study are discharge current, pulse on time, pulse off time and servo voltage. Machining has been carried out using a graphite electrode with negative polarity. Experiments based on the central composite design of the response surface method are carried out. Analysis of variance is employed to ascertain the adequacy of the developed regression model. Scanning electron microscopy (SEM) analysis was performed to investigate the surface topography of the EDMed job. The results show that the proposed regression equation can predict the surface roughness effectively. A lower discharge current and a short pulse on time yield a better surface finish.

Categorical Data Modeling: Logistic Regression Software

Matlab-based software for logistic regression is developed to enhance the teaching of quantitative topics and to assist researchers in analyzing a wide range of applications involving categorical data. The software offers an option of performing stepwise logistic regression to select the most significant predictors. The software includes a feature to detect influential observations in the data, and investigates the effect of dropping or misclassifying an observation on a predictor variable. The input data may be provided either as a set of individual responses (yes/no) with the predictor variables or as grouped records summarizing various categories for each unique set of predictor variables' values. Graphical displays are used to output various statistical results and to assess the goodness of fit of the logistic regression model. The software recognizes possible convergence constraints when present in the data, and the user is notified accordingly.

Relationship between Sums of Squares in Linear Regression and Semi-parametric Regression

In this paper, the sum of squares in linear regression is reduced to the sum of squares in semi-parametric regression. We show that the different sums of squares in linear regression are similar to various deviance statements in semi-parametric regression. In addition, the coefficient of determination derived in the linear regression model is easily generalized to the coefficient of determination of the semi-parametric regression model. An application is then presented to support the theory of linear regression and semi-parametric regression. In this way, the study is supported with a simulated data example.

Density Estimation using Generalized Linear Model and a Linear Combination of Gaussians

In this paper we present a novel approach for density estimation. The proposed approach is based on using the logistic regression model to get initial density estimation for the given empirical density. The empirical data does not exactly follow the logistic regression model, so, there will be a deviation between the empirical density and the density estimated using logistic regression model. This deviation may be positive and/or negative. In this paper we use a linear combination of Gaussian (LCG) with positive and negative components as a model for this deviation. Also, we will use the expectation maximization (EM) algorithm to estimate the parameters of LCG. Experiments on real images demonstrate the accuracy of our approach.

A Comparison of the Sum of Squares in Linear and Partial Linear Regression Models

In this paper, the linear regression model is estimated by the ordinary least squares method and the partially linear regression model is estimated by the penalized least squares method using a smoothing spline. The differences and similarities in the sums of squares for linear regression and partial linear regression models (semi-parametric regression models) are then investigated. It is shown that the sum of squares in linear regression is reduced to the sum of squares in partial linear regression models. Furthermore, we show that the various sums of squares in linear regression are similar to different deviance statements in partial linear regression. In addition, the coefficient of determination derived in the linear regression model is easily generalized to the coefficient of determination of the partial linear regression model. To this end, two different applications are presented. A simulated and a real data set are considered to support the claims made here. In this way, the study is supported with a simulation and a real data example.

A Comparison of Marginal and Joint Generalized Quasi-likelihood Estimating Equations Based On the Com-Poisson GLM: Application to Car Breakdowns Data

In this paper, we apply and compare two generalized estimating equation approaches to the analysis of car breakdowns data in Mauritius. The number of breakdowns experienced by a machine is a highly under-dispersed count random variable, and its value can be attributed to factors related to the mechanical input and output of that machine. Analyzing such under-dispersed count observations as a function of the explanatory factors has been a challenging problem. In this paper, we aim at estimating the effects of various factors on the number of breakdowns experienced by a passenger car, based on a study performed in Mauritius over a year. We remark that the number of passenger car breakdowns is highly under-dispersed. These data are therefore modelled and analyzed using the Com-Poisson regression model. We use two types of quasi-likelihood estimation approaches to estimate the parameters of the model: the marginal and the joint generalized quasi-likelihood estimating equation approaches. The under-dispersion parameter is estimated to be around 2.14, justifying the appropriateness of the Com-Poisson distribution in modelling the under-dispersed count responses recorded in this study.
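
As a rough illustration of why a dispersion parameter above 1 signals under-dispersion, the sketch below evaluates the Com-Poisson (Conway-Maxwell-Poisson) probability mass function in its usual parametrization, P(Y = y) proportional to lambda^y / (y!)^nu, and compares the mean with the variance. It assumes this standard parametrization applies to the value 2.14 reported in the abstract; the rate lambda = 3 and the truncation point are hypothetical choices, and the code is not the authors' estimation procedure.

```python
import math


def com_poisson_pmf(lam, nu, max_count=200):
    """Com-Poisson probabilities for counts 0..max_count.

    The infinite normalizing sum is truncated at max_count (an assumption
    that is harmless when the omitted tail is negligible). Terms are handled
    on the log scale to avoid overflow in lam**y and factorials.
    """
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1)
                 for y in range(max_count + 1)]
    peak = max(log_terms)
    weights = [math.exp(t - peak) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a count distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


# nu = 1 recovers the ordinary Poisson distribution (mean equals variance).
print(mean_and_variance(com_poisson_pmf(lam=3.0, nu=1.0)))
# nu = 2.14 (value from the abstract) with a hypothetical lam: variance < mean.
print(mean_and_variance(com_poisson_pmf(lam=3.0, nu=2.14)))
```

With nu = 1 the two moments coincide, as for a Poisson variable; with nu greater than 1 the probabilities decay faster in the tail, so the variance falls below the mean.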

Modelling Dengue Fever (DF) and Dengue Haemorrhagic Fever (DHF) Outbreak Using Poisson and Negative Binomial Model

Dengue fever has become a major concern for health authorities all over the world, particularly in tropical countries. These countries, in particular, are experiencing the most worrying outbreaks of dengue fever (DF) and dengue haemorrhagic fever (DHF). The DF and DHF epidemics have thus become main causes of hospital admissions and deaths in Malaysia. This paper, therefore, attempts to examine the environmental factors that may influence the recent dengue outbreak. The aim of this study is twofold: first, to establish a statistical model describing the relationship between the number of dengue cases and a range of explanatory variables and, second, to identify the lag for the explanatory variables that affects dengue incidence the most. The explanatory variables include the level of cloud cover, percentage of relative humidity, amount of rainfall, maximum temperature, minimum temperature and wind speed. Poisson and Negative Binomial regression analyses were used in this study. The results of the analyses on the 915 observations (daily data taken from July 2006 to Dec 2008) reveal that the climatic factors of daily temperature and wind speed significantly influence the incidence of dengue fever 2 and 3 weeks after their occurrence. The effect of humidity, on the other hand, appears to be significant only after 2 weeks.
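
The lagged effects reported in this abstract (2 and 3 weeks) imply that each day's case count is paired with weather values from some days earlier before a count regression is fitted. A minimal sketch of that alignment step is given below. The function name, the toy series and the choice of expressing lags in days are assumptions for illustration; the abstract does not describe how the authors built their lagged variables.

```python
def lagged_pairs(predictor, response, lag):
    """Pair predictor[t - lag] with response[t] for every t where both exist.

    predictor and response are equal-length daily series; lag is in days.
    """
    if len(predictor) != len(response):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(predictor):
        raise ValueError("lag must be between 0 and len(series) - 1")
    return list(zip(predictor[:len(predictor) - lag], response[lag:]))


# Hypothetical daily series (assumed values): temperature and dengue cases.
temperature = [31.0, 32.5, 30.0, 29.5, 33.0, 34.0, 31.5, 30.5]
cases = [2, 1, 0, 3, 4, 2, 5, 6]

# A 3-day lag here stands in for the 14- and 21-day lags in the abstract.
for temp, count in lagged_pairs(temperature, cases, lag=3):
    print(temp, count)
```

Each extra day of lag drops one observation from the start of the response series, so longer lags leave fewer usable pairs for the regression.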

Industrial Effects and Firm's Survival (Case Study: Iran- East Azarbaijan Province)

The aim of this paper is to investigate the effect of the mean size of an industry on the survival of new firms in East-Azarbaijan province during 1981-2006 using the hazard function. To this end, the effects of two variables, mean employment of industry and mean capital of industry, on firm survival are investigated. The Industry & Mine Ministry database was used for data gathering, and the data are analyzed using the semi-parametric Cox regression model. The results of this study show that there is a meaningful negative relationship between mean capital of industry and firm survival, but mean employment of industry has no meaningful effect on the survival of new firms.

Developing Pedotransfer Functions for Estimating Some Soil Properties using Artificial Neural Network and Multivariate Regression Approaches

Soil properties such as field capacity (F.C.) and permanent wilting point (P.W.P.) play important roles in the study of the soil moisture retention curve. Although these parameters can be measured directly, their measurement is difficult and expensive. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data. In this investigation, 70 soil samples were collected from different horizons of 15 soil profiles located in the Ziaran region, Qazvin province, Iran. The data set was divided into two subsets for calibration (80%) and testing (20%) of the models, and their normality was tested by the Kolmogorov-Smirnov method. Both multivariate regression and artificial neural network (ANN) techniques were employed to develop the appropriate PTFs for predicting soil parameters using the easily measurable characteristics of clay, silt, O.C, S.P, B.D and CaCO3. The performance of the multivariate regression and ANN models was evaluated using an independent test data set. To evaluate the models, root mean square error (RMSE) and R² were used. The comparison of RMSE for the two models showed that the ANN model gives better estimates of F.C and P.W.P than the multivariate regression model. The values of RMSE and R² derived by the ANN model for F.C and P.W.P were (2.35, 0.77) and (2.83, 0.72), respectively. The corresponding values for the multivariate regression model were (4.46, 0.68) and (5.21, 0.64), respectively. Results showed that the ANN with five neurons in the hidden layer had better performance in predicting soil properties than multivariate regression.
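
The evaluation protocol in this abstract (an 80%/20% calibration/test split, then RMSE on the held-out part) can be sketched as follows. The random shuffling, the seed, the function names and the example numbers are assumptions; the abstract does not say how the split was drawn, and no values below come from the study.

```python
import math
import random


def split_calibration_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (calibration, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# 70 sample identifiers, matching the sample count in the abstract.
calibration_ids, test_ids = split_calibration_test(range(70))
print(len(calibration_ids), len(test_ids))

# Hypothetical observed values and predictions from two models on a test set.
observed = [24.0, 31.5, 28.0, 19.5]
model_a = [25.0, 30.0, 27.0, 21.0]
model_b = [28.0, 27.0, 31.0, 15.0]
print(rmse(observed, model_a), rmse(observed, model_b))
```

Because RMSE is computed only on samples that were not used for calibration, a lower value for one model suggests better out-of-sample prediction rather than a closer fit to the calibration data.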

Analyzing the Factors Effecting the Passenger Car Breakdowns using Com-Poisson GLM

The number of breakdowns experienced by a machine is a highly under-dispersed count random variable, and its value can be attributed to factors related to the mechanical input and output of that machine. Analyzing such under-dispersed count observations as a function of the explanatory factors has been a challenging problem. In this paper, we aim at estimating the effects of various factors on the number of breakdowns experienced by a passenger car, based on a study performed in Mauritius over a year. We remark that the number of passenger car breakdowns is highly under-dispersed. These data are therefore modelled and analyzed using the Com-Poisson regression model. We use a quasi-likelihood estimation approach to estimate the parameters of the model. The under-dispersion parameter is estimated to be 2.14, justifying the appropriateness of the Com-Poisson distribution in modelling the under-dispersed count responses recorded in this study.