A Large Dataset Imputation Approach Applied to Country Conflict Prediction Data

This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values at levels well above the common 20% missingness threshold. The methodology capitalizes on correlation while using model residuals to quantify the uncertainty in estimating unknown values. Examination of the methodology provides insight into choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset shows promise for modeling first-order interactions, while presenting a need for further refinement that mimics predictive mean matching.
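A minimal sketch of the kind of stochastic regression imputation described above, in which noise drawn from the model residuals supplies the imputation uncertainty and a residual-scaled tolerance replaces a static one. The function name, the linear-model choice, and the tolerance rule are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def stochastic_impute(df, target, predictors, seed=0):
    """Impute missing values of `target` from fully observed `predictors`."""
    rng = np.random.default_rng(seed)
    known = df[target].notna()
    model = LinearRegression().fit(df.loc[known, predictors], df.loc[known, target])
    resid = df.loc[known, target] - model.predict(df.loc[known, predictors])
    # Add noise drawn from the empirical residuals so imputed values carry
    # the model's own uncertainty rather than being deterministic predictions.
    missing = ~known
    preds = model.predict(df.loc[missing, predictors])
    noise = rng.choice(resid.to_numpy(), size=int(missing.sum()), replace=True)
    out = df.copy()
    out.loc[missing, target] = preds + noise
    # A tailorable tolerance tied to this element's residual spread, in place of
    # a single static, package-wide tolerance.
    tol = 0.01 * resid.std()
    return out, tol
```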

Using Statistical Significance and Prediction to Test Long/Short Term Public Services and Patients Cohorts: A Case Study in Scotland

Health and Social care (HSc) services planning and scheduling face unprecedented challenges due to pandemic pressure, and they also suffer from unplanned spending aggravated by the global financial crisis. Data-driven approaches can help improve policies and plan and design service provision schedules, using algorithms that assist healthcare managers in meeting unexpected demands with fewer resources. The paper discusses services packing using statistical significance tests and machine learning (ML) to evaluate demand similarity and coupling. This is achieved by predicting the range of the demand (class) using ML methods such as Classification and Regression Trees (CART), Random Forests (RF), and Logistic Regression (LGR). The significance tests Chi-Squared and Student's t-test are applied to data over a 39-year span for which records exist for services delivered in Scotland. The demands are associated using probabilities and form parts of statistical hypotheses. These hypotheses assume, as their null hypothesis, that the target demand is statistically dependent on other services' demands, and this linking is checked using the data. In addition, ML methods are used to linearly predict the target demands from the statistically found associations and to extend the linear dependence of the target's demand to independent demands, thus forming groups of services. Statistical tests confirmed the ML coupling, made the predictions statistically meaningful, and proved that a target service can be matched reliably to other services, while ML showed that such marked relationships can also be linear. Zero padding was used for missing years' records and better illustrated such relationships, both for limited years and for the entire span, offering long-term data visualizations; limited-year periods explained how well patient numbers can be related over short periods of time, or how they can change over time as opposed to behaviours across more years. The prediction performance of the associations was measured using metrics such as the Receiver Operating Characteristic (ROC), Area Under the Curve (AUC), and Accuracy (ACC), as well as the Chi-Squared and Student's t statistical tests. Co-plots and comparison tables for the RF, CART, and LGR methods, as well as the p-values from the tests and Information Exchange (IE/MIE) measures, are provided, showing the relative performance of the ML methods and of the statistical tests, as well as the behaviour at different learning ratios. The impact of first grouping the data with k-nearest neighbours classification (k-NN), Cross-Correlation (CC), and C-Means (CM) was also studied over limited years and for the entire span. It was found that CART was generally behind RF and LGR, but in some interesting cases LGR reached an AUC = 0, falling below CART, while the ACC was as high as 0.912, showing that ML methods can be confused by zero padding, data irregularities, or outliers. On average, 3 linear predictors were sufficient; LGR was found to compete well with RF, and CART followed with the same performance at higher learning ratios. Services were packed only when the significance level (p-value) of their association coefficient was more than 0.05. Social-factor relationships were observed between home care services and the treatment of old people, low birth weights, alcoholism, drug abuse, and emergency admissions.
The work found that different HSc services can be packed well into plans of limited duration, across various service sectors and learning configurations, as confirmed by using statistical hypotheses.
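A minimal sketch of the coupling idea described above: test whether a target service's demand class is statistically associated with another service's demand (Chi-Squared), then check how well a linear classifier (logistic regression) predicts the target class at a chosen learning ratio, reporting AUC and ACC. The binning rule, variable names, and the single-predictor contingency table are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

def demand_coupling(target_demand, other_demands, learning_ratio=0.7, seed=0):
    """target_demand: yearly demand of the target service (1-D array).
    other_demands: yearly demands of candidate services, shape (years, services),
    zero-padded for missing years as in the paper."""
    # Bin yearly demands into two classes (below/above the median).
    y = (target_demand > np.median(target_demand)).astype(int)
    X = np.asarray(other_demands)

    # Chi-squared test of association with the first candidate service.
    table = pd.crosstab(y, (X[:, 0] > np.median(X[:, 0])).astype(int))
    _, p_value, _, _ = chi2_contingency(table)

    # Linear prediction of the target class from the associated demands.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=learning_ratio, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    acc = accuracy_score(y_te, clf.predict(X_te))
    return p_value, auc, acc
```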

Matrix Completion with Heterogeneous Observation Cost Using Sparsity-Number of Column-Space

The matrix completion problem has been studied broadly under many underlying conditions. In many real-life scenarios, we can expect entries from distinct columns or distinct positions to have different costs. In this paper, we explore this generalization under adaptive conditions. We approach the problem under two different cost models. In the first, entries from different columns have different observation costs, but within the same column each entry has a uniform cost. In the second, any two entries may have different observation costs, whether they lie in the same column or in different columns. We provide a complexity analysis of our algorithms and give tightness guarantees.
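A minimal sketch of the column-cost setting: spend an observation budget so that cheaper columns are sampled more often, then complete the matrix from the observed entries. The inverse-cost sampling rule and the simple iterative rank-truncated SVD completion are illustrative assumptions standing in for the paper's sparsity-number-based adaptive algorithm.

```python
import numpy as np

def cost_aware_complete(M, column_costs, budget, rank=2, iters=100, seed=0):
    """Observe `budget` entries of M with per-column costs, then complete."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    # Spend the budget so that cheaper columns receive more observations.
    weights = 1.0 / np.asarray(column_costs, dtype=float)
    weights /= weights.sum()
    obs = np.zeros_like(M, dtype=bool)
    for _ in range(budget):
        j = rng.choice(n_cols, p=weights)
        i = rng.integers(n_rows)
        obs[i, j] = True

    # Iterative rank-truncated SVD ("hard impute") on the observed entries.
    X = np.where(obs, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(obs, M, X_low)
    return X
```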

The Possibility of Solving a 3x3 Rubik’s Cube under 3 Seconds

The Rubik's cube was invented in 1974. Since then, speedcubers all over the world have tried their best to break the world record again and again. The newest record is 3.47 seconds. Many factors affect the timing, including turns per second (tps), the algorithm, finger tricks, and the hardware of the cube. In this paper, the lower bound of the cube-solving time is discussed using convex optimization. Extended analysis of the world records is used to understand how to improve the timing. With an understanding of each part of the solving process, the paper suggests a list of speed improvement techniques. Based on the analysis of the world records, there is a high possibility that the 3-second mark will be broken soon.
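A minimal back-of-the-envelope sketch of the time = moves / tps relation underlying the lower-bound argument. The move count and peak tps below are illustrative placeholders, not figures from the paper or from measured world-record solves.

```python
# Illustrative lower bound on solve time from move count and turning speed.
assumed_moves = 40      # assumed move count of a very efficient speedsolve
assumed_peak_tps = 14   # assumed peak sustained turns per second
lower_bound = assumed_moves / assumed_peak_tps
print(f"Illustrative lower bound: {lower_bound:.2f} s")  # ~2.86 s, under 3 s
```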

Optimizing Data Evaluation Metrics for Fraud Detection Using Machine Learning

The use of technology has benefited society in more ways than one ever thought possible. Unfortunately, as society's knowledge of technology has advanced, so has its knowledge of ways to use technology to manipulate others. This has led to a simultaneous advancement in the world of fraud. Machine learning techniques offer a possible way to help curb these advancements. This research explores how various machine learning techniques can aid in detecting fraudulent activity across two different types of fraud datasets; the accuracy, precision, recall, and F1 score were recorded for each method. Each machine learning model was also tested across five different training and testing splits in order to discover which split and technique would lead to the best results.
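A minimal sketch of the evaluation loop described above: several classifiers scored across five train/test splits on accuracy, precision, recall, and F1. The model choices and split ratios are illustrative assumptions, not the paper's exact experimental setup; the labels are assumed binary (fraud vs. non-fraud).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(X, y, splits=(0.5, 0.6, 0.7, 0.8, 0.9), seed=0):
    """Score each model at each training/testing split and collect the metrics."""
    models = {"random_forest": RandomForestClassifier(random_state=seed),
              "logistic_regression": LogisticRegression(max_iter=1000)}
    results = []
    for train_size in splits:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed)
        for name, model in models.items():
            y_hat = model.fit(X_tr, y_tr).predict(X_te)
            results.append({
                "model": name, "train_size": train_size,
                "accuracy": accuracy_score(y_te, y_hat),
                "precision": precision_score(y_te, y_hat),
                "recall": recall_score(y_te, y_hat),
                "f1": f1_score(y_te, y_hat)})
    return results
```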

The Analysis of Different Classes of Weighted Fuzzy Petri Nets and Their Features

This paper presents an analysis of six different classes of Petri nets: fuzzy Petri nets (FPN), generalized fuzzy Petri nets (GFPN), parameterized fuzzy Petri nets (PFPN), T2GFPN, flexible generalized fuzzy Petri nets (FGFPN), and binary Petri nets (BPN). These classes were simulated in the dedicated software PNeS® to analyse their pros and cons, using example models devoted to the decision-making process of passenger transport logistics. The paper analyses two approaches: one in which input values are filled in with the experts' knowledge, and one in which fuzzy expectations represented by output values are added as well. These approaches exploit the possibilities of triples of functions that are replaced with different combinations of t-/s-norms.
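A minimal sketch of how operator choices such as t-/s-norms drive rule firing in a weighted fuzzy Petri net: antecedent truth degrees are combined with a t-norm, scaled by a rule certainty factor, and competing rules for the same conclusion are aggregated with an s-norm. The rule base, proposition names, and the min/max operator choices are illustrative assumptions, not models from the paper.

```python
def fire_rules(inputs, rules, t_norm=min, s_norm=max):
    """inputs: dict mapping proposition -> truth degree in [0, 1].
    rules: list of (antecedents, certainty_factor, consequent) tuples."""
    outputs = {}
    for antecedents, cf, consequent in rules:
        # t-norm combines the antecedents; the certainty factor weights the rule;
        # the s-norm aggregates competing rules for the same consequent.
        strength = t_norm(inputs[p] for p in antecedents) * cf
        outputs[consequent] = s_norm(outputs.get(consequent, 0.0), strength)
    return outputs

# Hypothetical rule about dispatching an extra bus in passenger transport logistics.
print(fire_rules({"high_demand": 0.8, "few_vehicles": 0.6},
                 [(["high_demand", "few_vehicles"], 0.9, "dispatch_extra_bus")]))
```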

Solution of Two-Point Nonlinear Boundary Problems Using Taylor Series Approximation and the Ying Buzu Shu Algorithm

One of the major challenges faced in solving initial and boundary value problems is finding approximate solutions with minimal deviation from the exact solution, without excessive rigor and complication. The Taylor series method provides a simple way of obtaining an infinite series that converges to the exact solution for initial value problems; however, the method is somewhat limited for two-point boundary problems, since the infinite series has to be truncated to incorporate the boundary conditions. In this paper, the Ying Buzu Shu algorithm is used to solve a two-point nonlinear boundary diffusion problem for the fourth- and sixth-order solutions, and their relative errors and rates of convergence to the exact solution are compared.
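A minimal sketch of the Ying Buzu Shu (double false position) idea applied to a two-point boundary problem via shooting: guess the unknown initial slope twice, integrate, and linearly interpolate between the guesses using the boundary residuals (the "excess" and "deficiency"). The sample ODE u'' = u², with u(0) = 0 and u(1) = 1, and the crude Euler integrator are illustrative assumptions, not the problem or truncation order treated in the paper.

```python
def shoot(slope, n=1000):
    # Integrate u'' = u**2 from x = 0 to 1 with u(0) = 0, u'(0) = slope (Euler).
    h = 1.0 / n
    u, v = 0.0, slope
    for _ in range(n):
        u, v = u + h * v, v + h * u**2
    return u

def ying_buzu_shu(target=1.0, s1=0.5, s2=1.5):
    # Residuals at the far boundary for two trial slopes (excess / deficiency).
    r1, r2 = shoot(s1) - target, shoot(s2) - target
    # Linear (false position) update for the unknown initial slope.
    return (s1 * r2 - s2 * r1) / (r2 - r1)

print(ying_buzu_shu())  # estimated initial slope that meets u(1) = 1
```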