Using Statistical Significance and Prediction to Test Long/Short Term Public Services and Patients Cohorts: A Case Study in Scotland

Health and Social care (HSc) services planning and scheduling are facing unprecedented challenges due to pandemic pressure, and also suffer from unplanned spending that is negatively impacted by the global financial crisis. Data-driven approaches can help to improve policies and to plan and design service provision schedules, using algorithms that assist healthcare managers in facing unexpected demands with fewer resources. The paper discusses services packing using statistical significance tests and machine learning (ML) to evaluate demand similarity and coupling. This is achieved by predicting the range of the demand (class) using ML methods such as Classification and Regression Trees (CART), Random Forests (RF), and Logistic Regression (LGR). The Chi-Squared and Student’s significance tests are applied to data on services delivered in Scotland, spanning the 39 years for which data exist. The demands are associated using probabilities and form parts of statistical hypotheses. The NULL part of each hypothesis assumes that the target demand is statistically dependent on other services’ demands. This linking is checked using the data. In addition, ML methods are used to linearly predict the target demands from the statistically found associations and to extend the linear dependence of the target’s demand to independent demands, thus forming groups of services. Statistical tests confirmed the ML coupling, made the prediction statistically meaningful, and proved that a target service can be matched reliably to other services, while ML showed that such marked relationships can also be linear. Zero padding was used for missing years’ records and better illustrated such relationships, both for limited years and for the entire span. The entire span offered long-term data visualizations, while limited-year periods explained how well patient numbers can be related over short periods of time, or how they can change over time, as opposed to behaviours across more years.
The prediction performance of the associations was measured using metrics such as Receiver Operating Characteristic (ROC), Area Under Curve (AUC) and Accuracy (ACC), as well as the Chi-Squared and Student statistical tests. Co-plots and comparison tables for the RF, CART, and LGR methods, together with the p-values from the tests and Information Exchange (IE/MIE) measures, are provided, showing the relative performance of the ML methods and of the statistical tests as well as the behaviour under different learning ratios. The impact of first groupings by k-neighbours classification (k-NN), Cross-Correlation (CC) and C-Means (CM) was also studied over limited years and for the entire span. It was found that CART was generally behind RF and LGR but, in some interesting cases, LGR reached an AUC = 0, falling below CART, while the ACC was as high as 0.912, showing that ML methods can be confused by zero-padding, by irregularities in the data, or by outliers. On average, 3 linear predictors were sufficient; LGR competed well with RF, and CART followed with the same performance at higher learning ratios. Services were packed only when the significance level (p-value) of their association coefficient was more than 0.05. Relationships involving social factors were observed between home care services and treatment of old people, low birth weights, alcoholism, drug abuse, and emergency admissions. The work found that different HSc services can be well packed as plans of limited duration, across various service sectors and learning configurations, as confirmed by using statistical hypotheses.
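The combination of a very low AUC with a high ACC can look contradictory, so a small illustration may help. The sketch below is not taken from the study: the labels and scores are invented toy values, and the function names are hypothetical. It shows one way the two metrics can diverge on imbalanced data, where a classifier ranks the only positive case below every negative case yet still labels most records correctly at a fixed threshold.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of records whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Invented toy data: nine negatives and one positive (assumption, not study data).
toy_labels = [0] * 9 + [1]
toy_scores = [0.3] * 9 + [0.1]  # the positive is scored below every negative

toy_auc = auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

Here the ranking is entirely wrong (AUC of 0) while nine of ten records are still classified correctly, which is why the two metrics are best read together.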

Scenario and Decision Analysis for Solar Energy in Egypt by 2035 Using Dynamic Bayesian Network

Bayesian networks are now considered to be a promising tool in the field of energy with different applications. In this study, the aim was to indicate the states of a previously constructed Bayesian network related to solar energy in Egypt and the factors affecting its market share, depending on the data distribution type followed by each factor, and using either the Z-distribution approach or Chebyshev’s inequality theorem. Later on, the separate and the conditional probabilities of the states of each factor in the Bayesian network were derived, either from the collected and scraped historical data or from estimations and past studies. Results showed that we could use the constructed model for scenario and decision analysis concerning forecasting the total percentage of the market share of solar energy in Egypt by 2035 and using it as a stable renewable source for generating any type of energy needed. Also, it proved that whenever the use of solar energy increases, the total costs decrease. Furthermore, we have identified different scenarios, such as the best, worst, 50/50, and most likely one, in terms of the expected changes in the percentage of the solar energy market share. The best scenario showed an 85% probability that the market share of solar energy in Egypt will exceed 10% of the total energy market, while the worst scenario showed only a 24% probability that it will exceed 10% of the total energy market. Furthermore, we applied policy analysis to check the effect of changing the states of the controllable (decision) variable, acting as different scenarios, to show how it would affect the target nodes in the model. Additionally, the best environmental and economical scenarios were developed to show how other factors are expected to be, in order to affect the model positively.
Additional evidence and derived probabilities were added for the weather dynamic nodes whose states depend on time, during the process of converting the Bayesian network into a dynamic Bayesian network.

Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing durations. The proposed model successfully predicts the churn probabilities with 83% accuracy from only three months of expenditure data, and the prediction accuracy increases up to 89% when nine months of data are used. The experiments also show that the accuracy of the ANN model increases on an extended feature set with information on the changes in the bill amounts.

Levels and Trends of Under-Five Mortality in South Africa from 1998 to 2012

Childhood mortality is a key sign of the coverage of child survival interventions, social and economic progressions. Although the level of under-five mortality has been declining, it is still unacceptably high. The primary aim of this paper is to establish and analyse the levels and trends of under-five mortality for the periods 1998, 2003 and 2012 in South Africa. Methods: The data used for analysis came from the 1998 SADHS, the 2003 SADHS and the 2012 SABSSM which collected information on the survival status of children. The Kaplan-Meier estimate of the survival function method was used to determine the probabilities of failure (death) from birth up to 59 months. Results and Conclusion: The overall U5MR declined by 28.2% from 53.1 in 1998 to 38.1 in 2012. The U5MR of male children declined from 59.2 in 1998 to 46.2 in 2003 and dropped further to 41.4 in 2012. The U5MR of children of mothers aged 40 years and older increased from 64.0 in 1998 to 89.0 in 2003 and rose further to 129.9 in 2012. The U5MR of children of mothers with education level of 12 years or more increased from 32.2 in 1998 to 35.2 in 2003 and declined substantially to 17.5 in 2012.
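The Kaplan-Meier step can be sketched in a few lines. This is a minimal illustration, assuming each child record carries an age in months (at death or at censoring) and a death indicator; the data below are invented, and the function name `kaplan_meier` is hypothetical rather than anything used by the authors.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: age in months at death or at censoring
    events:    1 if the death was observed, 0 if the record is censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all records that share the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Invented toy records (assumption): months observed and death indicator.
toy_durations = [2, 5, 5, 12, 30, 59, 59, 59]
toy_events = [1, 1, 0, 1, 0, 0, 0, 0]

toy_curve = kaplan_meier(toy_durations, toy_events)
# Probability of failure (death) by 59 months is one minus the last survival value.
toy_failure_by_59 = 1.0 - toy_curve[-1][1]
```

Multiplying the failure probability by 1000 gives a rate per 1000 live births, which is the usual scale for reporting under-five mortality.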

Modelling Hydrological Time Series Using Wakeby Distribution

The statistical modelling of precipitation data for a given portion of territory is fundamental for the monitoring of climatic conditions and for Hydrogeological Management Plans (HMP). This modelling is rendered particularly complex by the changes taking place in the frequency and intensity of precipitation, presumably attributable to global climate change. This paper applies the Wakeby distribution (with 5 parameters) as a theoretical reference model. The number and the quality of the parameters indicate that this distribution may be the appropriate choice for the interpolation of hydrological variables and, moreover, the Wakeby is particularly suitable for describing phenomena producing heavy tails. The proposed estimation methods for determining the values of the Wakeby parameters are the same as those used for density functions with heavy tails. The commonly used procedure is the classic method of probability weighted moments (PWM), although this has often shown difficulty of convergence or, rather, convergence to an inappropriate configuration of parameters. In this paper, we analyze the problem of the likelihood estimation of a random variable expressed through its quantile function. The method of maximum likelihood, in this case, is more demanding than in more usual estimation situations. The reasons for this lie in the sampling and asymptotic properties of the maximum likelihood estimators, which improve the estimates obtained with indications of their variability and, therefore, their accuracy and reliability. These features are highly appreciated in contexts where poor decisions, attributable to an inefficient or incomplete information base, can cause serious damage.

The Principle Probabilities of Space-Distance Resolution for a Monostatic Radar and Realization in Cylindrical Array

In conjunction with the problem of target selection against a clutter background, an analysis is made of the influence of the scanning rate on the spatial-temporal signal structure, the generalized multivariate correlation function, and the quality of the resolution as the pulse repetition frequency increases. The possibility of object space-distance resolution, which is conditioned by the range-to-angle conversion at an increased scanning rate, is substantiated. Calculations for a real cylindrical array at a high scanning rate are presented. The high scanning rate makes it possible to obtain a signal-to-noise improvement of the order of 10 dB for the space-time signal processing.

Relation of Optimal Pilot Offsets in the Shifted Constellation-Based Method for the Detection of Pilot Contamination Attacks

One possible approach for maintaining the security of communication systems relies on Physical Layer Security mechanisms. However, in wireless time division duplex systems, where uplink and downlink channels are reciprocal, the channel estimate procedure is exposed to attacks known as pilot contamination, with the aim of having an enhanced data signal sent to the malicious user. The Shifted 2-N-PSK method involves two random legitimate pilots in the training phase, each of which belongs to a constellation, shifted from the original N-PSK symbols by certain degrees. In this paper, legitimate pilots’ offset values and their influence on the detection capabilities of the Shifted 2-N-PSK method are investigated. As the implementation of the technique depends on the relation between the shift angles rather than their specific values, the optimal interconnection between the two legitimate constellations is investigated. The results show that no regularity exists in the relation between the pilot contamination attacks (PCA) detection probability and the choice of offset values. Therefore, an adversary who aims to obtain the exact offset values can only employ a brute-force attack but the large number of possible combinations for the shifted constellations makes such a type of attack difficult to successfully mount. For this reason, the number of optimal shift value pairs is also studied for both 100% and 98% probabilities of detecting pilot contamination attacks. Although the Shifted 2-N-PSK method has been broadly studied in different signal-to-noise ratio scenarios, in multi-cell systems the interference from the signals in other cells should be also taken into account. Therefore, the inter-cell interference impact on the performance of the method is investigated by means of a large number of simulations. The results show that the detection probability of the Shifted 2-N-PSK decreases inversely to the signal-to-interference-plus-noise ratio.

Quantum Markov Modeling for Healthcare

A Markov model defines a system of states, composed of the feasible transition paths between those states and the parameters of those transitions. The paths and parameters may be a representative way to address healthcare issues, such as identifying the most likely sequence of patient health states given the sequence of observations. Furthermore, estimating the length of stay (LoS) of hospitalized patients is one of the challenges that Markov models allow us to solve. However, finding the maximum probability of any path that reaches a given state at time t can have a high computational cost. A quantum approach allows us to take advantage of quantum computation, since the calculated probabilities can be in several states, which ends up outperforming classical computing due to the possible superposition of states when handling large amounts of data. Quantum physics-based architectures and machine learning techniques are therefore appropriate aids to address the complexity of healthcare.
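For reference, the classical computation that the quantum approach is meant to speed up is usually done with the Viterbi algorithm, which tracks the maximum probability of any path reaching each state at time t. The sketch below is a plain classical version, not the quantum method; the two health states, three observations and all probabilities are textbook-style toy values chosen for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its probability) for a hidden Markov model."""
    # best[t][s] = maximum probability of any path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace the best path backwards from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (assumption): two patient health states and three possible observations.
toy_states = ["Healthy", "Fever"]
toy_start = {"Healthy": 0.6, "Fever": 0.4}
toy_trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
toy_emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

toy_path, toy_prob = viterbi(
    ["normal", "cold", "dizzy"], toy_states, toy_start, toy_trans, toy_emit
)
```

The cost grows with the number of time steps times the square of the number of states, which is the computational burden the abstract refers to.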

Stability Bound of Ruin Probability in a Reduced Two-Dimensional Risk Model

In this work, we introduce the qualitative and quantitative concept of the strong stability method in a risk process modeling two lines of business of the same insurance company, or an insurance company and a re-insurance company that divide both claims and premiums between them in a certain proportion. The proposed approach is based on the identification of the ruin probability associated with the model considered with a stationary distribution of a Markov random process called a reversed process. Our objective, after clarifying the condition and the perturbation domain of the parameters, is to obtain the stability inequality of the ruin probability, which is applied to estimate the error of approximating a model with disturbance parameters by the considered model. In the stability bound obtained, all constants are written explicitly.

An Approaching Index to Evaluate a forward Collision Probability

This paper presents an approaching forward collision probability index (AFCPI) for alerting and assisting the driver in keeping a safe distance to avoid forward collision accidents in highway driving. The time to collision (TTC) and time headway (TH) are used to evaluate the TTC forward collision probability index (TFCPI) and the TH forward collision probability index (HFCPI), respectively. The Mamdani fuzzy inference algorithm is used to combine TFCPI and HFCPI to calculate the approaching collision probability index of the vehicle. The AFCPI is easy to understand even for a driver who has no professional knowledge of the vehicle field. At the same time, the driver’s behavior is taken into account to suit each driver. For the approaching index, the value 0 indicates a 0% probability of forward collision, and the values 0.5 and 1 indicate 50% and 100% probabilities of forward collision, respectively. The AFCPI is useful and easy to understand for alerting the driver to avoid forward collision accidents when driving on the highway.
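The two inputs to the index have simple kinematic definitions, sketched below under the usual assumptions: TTC is the gap divided by the closing speed, and TH is the gap divided by the following vehicle's speed. The function and parameter names are hypothetical, and the mapping from these times to TFCPI, HFCPI and the fuzzy AFCPI is not reproduced because the abstract does not specify it.

```python
import math


def time_to_collision(gap_m, v_follow_mps, v_lead_mps):
    """Seconds until contact if both vehicles keep their current speeds."""
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0:
        # The gap is constant or growing, so no collision is predicted.
        return math.inf
    return gap_m / closing_speed


def time_headway(gap_m, v_follow_mps):
    """Seconds the following vehicle needs to cover the current gap."""
    if v_follow_mps <= 0:
        return math.inf
    return gap_m / v_follow_mps


# Illustrative values (assumption): 50 m gap, follower at 30 m/s, leader at 25 m/s.
toy_ttc = time_to_collision(50.0, 30.0, 25.0)
toy_th = time_headway(50.0, 25.0)
```

TTC captures how quickly the gap is closing, while TH captures how close the vehicles are relative to speed, which is why the two are combined rather than used alone.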

A Hyperexponential Approximation to Finite-Time and Infinite-Time Ruin Probabilities of Compound Poisson Processes

This article considers the problem of evaluating the infinite-time (or finite-time) ruin probability under a given compound Poisson surplus process by approximating the claim size distribution by a finite mixture of exponentials, that is, a hyperexponential distribution. It restates the infinite-time (or finite-time) ruin probability as a solvable ordinary differential equation (or a partial differential equation). The application of our findings is illustrated through a simulation study.
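The simplest instance of this idea is a hyperexponential distribution with a single component, that is, exponential claims, for which the infinite-time ruin probability of the compound Poisson model has a well-known closed form. The sketch below covers only that special case, not the article's general method; the function and parameter names are hypothetical.

```python
import math


def ruin_probability_exponential(u, lam, mean_claim, premium_rate):
    """Infinite-time ruin probability with exponential claim sizes.

    u:            initial surplus
    lam:          Poisson claim arrival rate
    mean_claim:   mean of the exponential claim size distribution
    premium_rate: premium income per unit time
    """
    if premium_rate <= lam * mean_claim:
        # Without a positive safety loading, ruin occurs with probability 1.
        return 1.0
    adjustment = 1.0 / mean_claim - lam / premium_rate
    return (lam * mean_claim / premium_rate) * math.exp(-adjustment * u)


# Illustrative parameters (assumption): 25% safety loading on unit-mean claims.
toy_psi_0 = ruin_probability_exponential(0.0, 1.0, 1.0, 1.25)
toy_psi_10 = ruin_probability_exponential(10.0, 1.0, 1.0, 1.25)
```

With a genuine mixture of several exponentials the ruin probability becomes a sum of exponential terms, which is the type of solution a linear ordinary differential equation of this kind yields.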

Web-Based Instructional Program to Improve Professional Development: Recommendations and Standards for Radioactive Facilities in Brazil

This web-based project focuses on continuing corporate education and improving workers' skills in Brazilian radioactive facilities throughout the country. The potential of Information and Communication Technologies (ICTs) shall contribute to improving global communication in this very large country, where it is a major challenge to deliver high-quality professional information to as many people as possible. The main objective of this system is to provide Brazilian radioactive facilities with a complete web-based repository - in Portuguese - for research, consultation and information, offering conditions for learning and improving professional and personal skills. UNIPRORAD is a web-based system that offers unified programs and inter-related information about radiological protection programs. The content includes the best practices for radioactive facilities in order to meet both national standards and international recommendations published by different organizations over the past decades: International Commission on Radiological Protection (ICRP), International Atomic Energy Agency (IAEA) and National Nuclear Energy Commission (CNEN). The website covers concepts, definitions and theory about optimization and ionizing radiation monitoring procedures. Moreover, the content presents further discussions related to some national and international recommendations, such as potential exposure, which is currently one of the most important research fields in radiological protection. Only two ICRP publications address the issue explicitly, and there is still a lack of knowledge of failure probabilities, for there are still uncertainties in finding effective ways to quantify probabilistically the occurrence of potential exposures and the probabilities of reaching a certain level of dose. To respond to this challenge, this project discusses and introduces potential exposures in a more quantitative way than national and international recommendations.
Articulating valid ICRP and IAEA recommendations and official reports, in addition to scientific papers published in major international congresses, the website discusses and suggests a number of effective actions towards safety which can be incorporated into labor practice. The web platform was created according to corporate public needs, taking into account the development of a robust but flexible system, which can be easily adapted to future demands. ICTs provide a vast array of new communication capabilities and allow information to be spread to as many people as possible at low cost and with high-quality communication. This initiative shall provide opportunities for employees to increase professional skills, stimulating development in this large country where it is an enormous challenge to deliver effective and updated information to geographically distant facilities, minimizing costs and optimizing results.

Estimating Bridge Deterioration for Small Data Sets Using Regression and Markov Models

The primary approach for estimating bridge deterioration uses Markov-chain models and regression analysis. Traditional Markov models have problems in estimating the required transition probabilities when a small sample size is used. Often, reliable bridge data have not been taken over large periods, thus large data sets may not be available. This study presents an important change to the traditional approach by using the Small Data Method to estimate transition probabilities. The results illustrate that the Small Data Method and traditional approach both provide similar estimates; however, the former method provides results that are more conservative. That is, Small Data Method provided slightly lower than expected bridge condition ratings compared with the traditional approach. Considering that bridges are critical infrastructures, the Small Data Method, which uses more information and provides more conservative estimates, may be more appropriate when the available sample size is small. In addition, regression analysis was used to calculate bridge deterioration. Condition ratings were determined for bridge groups, and the best regression model was selected for each group. The results obtained were very similar to those obtained when using Markov chains; however, it is desirable to use more data for better results.
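How a Markov-chain model turns transition probabilities into a deterioration curve can be sketched briefly. The three condition ratings and the transition matrix below are invented for illustration; they are not estimates from the study, and the Small Data Method itself is not reproduced here.

```python
def step(dist, transition):
    """Advance a probability distribution over condition states by one period."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def expected_rating(dist, ratings):
    """Expected condition rating under the current state distribution."""
    return sum(p * r for p, r in zip(dist, ratings))


# Hypothetical model (assumption): ratings 9, 8, 7 and a bridge that can only
# stay in its state or drop one rating per inspection period.
toy_ratings = [9, 8, 7]
toy_transition = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]

toy_dist = [1.0, 0.0, 0.0]  # a new bridge starts in the best state
toy_history = []
for _ in range(3):
    toy_history.append(expected_rating(toy_dist, toy_ratings))
    toy_dist = step(toy_dist, toy_transition)
```

Each entry of `toy_history` is the expected rating at the start of a period, so the list traces the predicted deterioration curve; with few inspection records, the difficulty lies in estimating the entries of the transition matrix reliably.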

Speaker Identification by Atomic Decomposition of Learned Features Using Computational Auditory Scene Analysis Principals in Noisy Environments

Speaker recognition is performed in high Additive White Gaussian Noise (AWGN) environments using principles of Computational Auditory Scene Analysis (CASA). CASA methods often classify sounds from images in the time-frequency (T-F) plane using spectrograms or cochleagrams as the image. In this paper, atomic decomposition implemented by matching pursuit performs a transform from time series speech signals to the T-F plane. The atomic decomposition creates a sparsely populated T-F vector in “weight space” where each populated T-F position contains an amplitude weight. The weight space vector along with the atomic dictionary represents a denoised, compressed version of the original signal. The arrangement of the atomic indices in the T-F vector is used for classification. Unsupervised feature learning implemented by a sparse autoencoder learns a single dictionary of basis features from a collection of envelope samples from all speakers. The approach is demonstrated using pairs of speakers from the TIMIT data set. Pairs of speakers are selected randomly from a single district. Each speaker has 10 sentences. Two are used for training and 8 for testing. Atomic index probabilities are created for each training sentence and also for each test sentence. Classification is performed by finding the lowest Euclidean distance between the probabilities from the training sentences and the test sentences. Training is done at a 30 dB Signal-to-Noise Ratio (SNR). Testing is performed at SNRs of 0 dB, 5 dB, 10 dB and 30 dB. The algorithm has a baseline classification accuracy of ~93% averaged over 10 pairs of speakers from the TIMIT data set. The baseline accuracy is attributable to short sequences of training and test data as well as the overall simplicity of the classification algorithm. The accuracy is not affected by AWGN, and the algorithm produces ~93% accuracy at 0 dB SNR.
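The final classification step, matching a test sentence to the training sentence with the closest atomic index probabilities, can be sketched as follows. The index lists, the dictionary size and all function names are assumptions made for illustration; the matching pursuit and autoencoder stages are not reproduced.

```python
import math
from collections import Counter


def index_probabilities(indices, dictionary_size):
    """Relative frequency of each atomic dictionary index in one sentence."""
    counts = Counter(indices)
    total = len(indices)
    return [counts.get(k, 0) / total for k in range(dictionary_size)]


def euclidean(p, q):
    """Euclidean distance between two equal-length probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify(test_probs, training):
    """Return the speaker label of the nearest training sentence.

    training: list of (speaker_label, probability_vector) pairs
    """
    return min(training, key=lambda item: euclidean(test_probs, item[1]))[0]


# Invented atomic indices (assumption) for one training sentence per speaker.
toy_training = [
    ("A", index_probabilities([0, 0, 1, 2], 4)),
    ("B", index_probabilities([2, 3, 3, 3], 4)),
]
toy_test = index_probabilities([0, 1, 0, 0], 4)
toy_label = classify(toy_test, toy_training)
```

Because the features are relative frequencies of dictionary indices rather than amplitudes, additive noise affects them only when it changes which atoms are selected.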

Network-Constrained AC Unit Commitment under Uncertainty Using a Bender’s Decomposition Approach

In this work, the system evaluates the impact of considering a stochastic approach to day-ahead Unit Commitment. Comparisons between stochastic and deterministic Unit Commitment solutions are provided. The Unit Commitment model consists in the minimization of the total operation costs considering the units’ technical constraints, such as ramping rates and minimum up and down times. Load shedding and wind power spilling are acceptable, but at inflated operational costs. The evaluation process consists in the calculation of the optimal unit commitment and in verifying the fulfillment of the considered constraints. For the calculation of the optimal unit commitment, an algorithm based on Benders Decomposition, namely on Dual Dynamic Programming, was developed. Two approaches were considered in the construction of stochastic solutions. Data related to wind power outputs from two different operational days are considered in the analysis. Stochastic and deterministic solutions are compared based on the actual measured wind power output on the operational day. Through a technique capable of finding representative wind power scenarios and their probabilities, the system can analyze the expected final operational cost in more detail.

Harnessing Nigeria's Forestry Potential for Structural Applications: Structural Reliability of Nigerian Grown Opepe Timber

This study examined the structural reliability of Nigerian grown Opepe timber as a bridge beam material. The strength of a particular species of timber depends heavily on factors such as the soil and environment in which it is grown. The steps involved are collection of the Opepe timber samples, seasoning/preparation of the test specimens, determination of the strength properties/statistical analysis, development of a computer programme in the FORTRAN language and, finally, structural reliability analysis using FORM 5 software. The result revealed that the Nigerian grown Opepe is a reliable and durable structural bridge beam material for a span of 5000 mm, depth of 400 mm, breadth of 250 mm and end bearing length of 150 mm. The probabilities of failure in bending parallel to the grain, compression perpendicular to the grain, shear parallel to the grain and deflection are 1.61 × 10^-7, 1.43 × 10^-8, 1.93 × 10^-4 and 1.51 × 10^-15, respectively. The paper recommends the establishment of Opepe plantations in various Local Government Areas in Nigeria for structural applications such as bridges and railway sleepers, for generating income for the nation, and for creating employment for the numerous unemployed youths.

Debt Reconstruction, Career Development and Famers Household Well-Being in Thailand

Debt reconstruction under some moratorium projects is an important method that greatly benefits both the banks and farmers. The method can reduce the probabilities of non-performing loans. This paper discusses debt reconstruction and career development training for farmers in Thailand between 2011 and 2013. The research design is a mixed method combining a quantitative survey and a qualitative survey. The sample size for the quantitative method is 1003 cases. Data were gathered between October and December 2013. The main results affirmed that debt reconstruction is needed, and that there are numerous benefits from farmers’ career development training. Many of the farmers who attended field school activities were able to apply the knowledge learned to their farm work. They can reduce production costs. Farmers’ quality of life and their household well-being also improve. This program should be applied in any country where farmers have high debts and a high risk of not repaying them.

A Generalization of Planar Pascal’s Triangle to Polynomial Expansion and Connection with Sierpinski Patterns

The very well-known stacked sets of numbers referred to as Pascal’s triangle present the coefficients of the binomial expansion of the form (x+y)^n. This paper presents an approach (the Staircase Horizontal Vertical, SHV-method) to the generalization of planar Pascal’s triangle for polynomial expansion of the form (x+y+z+w+r+⋯)^n. The presented generalization of Pascal’s triangle is different from other generalizations of Pascal’s triangles given in the literature. The coefficients of the generalized Pascal’s triangles, presented in this work, are generated by inspection, using embedded Pascal’s triangles. The coefficients of the I-variable expansion are generated by horizontally laying out the Pascal’s elements of the (I-1)-variable expansion, in a staircase manner, and multiplying them with the relevant columns of vertically laid out classical Pascal’s elements, hence avoiding factorial calculations for generating the coefficients of the polynomial expansion. Furthermore, the classical Pascal’s triangle has a pattern built into it regarding its odd and even numbers. This pattern is known as Sierpinski’s triangle. In this study, a presentation of Sierpinski-like patterns of the generalized Pascal’s triangles is given. Applications related to the coefficients of the binomial expansion (Pascal’s triangle) or the polynomial expansion (generalized Pascal’s triangles) can be found in areas such as combinatorics and probabilities.
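The underlying identity, that a trinomial coefficient is a product of two classical Pascal entries, can be checked with a short sketch. This is not the SHV layout itself, only an illustration of building three-variable coefficients from embedded Pascal rows using additions and multiplications, with no factorials; the function names are hypothetical.

```python
def pascal_rows(n):
    """Rows 0..n of the classical Pascal's triangle, built by addition only."""
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        middle = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + middle + [1])
    return rows


def trinomial_coefficients(n):
    """Map (a, b, c) to the coefficient of x^a y^b z^c in (x+y+z)^n."""
    rows = pascal_rows(n)
    coeffs = {}
    for k in range(n + 1):      # k = combined power of y and z
        for j in range(k + 1):  # j = power of z
            # Entry k of row n times entry j of row k.
            coeffs[(n - k, k - j, j)] = rows[n][k] * rows[k][j]
    return coeffs


toy_square = trinomial_coefficients(2)
toy_fourth = trinomial_coefficients(4)
```

For n = 2 this reproduces x^2 + y^2 + z^2 + 2xy + 2xz + 2yz, and the coefficients for any n sum to 3^n, which is a quick sanity check on the construction.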

An Automatic Bayesian Classification System for File Format Selection

This paper presents an approach for the classification of an unstructured format description for the identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format identification method employs a file format classifier and associated configurations to support digital preservation experts with an estimation of the required file format. Our goal is to make use of a format specification knowledge base aggregated from different Web sources in order to select a file format for a particular institution. Using the naive Bayes method, the decision support system recommends to an expert the file format for their institution. The proposed methods facilitate the selection of a file format and improve the quality of a digital preservation process. The presented approach is meant to facilitate decision making for the preservation of digital content in libraries and archives using domain expert knowledge and specifications of file formats. To facilitate decision-making, the aggregated information about the file formats is presented as a file format vocabulary that comprises the most common terms that are characteristic of all researched formats. The goal is to suggest a particular file format based on this vocabulary for analysis by an expert. The sample file format calculation and the calculation results, including probabilities, are presented in the evaluation section.

Capacity Optimization in Cooperative Cognitive Radio Networks

Cooperative spectrum sensing is a crucial challenge in cognitive radio networks. Cooperative sensing can increase the reliability of spectrum hole detection, optimize sensing time and reduce delay in cooperative networks. In this paper, an efficient central capacity optimization algorithm is proposed to minimize cooperative sensing time in a homogeneous sensor network using the OR decision rule, subject to constraints on the detection and false alarm probabilities. The evaluation results reveal significant improvement in the sensing time and normalized capacity of the cognitive sensors.
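Under the OR decision rule the fusion centre declares the channel occupied when at least one sensor reports a detection. Assuming independent sensors with identical local probabilities, as in a homogeneous network, the cooperative probability for K sensors is 1 - (1 - p)^K for both detection and false alarm. The sketch below uses that standard relation to find the smallest number of sensors meeting both constraints; it is not the paper's optimization algorithm, and the function names and numeric values are hypothetical.

```python
def or_rule(p_local, k):
    """Cooperative probability that at least one of k independent sensors fires."""
    return 1.0 - (1.0 - p_local) ** k


def min_sensors(pd_local, target_qd, pf_local, max_qf, k_max=100):
    """Smallest k meeting the detection target, or None if infeasible.

    Both cooperative probabilities grow with k, so the smallest k that reaches
    the detection target is also the one with the lowest false alarm rate.
    """
    for k in range(1, k_max + 1):
        if or_rule(pd_local, k) >= target_qd:
            return k if or_rule(pf_local, k) <= max_qf else None
    return None


# Illustrative constraints (assumption): local Pd = 0.6 and Pf = 0.01,
# cooperative detection of at least 0.9 and false alarm of at most 0.05.
toy_k = min_sensors(0.6, 0.9, 0.01, 0.05)
```

The same relation shows the trade-off behind the constraints: adding sensors raises the detection probability but also raises the cooperative false alarm probability.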