The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class- Unbalanced Data of Diabetes Risk Groups

The problems arising from unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors’ nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, the methods of discriminant analysis are of interest in investigating misclassification error rates for classimbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance between parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data including the variables of diabetes risk group, age, gender, blood glucose, and BMI were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for the departure of multivariate normality and the equality of covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed nonnormality and unequal covariance matrices. The parametric linear discriminant function, quadratic discriminant function, and the nonparametric k-nearest neighbors’ discriminant function were performed over 50 and 100 bootstrap samples and applied to the original data. Searching the optimal classification rule, the choices of prior probabilities were set up for both equal proportions (0.33: 0.33: 0.33) and unequal proportions of (0.90:0.05:0.05), (0.80: 0.10: 0.10) and (0.70, 0.15, 0.15). The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach when k=3 or k=4 and the defined prior probabilities of non-risk: risk: diabetic as 0.90: 0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of misclassification. The k-nearest neighbors approach would be suggested for classifying a three-class-imbalanced data of diabetes risk groups.

Adaptive Nonparametric Approach for Guaranteed Real-Time Detection of Targeted Signals in Multichannel Monitoring Systems

An adaptive nonparametric method is proposed for stable real-time detection of seismoacoustic sources in multichannel C-OTDR systems with a significant number of channels. This method guarantees given upper boundaries for probabilities of Type I and Type II errors. Properties of the proposed method are rigorously proved. The results of practical applications of the proposed method in a real C-OTDR-system are presented in this report.

Nuclear Data Evaluation for 217Po

Evaluated nuclear decay data for the 217Po nuclide is presented in the present work. These data include recommended values for the half-life T1/2, α-, β-- and γ-ray emission energies and probabilities. Decay data from 221Rn α and 217Bi β—decays are presented. Q(α) has been updated based on the recent published work of the Atomic Mass Evaluation AME2012. In addition, the logft values were calculated using the Logft program from the ENSDF evaluation package. Moreover, the total internal conversion electrons and the K-shell to L-shell and L-shell to M-shell and to N-shell conversion electrons ratios K/L, L/M and L/N have been calculated using Bricc program. Meanwhile, recommendation values or the multi-polarities have been assigned based on recently measurement yield a better intensity balance at the 254 keV and 264 keV gamma transitions.

Performance Analysis of Heterogeneous Cellular Networks with Multiple Connectivity

Future mobile networks following 5th generation will be characterized by one thousand times higher gains in capacity; connections for at least one hundred billion devices; user experience capable of extremely low latency and response times. To be close to the capacity requirements and higher reliability, advanced technologies have been studied, such as multiple connectivity, small cell enhancement, heterogeneous networking, and advanced interference and mobility management. This paper is focused on the multiple connectivity in heterogeneous cellular networks. We investigate the performance of coverage and user throughput in several deployment scenarios. Using the stochastic geometry approach, the SINR distributions and the coverage probabilities are derived in case of dual connection. Also, to compare the user throughput enhancement among the deployment scenarios, we calculate the spectral efficiency and discuss our results.

Evaluation of Expected Annual Loss Probabilities of RC Moment Resisting Frames

Building loss estimation methodologies which have been advanced considerably in recent decades are usually used to estimate socio and economic impacts resulting from seismic structural damage. In accordance with these methods, this paper presents the evaluation of an annual loss probability of a reinforced concrete moment resisting frame designed according to Korean Building Code. The annual loss probability is defined by (1) a fragility curve obtained from a capacity spectrum method which is similar to a method adopted from HAZUS, and (2) a seismic hazard curve derived from annual frequencies of exceedance per peak ground acceleration. Seismic fragilities are computed to calculate the annual loss probability of a certain structure using functions depending on structural capacity, seismic demand, structural response and the probability of exceeding damage state thresholds. This study carried out a nonlinear static analysis to obtain the capacity of a RC moment resisting frame selected as a prototype building. The analysis results show that the probability of being extensive structural damage in the prototype building is expected to 0.01% in a year.

Modeling Default Probabilities of the Chosen Czech Banks in the Time of the Financial Crisis

One of the most important tasks in the risk management is the correct determination of probability of default (PD) of particular financial subjects. In this paper a possibility of determination of financial institution’s PD according to the creditscoring models is discussed. The paper is divided into the two parts. The first part is devoted to the estimation of the three different models (based on the linear discriminant analysis, logit regression and probit regression) from the sample of almost three hundred US commercial banks. Afterwards these models are compared and verified on the control sample with the view to choose the best one. The second part of the paper is aimed at the application of the chosen model on the portfolio of three key Czech banks to estimate their present financial stability. However, it is not less important to be able to estimate the evolution of PD in the future. For this reason, the second task in this paper is to estimate the probability distribution of the future PD for the Czech banks. So, there are sampled randomly the values of particular indicators and estimated the PDs’ distribution, while it’s assumed that the indicators are distributed according to the multidimensional subordinated Lévy model (Variance Gamma model and Normal Inverse Gaussian model, particularly). Although the obtained results show that all banks are relatively healthy, there is still high chance that “a financial crisis” will occur, at least in terms of probability. This is indicated by estimation of the various quantiles in the estimated distributions. Finally, it should be noted that the applicability of the estimated model (with respect to the used data) is limited to the recessionary phase of the financial market.

The Guaranteed Detection of the Seismoacoustic Emission Source in the C-OTDR Systems

A method is proposed for stable detection of seismoacoustic sources in C-OTDR systems that guarantee given upper bounds for probabilities of type I and type II errors. Properties of the proposed method are rigorously proved. The results of practical applications of the proposed method in a real C-OTDRsystem are presented.

Unit Root Tests Based On the Robust Estimator

The unit root tests based on the robust estimator for the first-order autoregressive process are proposed and compared with the unit root tests based on the ordinary least squares (OLS) estimator. The percentiles of the null distributions of the unit root test are also reported. The empirical probabilities of Type I error and powers of the unit root tests are estimated via Monte Carlo simulation. Simulation results show that all unit root tests can control the probability of Type I error for all situations. The empirical power of the unit root tests based on the robust estimator are higher than the unit root tests based on the OLS estimator.

Volatility Switching between Two Regimes

Based on the fact that volatility is time varying in high frequency data and that periods of high volatility tend to cluster, the most successful and popular models in modeling time varying volatility are GARCH type models. When financial returns exhibit sudden jumps that are due to structural breaks, standard GARCH models show high volatility persistence, i.e. integrated behavior of the conditional variance. In such situations models in which the parameters are allowed to change over time are more appropriate. This paper compares different GARCH models in terms of their ability to describe structural changes in returns caused by financial crisis at stock markets of six selected central and east European countries. The empirical analysis demonstrates that Markov regime switching GARCH model resolves the problem of excessive persistence and outperforms uni-regime GARCH models in forecasting volatility when sudden switching occurs in response to financial crisis.

Multiple Targets Classification and Fuzzy Logic Decision Fusion in Wireless Sensor Networks

This paper proposes a hierarchical hidden Markov model (HHMM) to model the detection of M vehicles in a wireless sensor network (WSN). The HHMM model contains an extra level of hidden Markov model to model the temporal transitions of each state of the first HMM. By modeling the temporal transitions, only those hypothesis with nonzero transition probabilities needs to be tested. Thus, this method efficiently reduces the computation load, which is preferable in WSN applications.This paper integrates several techniques to optimize the detection performance. The output of the states of the first HMM is modeled as Gaussian Mixture Model (GMM), where the number of states and the number of Gaussians are experimentally determined, while the other parameters are estimated using Expectation Maximization (EM). HHMM is used to model the sequence of the local decisions which are based on multiple hypothesis testing with maximum likelihood approach. The states in the HHMM represent various combinations of vehicles of different types. Due to the statistical advantages of multisensor data fusion, we propose a heuristic based on fuzzy weighted majority voting to enhance cooperative classification of moving vehicles within a region that is monitored by a wireless sensor network. A fuzzy inference system weighs each local decision based on the signal to noise ratio of the acoustic signal for target detection and the signal to noise ratio of the radio signal for sensor communication. The spatial correlation among the observations of neighboring sensor nodes is efficiently utilized as well as the temporal correlation. Simulation results demonstrate the efficiency of this scheme.

Modelling Sudden Deaths from Myocardial Infarction and Stroke

Death within 30 days is an important factor to be looked into, as there is a significant risk of deaths immediately following or soon after, myocardial infarction (MI) or stroke. In this paper, we will model the deaths within 30 days following a myocardial infarction (MI) or stroke in the UK. We will see how the probabilities of sudden deaths from MI or stroke have changed over the period 1981-2000. We will model the sudden deaths using a generalized linear model (GLM), fitted using the R statistical package, under a Binomial distribution for the number of sudden deaths. We parameterize our model using the extensive and detailed data from the Framingham Heart Study, adjusted to match UK rates. The results show that there is a reduction for the sudden deaths following a MI over time but no significant improvement for sudden deaths following a stroke.

A Human Activity Recognition System Based On Sensory Data Related to Object Usage

Sensor-based Activity Recognition systems usually accounts which sensors have been activated to perform an activity. The system then combines the conditional probabilities of those sensors to represent different activities and takes the decision based on that. However, the information about the sensors which are not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach where the sensory data related to both usage and non-usage of objects are utilized to make the classification of activities. Experimental results also show the promising performance of the proposed method.

Green Bridges and Their Migration Potential

Green bridges enable wildlife to pass through linear structures, especially freeways. The term migration potential is used to quantify their functionality. The proposed methodology for determining migration potential eliminates the mathematical, systematic and ecological inaccuracies of previous methodologies and provides a reliable tool for designers and environmentalists. The methodology is suited especially to medium-sized and large mammals, is mathematically correct, and its correspondence with reality was tested by monitoring existing green bridges. 

Confidence Interval for the Inverse of a Normal Mean with a Known Coefficient of Variation

In this paper, we propose two new confidence intervals for the inverse of a normal mean with a known coefficient of variation. One of new confidence intervals for the inverse of a normal mean with a known coefficient of variation is constructed based on the pivotal statistic Z where Z is a standard normal distribution and another confidence interval is constructed based on the generalized confidence interval, presented by Weerahandi. We examine the performance of these confidence intervals in terms of coverage probabilities and average lengths via Monte Carlo simulation.

Confidence Intervals for the Difference of Two Normal Population Variances

Motivated by the recent work of Herbert, Hayen, Macaskill and Walter [Interval estimation for the difference of two independent variances. Communications in Statistics, Simulation and Computation, 40: 744-758, 2011.], we investigate, in this paper, new confidence intervals for the difference between two normal population variances based on the generalized confidence interval of Weerahandi [Generalized Confidence Intervals. Journal of the American Statistical Association, 88(423): 899-905, 1993.] and the closed form method of variance estimation of Zou, Huo and Taleban [Simple confidence intervals for lognormal means and their differences with environmental applications. Environmetrics 20: 172-180, 2009]. Monte Carlo simulation results indicate that our proposed confidence intervals give a better coverage probability than that of the existing confidence interval. Also two new confidence intervals perform similarly based on their coverage probabilities and their average length widths.

Latent Semantic Inference for Agriculture FAQ Retrieval

FAQ system can make user find answer to the problem that puzzles them. But now the research on Chinese FAQ system is still on the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance the efficiency, a small pool of the candidate question-answering pairs retrieved from the system for the follow-up work according to the concept of the agriculture domain extracted from user input .Input queries or questions are converted into four parts, the question word segment (QWS), the verb segment (VS), the concept of agricultural areas segment (CS), the auxiliary segment (AS). A semantic matching method is presented to estimate the similarity between the semantic segments of the query and the questions in the pool of the candidate. A thesaurus constructed from the HowNet, a Chinese knowledge base, is adopted for word similarity measure in the matcher. The questions are classified into eleven intension categories using predefined question stemming keywords. For FAQ mining, given a query, the question part and answer part in an FAQ question-answer pair is matched with the input query, respectively. Finally, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. These approaches are experimented on an agriculture FAQ system. Experimental results indicate that the proposed approach outperformed the FAQ-Finder system in agriculture FAQ retrieval.

A Single-Period Inventory Problem with Resalable Returns: A Fuzzy Stochastic Approach

In this paper, a single period inventory model with resalable returns has been analyzed in an imprecise and uncertain mixed environment. Demand has been introduced as a fuzzy random variable. In this model, a single order is placed before the start of the selling season. The customer, for a full refund, may return purchased products within a certain time interval. Returned products are resalable, provided they arrive back before the end of the selling season and are found to be undamaged. Products remaining at the end of the season are salvaged. All demands not met directly are lost. The probabilities that a sold product is returned and that a returned product is resalable, both imprecise in a real situation, have been assumed to be fuzzy in nature.

Approximations to the Distribution of the Sample Correlation Coefficient

Given a bivariate normal sample of correlated variables, (Xi, Yi), i = 1, . . . , n, an alternative estimator of Pearson’s correlation coefficient is obtained in terms of the ranges, |Xi − Yi|. An approximate confidence interval for ρX,Y is then derived, and a simulation study reveals that the resulting coverage probabilities are in close agreement with the set confidence levels. As well, a new approximant is provided for the density function of R, the sample correlation coefficient. A mixture involving the proposed approximate density of R, denoted by hR(r), and a density function determined from a known approximation due to R. A. Fisher is shown to accurately approximate the distribution of R. Finally, nearly exact density approximants are obtained on adjusting hR(r) by a 7th degree polynomial.

Characterization and Modeling of Packet Loss of a VoIP Communication

In this work, a characterization and modeling of packet loss of a Voice over Internet Protocol (VoIP) communication is developed. The distributions of the number of consecutive received and lost packets (namely gap and burst) are modeled from the transition probabilities of two-state and four-state model. Measurements show that both models describe adequately the burst distribution, but the decay of gap distribution for non-homogeneous losses is better fit by the four-state model. The respective probabilities of transition between states for each model were estimated with a proposed algorithm from a set of monitored VoIP calls in order to obtain representative minimum, maximum and average values for both models.

Performance Analysis of an Adaptive Threshold Hybrid Double-Dwell System with Antenna Diversity for Acquisition in DS-CDMA Systems

In this paper, we consider the analysis of the acquisition process for a hybrid double-dwell system with antenna diversity for DS-CDMA (direct sequence-code division multiple access) using an adaptive threshold. Acquisition systems with a fixed threshold value are unable to adapt to fast varying mobile communications environments and may result in a high false alarm rate, and/or low detection probability. Therefore, we propose an adaptively varying threshold scheme through the use of a cellaveraging constant false alarm rate (CA-CFAR) algorithm, which is well known in the field of radar detection. We derive exact expressions for the probabilities of detection and false alarm in Rayleigh fading channels. The mean acquisition time of the system under consideration is also derived. The performance of the system is analyzed and compared to that of a hybrid single dwell system.