Abstract: Problems arising from unbalanced data sets commonly
appear in real-world applications. Due to unequal class distribution,
many researchers have found that the performance of existing
classifiers tends to be biased towards the majority class. The
k-nearest neighbors nonparametric discriminant analysis is a method
that has been proposed for classifying unbalanced classes with good
performance. In this study, discriminant analysis methods are of
interest for investigating misclassification error rates for
class-imbalanced data of three diabetes risk groups. The purpose of this
study was to compare the classification performance between
parametric discriminant analysis and nonparametric discriminant
analysis in a three-class classification of class-imbalanced data of
diabetes risk groups. Data from a project maintaining healthy
conditions for 599 employees of a government hospital in Bangkok
were obtained for the classification problem. The employees were
divided into three diabetes risk groups: non-risk (90%), risk (5%),
and diabetic (5%). The original data including the variables of
diabetes risk group, age, gender, blood glucose, and BMI were
analyzed and bootstrapped for 50 and 100 samples, 599 observations
per sample, for additional estimation of the misclassification error
rate. Each data set was explored for the departure of multivariate
normality and the equality of covariance matrices of the three risk
groups. Both the original data and the bootstrap samples showed
non-normality and unequal covariance matrices. The parametric linear
discriminant function, quadratic discriminant function, and the
nonparametric k-nearest neighbors’ discriminant function were
performed over 50 and 100 bootstrap samples and applied to the
original data. In searching for the optimal classification rule, the
prior probabilities were set to both equal proportions
(0.33:0.33:0.33) and unequal proportions (0.90:0.05:0.05),
(0.80:0.10:0.10), and (0.70:0.15:0.15). The results from 50 and 100 bootstrap samples
indicated that the k-nearest neighbors approach when k=3 or k=4 and
the defined prior probabilities of non-risk:risk:diabetic as
0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of
misclassification. The k-nearest neighbors approach is therefore
suggested for classifying three-class imbalanced diabetes risk group
data.
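The prior-adjusted k-nearest neighbors rule described above can be sketched as follows. The synthetic two-feature data are an assumed stand-in for the hospital data (the actual age, gender, blood glucose, and BMI variables are not available here); only the 90%/5%/5% class proportions mirror the study's design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the three diabetes risk groups with the study's
# 90% / 5% / 5% proportions (the real hospital variables are not available).
n = 599
sizes = [int(0.90 * n), int(0.05 * n), n - int(0.90 * n) - int(0.05 * n)]
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0]), np.array([4.0, 4.0])]
X = np.vstack([rng.normal(m, 1.0, size=(s, 2)) for s, m in zip(sizes, means)])
y = np.concatenate([np.full(s, c) for c, s in enumerate(sizes)])

def knn_predict(X_train, y_train, X_test, k=3, priors=None):
    """k-NN discriminant rule: posterior_c is proportional to
    prior_c * k_c / n_c, where k_c counts class-c points among the
    k nearest neighbours of the test point."""
    classes = np.unique(y_train)
    n_c = np.array([(y_train == c).sum() for c in classes])
    if priors is None:
        priors = n_c / len(y_train)          # sample-proportion priors
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dist)[:k]]
        k_c = np.array([(nearest == c).sum() for c in classes])
        preds.append(classes[np.argmax(priors * k_c / n_c)])
    return np.array(preds)

pred = knn_predict(X, y, X, k=3, priors=np.array([0.90, 0.05, 0.05]))
error_rate = float(np.mean(pred != y))
```

Setting the priors equal to the observed class proportions makes the rule reduce to ordinary majority voting, while equal priors (0.33 each) shift the decision toward the minority classes.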
Abstract: An adaptive nonparametric method is proposed for
stable real-time detection of seismoacoustic sources in multichannel
C-OTDR systems with a significant number of channels. This
method guarantees given upper bounds for the probabilities of Type I
and Type II errors. Properties of the proposed method are rigorously
proved. The results of practical applications of the proposed method
in a real C-OTDR-system are presented in this report.
Abstract: Evaluated nuclear decay data for the 217Po nuclide are
presented in this work. These data include recommended values for the
half-life T1/2 and the α-, β−-, and γ-ray emission energies and
probabilities. Decay data from 221Rn α and 217Bi β− decays are
presented. Q(α) has been updated based on the recently published
Atomic Mass Evaluation AME2012. In addition, the logft
values were calculated using the Logft program from the ENSDF
evaluation package. Moreover, the total internal conversion
coefficients and the K-shell to L-shell, L-shell to M-shell, and
L-shell to N-shell conversion electron ratios K/L, L/M, and L/N have
been calculated using the BrIcc program. Meanwhile, recommended
values for the multipolarities have been assigned based on recent
measurements, yielding a better intensity balance at the 254 keV and
264 keV gamma transitions.
Abstract: Future mobile networks beyond the 5th generation will
be characterized by capacity gains of up to one thousand times,
connections for at least one hundred billion devices, and a user
experience with extremely low latency and response times. To approach
these capacity and reliability requirements, advanced
technologies have been studied, such as multiple connectivity, small
cell enhancement, heterogeneous networking, and advanced
interference and mobility management. This paper focuses on
multiple connectivity in heterogeneous cellular networks. We
investigate the performance of coverage and user throughput in several
deployment scenarios. Using a stochastic geometry approach, the
SINR distributions and coverage probabilities are derived for the
dual-connectivity case. Also, to compare the user throughput enhancement
among the deployment scenarios, we calculate the spectral efficiency
and discuss our results.
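A minimal Monte Carlo version of such a stochastic-geometry setup (single connection, nearest-base-station association, Rayleigh fading) illustrates how a coverage probability is obtained; all parameter values below are illustrative assumptions, not the paper's deployment scenarios.

```python
import numpy as np

rng = np.random.default_rng(1)

def coverage_probability(lam=1e-5, radius=2000.0, alpha=4.0,
                         noise=1e-12, threshold_db=0.0, trials=2000):
    """Monte Carlo estimate of P(SINR > T) for a typical user at the origin:
    base stations form a homogeneous PPP of density lam (per m^2) in a disk,
    the user attaches to the nearest one, and fading is Rayleigh."""
    T = 10 ** (threshold_db / 10)
    area = np.pi * radius ** 2
    covered = 0
    for _ in range(trials):
        n_bs = rng.poisson(lam * area)
        if n_bs == 0:
            continue
        r = radius * np.sqrt(rng.random(n_bs))   # distances, uniform in disk
        h = rng.exponential(1.0, n_bs)           # Rayleigh fading powers
        p_rx = h * r ** (-alpha)                 # received powers (unit Tx power)
        serving = np.argmin(r)                   # nearest-BS association
        sinr = p_rx[serving] / (p_rx.sum() - p_rx[serving] + noise)
        covered += sinr > T
    return covered / trials

p_cov = coverage_probability()
```

A dual-connectivity variant would let the user combine the two strongest links instead of only the nearest one.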
Abstract: Building loss estimation methodologies, which have
advanced considerably in recent decades, are usually used to estimate
the socio-economic impacts of seismic structural damage. In
accordance with these methods, this paper presents the
evaluation of an annual loss probability of a reinforced concrete
moment-resisting frame designed according to the Korean Building Code.
The annual loss probability is defined by (1) a fragility curve obtained
from a capacity spectrum method which is similar to a method adopted
from HAZUS, and (2) a seismic hazard curve derived from annual
frequencies of exceedance per peak ground acceleration. Seismic
fragilities are computed to calculate the annual loss probability of a
certain structure using functions depending on structural capacity,
seismic demand, structural response and the probability of exceeding
damage state thresholds. This study carried out a nonlinear static
analysis to obtain the capacity of a RC moment resisting frame
selected as a prototype building. The analysis results show that the
annual probability of extensive structural damage in the prototype
building is estimated at 0.01%.
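The hazard-fragility combination behind an annual loss probability can be sketched numerically. The lognormal fragility parameters and the power-law hazard curve below are assumed placeholders, not the values obtained for the KBC-designed frame.

```python
import numpy as np
from math import erf, sqrt, log

# Assumed placeholder curves: a lognormal fragility (median PGA 0.8 g,
# dispersion 0.5) and a power-law seismic hazard curve.  These are NOT the
# values obtained for the prototype frame in the paper.
pga = np.linspace(0.01, 2.0, 400)                  # peak ground acceleration (g)
fragility = np.array([0.5 * (1 + erf(log(a / 0.8) / (0.5 * sqrt(2))))
                      for a in pga])               # P(damage >= extensive | PGA)
annual_rate = 1e-3 * pga ** -2.0                   # annual exceedance rate of PGA

# Annual damage-state probability: integrate fragility against |d(rate)/dPGA|
density = -np.gradient(annual_rate, pga)           # annual rate density of PGA
p_annual = float(np.sum(fragility * density) * (pga[1] - pga[0]))
```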
Abstract: One of the most important tasks in risk management is
the correct determination of the probability of default (PD) of
particular financial subjects. In this paper, the determination of a
financial institution's PD by means of credit-scoring models is
discussed. The paper is divided into two parts.
The first part is devoted to the estimation of the three different
models (based on the linear discriminant analysis, logit regression
and probit regression) from the sample of almost three hundred US
commercial banks. Afterwards, these models are compared and
verified on a control sample with a view to choosing the best one.
The second part of the paper is aimed at the application of the chosen
model on the portfolio of three key Czech banks to estimate their
present financial stability. However, it is no less important to be
able to estimate the evolution of PD in the future. For this reason,
the second task in this paper is to estimate the probability
distribution of the future PD for the Czech banks. To this end, the
values of particular indicators are sampled randomly and the
distribution of the PDs is estimated, under the assumption that the
indicators follow a multidimensional subordinated Lévy model (the
Variance Gamma and Normal Inverse Gaussian models in particular).
Although the
obtained results show that all banks are relatively healthy, there is
still, in probabilistic terms, a high chance that "a financial crisis"
will occur. This is indicated by the various quantiles of the
estimated distributions. Finally, it should be noted that the
applicability of the estimated model (with respect to the used data) is
limited to the recessionary phase of the financial market.
Abstract: A method is proposed for stable detection of
seismoacoustic sources in C-OTDR systems that guarantees given
upper bounds for the probabilities of Type I and Type II errors.
Properties of the proposed method are rigorously proved. The results
of practical applications of the proposed method in a real C-OTDR
system are presented.
Abstract: Unit root tests based on a robust estimator for the first-order autoregressive process are proposed and compared with unit root tests based on the ordinary least squares (OLS) estimator. The percentiles of the null distributions of the unit root tests are also reported. The empirical probabilities of Type I error and the powers of the unit root tests are estimated via Monte Carlo simulation. Simulation results show that all unit root tests can control the probability of Type I error in all situations. The empirical powers of the unit root tests based on the robust estimator are higher than those of the tests based on the OLS estimator.
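For the OLS side of such a comparison, the null distribution of a Dickey-Fuller-type unit root statistic and its empirical percentiles can be simulated as follows; the sample size, replication count, and the no-constant model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def df_statistic(y):
    """OLS-based Dickey-Fuller t-statistic for the no-constant model
    dy_t = (rho - 1) * y_{t-1} + e_t; large negative values reject a unit root."""
    dy, ylag = np.diff(y), y[:-1]
    beta = (ylag @ dy) / (ylag @ ylag)            # OLS estimate of rho - 1
    resid = dy - beta * ylag
    s2 = (resid @ resid) / (len(dy) - 1)          # residual variance
    return beta / np.sqrt(s2 / (ylag @ ylag))     # t-ratio

# Null distribution under a true unit root (rho = 1), by Monte Carlo;
# sample size and replication count are illustrative choices.
n, reps = 100, 2000
stats = np.array([df_statistic(np.cumsum(rng.normal(size=n + 1)))
                  for _ in range(reps)])
crit_5 = float(np.quantile(stats, 0.05))          # empirical 5% critical value
```

Replacing the OLS slope with a robust estimate of the same regression, and re-running the simulation, would give the robust tests' null percentiles for comparison.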
Abstract: Based on the fact that volatility is time-varying in high-frequency data and that periods of high volatility tend to cluster, the most successful and popular models for modeling time-varying volatility are GARCH-type models. When financial returns exhibit sudden jumps due to structural breaks, standard GARCH models show high volatility persistence, i.e. integrated behavior of the conditional variance. In such situations, models in which the parameters are allowed to change over time are more appropriate. This paper compares different GARCH models in terms of their ability to describe structural changes in returns caused by the financial crisis on the stock markets of six selected Central and East European countries. The empirical analysis demonstrates that the Markov regime-switching GARCH model resolves the problem of excessive persistence and outperforms uni-regime GARCH models in forecasting volatility when sudden switching occurs in response to the financial crisis.
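The volatility clustering and persistence referred to above can be reproduced with a minimal GARCH(1,1) simulation; the parameter values are illustrative, and a regime-switching extension would let the parameters differ across Markov states.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_garch(n, omega=0.05, alpha=0.1, beta=0.85):
    """Simulate r_t = sigma_t * z_t with
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2;
    alpha + beta close to 1 gives high volatility persistence."""
    r = np.empty(n)
    s2 = omega / (1.0 - alpha - beta)     # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(s2) * rng.normal()
        s2 = omega + alpha * r[t] ** 2 + beta * s2
    return r

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return float((x[:-1] @ x[1:]) / (x @ x))

returns = simulate_garch(5000)
# Returns are serially uncorrelated, but squared returns are not:
# this is the volatility clustering that GARCH-type models capture.
persistence = acf1(returns ** 2)
```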
Abstract: This paper proposes a hierarchical hidden Markov model (HHMM) to model the detection of M vehicles in a wireless sensor network (WSN). The HHMM contains an extra level of hidden Markov model to model the temporal transitions of each state of the first HMM. By modeling the temporal transitions, only those hypotheses with nonzero transition probabilities need to be tested. Thus, this method efficiently reduces the computational load, which is preferable in WSN applications. This paper integrates several techniques to optimize the detection performance. The output of the states of the first HMM is modeled as a Gaussian Mixture Model (GMM), where the number of states and the number of Gaussians are determined experimentally, while the other parameters are estimated using Expectation Maximization (EM). The HHMM is used to model the sequence of local decisions, which are based on multiple hypothesis testing with a maximum likelihood approach. The states in the HHMM represent various combinations of vehicles of different types. Due to the statistical advantages of multisensor data fusion, we propose a heuristic based on fuzzy weighted majority voting to enhance cooperative classification of moving vehicles within a region that is monitored by a wireless sensor network. A fuzzy inference system weighs each local decision based on the signal-to-noise ratio of the acoustic signal for target detection and the signal-to-noise ratio of the radio signal for sensor communication. The spatial correlation among the observations of neighboring sensor nodes is efficiently utilized, as well as the temporal correlation. Simulation results demonstrate the efficiency of this scheme.
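The fuzzy weighted majority voting step can be sketched as follows; the piecewise-linear membership function and the SNR figures are hypothetical stand-ins for the paper's fuzzy inference system.

```python
# Hypothetical sketch: the piecewise-linear fuzzy membership and the SNR
# numbers below are assumptions, not the paper's exact inference system.
def snr_weight(snr_db, low=0.0, high=20.0):
    """Fuzzy membership in 'reliable': grows linearly with SNR, clipped to [0, 1]."""
    return min(1.0, max(0.0, (snr_db - low) / (high - low)))

def fuse_decisions(decisions):
    """Fuzzy weighted majority voting over local decisions.

    decisions: list of (local_class, acoustic_snr_db, radio_snr_db) tuples;
    each vote is weighted by detection and communication reliability."""
    scores = {}
    for cls, acoustic_snr, radio_snr in decisions:
        w = snr_weight(acoustic_snr) * snr_weight(radio_snr)
        scores[cls] = scores.get(cls, 0.0) + w
    return max(scores, key=scores.get)

fused = fuse_decisions([("truck", 18, 15), ("car", 6, 12), ("truck", 14, 10)])
```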
Abstract: Death within 30 days is an important factor to be looked into, as there is a significant risk of death immediately following, or soon after, a myocardial infarction (MI) or stroke. In this paper, we model deaths within 30 days following an MI or stroke in the UK. We examine how the probabilities of sudden death from MI or stroke changed over the period 1981-2000. We model the sudden deaths using a generalized linear model (GLM), fitted using the R statistical package, under a Binomial distribution for the number of sudden deaths. We parameterize our model using the extensive and detailed data from the Framingham Heart Study, adjusted to match UK rates. The results show a reduction over time in sudden deaths following an MI, but no significant improvement for sudden deaths following a stroke.
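A binomial GLM of 30-day mortality against calendar year, of the kind fitted in R with glm(family = binomial), can be sketched with hand-rolled iteratively reweighted least squares; the event counts and death rates below are synthetic stand-ins, not Framingham or UK figures.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in data (NOT Framingham/UK figures): yearly MI event counts
# and 30-day deaths, with the true mortality declining over 1981-2000.
years = np.arange(1981, 2001)
t = (years - 1981).astype(float)
n_events = np.full(years.size, 500)
true_p = 1.0 / (1.0 + np.exp(0.5 + 0.05 * t))     # declining death probability
deaths = rng.binomial(n_events, true_p)

# Binomial GLM with logit link fitted by iteratively reweighted least
# squares -- the fit R's glm(cbind(d, n - d) ~ t, binomial) performs.
X = np.column_stack([np.ones_like(t), t])
beta = np.zeros(2)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))        # fitted death probability
    W = n_events * mu * (1.0 - mu)                # IRLS working weights
    z = X @ beta + (deaths - n_events * mu) / W   # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

trend = float(beta[1])    # negative slope: 30-day mortality falls over time
```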
Abstract: Sensor-based Activity Recognition systems usually account for which sensors have been activated to perform an activity. The system then combines the conditional probabilities of those sensors to represent different activities and makes a decision based on that. However, information about the sensors that are not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach in which the sensory data related to both usage and non-usage of objects are utilized to classify activities. Experimental results show the promising performance of the proposed method.
Abstract: Green bridges enable wildlife to pass through linear structures, especially freeways. The term migration potential is used to quantify their functionality. The proposed methodology for determining migration potential eliminates the mathematical, systematic and ecological inaccuracies of previous methodologies and provides a reliable tool for designers and environmentalists. The methodology is suited especially to medium-sized and large mammals, is mathematically correct, and its correspondence with reality was tested by monitoring existing green bridges.
Abstract: In this paper, we propose two new confidence intervals for the inverse of a normal mean with a known coefficient of variation. One interval is constructed based on the pivotal statistic Z, where Z has a standard normal distribution, and the other is constructed based on the generalized confidence interval presented by Weerahandi. We examine the performance of these confidence intervals in terms of coverage probabilities and average lengths via Monte Carlo simulation.
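The Monte Carlo coverage assessment can be sketched for a Z-based interval; the exact interval form below (scaling the sample mean by 1 ± z·cv/√n, then inverting) is an illustrative construction under the known-CV assumption, not necessarily the paper's proposal.

```python
import numpy as np

rng = np.random.default_rng(5)

Z975 = 1.959963984540054      # 97.5% quantile of the standard normal

def coverage(mu=2.0, cv=0.1, n=30, reps=5000):
    """Monte Carlo coverage of a Z-based 95% interval for 1/mu when the
    coefficient of variation cv = sigma/mu is known.  The interval form is
    an illustrative sketch, not necessarily the paper's exact construction."""
    sigma = cv * mu
    half = Z975 * cv / np.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = rng.normal(mu, sigma / np.sqrt(n))
        lo, hi = xbar * (1 - half), xbar * (1 + half)   # interval for mu
        if lo > 0 and 1 / hi <= 1 / mu <= 1 / lo:       # inverted: for 1/mu
            hits += 1
    return hits / reps

cover = coverage()
```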
Abstract: Motivated by the recent work of Herbert, Hayen, Macaskill and Walter [Interval estimation for the difference of two independent variances. Communications in Statistics, Simulation and Computation, 40: 744-758, 2011], we investigate in this paper new confidence intervals for the difference between two normal population variances based on the generalized confidence interval of Weerahandi [Generalized Confidence Intervals. Journal of the American Statistical Association, 88(423): 899-905, 1993] and the closed-form method of variance estimation of Zou, Huo and Taleban [Simple confidence intervals for lognormal means and their differences with environmental applications. Environmetrics, 20: 172-180, 2009]. Monte Carlo simulation results indicate that our proposed confidence intervals give a better coverage probability than the existing confidence interval. The two new confidence intervals also perform similarly in terms of their coverage probabilities and average lengths.
Abstract: An FAQ system helps users find answers to the problems that puzzle them. However, research on Chinese FAQ systems is still at the theoretical stage. This paper presents an approach to semantic inference for FAQ mining. To enhance efficiency, a small pool of candidate question-answer pairs is retrieved from the system for follow-up processing, according to the agriculture-domain concepts extracted from the user input. Input queries or questions are converted into four parts: the question word segment (QWS), the verb segment (VS), the agriculture concept segment (CS), and the auxiliary segment (AS). A semantic matching method is presented to estimate the similarity between the semantic segments of the query and the questions in the candidate pool. A thesaurus constructed from HowNet, a Chinese knowledge base, is adopted for word similarity measurement in the matcher. The questions are classified into eleven intension categories using predefined question-stemming keywords. For FAQ mining, given a query, the question part and the answer part of each FAQ question-answer pair are matched with the input query, respectively. Finally, the probabilities estimated from these two parts are integrated and used to choose the most likely answer for the input query. These approaches were evaluated on an agriculture FAQ system. Experimental results indicate that the proposed approach outperforms the FAQ-Finder system in agriculture FAQ retrieval.
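The segment-wise matching idea can be sketched with a toy similarity measure; the Jaccard word overlap and the segment weights below are hypothetical stand-ins for the HowNet-based similarity, which is not reproduced here.

```python
# Hypothetical sketch: Jaccard word overlap and the segment weights below
# stand in for the HowNet-based similarity measure used by the paper.
def seg_sim(a, b):
    """Word-overlap (Jaccard) similarity between two segments."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

WEIGHTS = {"QWS": 0.20, "VS": 0.25, "CS": 0.40, "AS": 0.15}   # assumed weights

def question_score(query_segs, cand_segs):
    """Weighted similarity over the four segments (QWS, VS, CS, AS)."""
    return sum(w * seg_sim(query_segs.get(k, ""), cand_segs.get(k, ""))
               for k, w in WEIGHTS.items())

query = {"QWS": "how", "VS": "prevent", "CS": "wheat rust", "AS": ""}
candidates = [
    {"QWS": "how", "VS": "prevent", "CS": "wheat rust disease", "AS": ""},
    {"QWS": "what", "VS": "plant", "CS": "rice seedling", "AS": ""},
]
best = max(candidates, key=lambda c: question_score(query, c))
```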
Abstract: In this paper, a single-period inventory model with resalable returns is analyzed in an imprecise and uncertain mixed environment. Demand is introduced as a fuzzy random variable. In this model, a single order is placed before the start of the selling season. The customer may return purchased products for a full refund within a certain time interval. Returned products are resalable, provided they arrive back before the end of the selling season and are found to be undamaged. Products remaining at the end of the season are salvaged. All demands not met directly are lost. The probabilities that a sold product is returned and that a returned product is resalable, both imprecise in a real situation, are assumed to be fuzzy in nature.
Abstract: Given a bivariate normal sample of correlated variables,
(Xi, Yi), i = 1, . . . , n, an alternative estimator of Pearson’s correlation
coefficient is obtained in terms of the ranges, |Xi − Yi|.
An approximate confidence interval for ρX,Y is then derived, and
a simulation study reveals that the resulting coverage probabilities
are in close agreement with the set confidence levels. As well, a
new approximant is provided for the density function of R, the
sample correlation coefficient. A mixture involving the proposed
approximate density of R, denoted by hR(r), and a density function
determined from a known approximation due to R. A. Fisher is shown
to accurately approximate the distribution of R. Finally, nearly exact
density approximants are obtained on adjusting hR(r) by a 7th degree
polynomial.
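For reference, the classical Fisher z-transform interval that such approximations refine can be computed directly; the sample size and true ρ below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

def fisher_ci(x, y, z=1.959963984540054):
    """Approximate 95% CI for Pearson's rho via Fisher's z-transform,
    the classical approximation that finer density approximants improve on."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    zr = np.arctanh(r)                       # Fisher z-transform of r
    half = z / np.sqrt(n - 3)
    return float(np.tanh(zr - half)), float(np.tanh(zr + half))

# Correlated bivariate normal sample with an assumed true rho = 0.6
n, rho = 200, 0.6
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
lo, hi = fisher_ci(xy[:, 0], xy[:, 1])
```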
Abstract: In this work, packet loss in Voice over Internet
Protocol (VoIP) communication is characterized and modeled. The
distributions of the number of consecutively received and lost
packets (namely, gap and burst) are modeled from the transition
probabilities of two-state and four-state models. Measurements show
that both models describe the burst distribution adequately, but the
decay of the gap distribution for non-homogeneous losses is better
fitted by the four-state model. The respective
probabilities of transition between states for each model were
estimated with a proposed algorithm from a set of monitored VoIP
calls in order to obtain representative minimum, maximum and
average values for both models.
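Estimating the two-state (Gilbert) model's transition probabilities from a monitored loss trace can be sketched as follows; the trace here is simulated from assumed parameters rather than taken from real VoIP calls.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_gilbert(n, p, q):
    """Two-state (Gilbert) loss process: state 0 = packet received,
    state 1 = packet lost; p = P(0 -> 1), q = P(1 -> 0)."""
    state, trace = 0, []
    for _ in range(n):
        trace.append(state)
        if state == 0:
            state = 1 if rng.random() < p else 0
        else:
            state = 0 if rng.random() < q else 1
    return np.array(trace)

def estimate_gilbert(trace):
    """Estimate (p, q) from a binary loss trace by counting state
    transitions, as one would from a set of monitored calls."""
    prev, curr = trace[:-1], trace[1:]
    p_hat = float(np.mean(curr[prev == 0] == 1))
    q_hat = float(np.mean(curr[prev == 1] == 0))
    return p_hat, q_hat

trace = simulate_gilbert(20000, p=0.05, q=0.40)
p_hat, q_hat = estimate_gilbert(trace)
```

The gap and burst length distributions then follow geometrically from p_hat and q_hat; a four-state version adds separate states for isolated and bursty losses.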
Abstract: In this paper, we consider the analysis of the
acquisition process for a hybrid double-dwell system with antenna
diversity for DS-CDMA (direct sequence-code division multiple
access) using an adaptive threshold. Acquisition systems with a fixed
threshold value are unable to adapt to fast varying mobile
communications environments and may result in a high false alarm
rate and/or a low detection probability. Therefore, we propose an
adaptively varying threshold scheme through the use of a
cell-averaging constant false alarm rate (CA-CFAR) algorithm, which is
well known in the field of radar detection. We derive exact
expressions for the probabilities of detection and false alarm in
Rayleigh fading channels. The mean acquisition time of the system
under consideration is also derived. The performance of the system is
analyzed and compared to that of a hybrid single dwell system.
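The cell-averaging CFAR thresholding itself can be sketched in a few lines; the window sizes, design false-alarm rate, and injected target below are illustrative assumptions, not the acquisition-system analysis of the paper.

```python
import numpy as np

rng = np.random.default_rng(8)

def ca_cfar(power, n_train=16, n_guard=2, pfa=1e-3):
    """Cell-averaging CFAR: each cell under test is compared with the mean
    power of 2*n_train surrounding training cells (guard cells excluded),
    scaled so the false-alarm probability stays at pfa for exponentially
    distributed noise power."""
    N = 2 * n_train
    scale = N * (pfa ** (-1.0 / N) - 1.0)    # CA-CFAR threshold multiplier
    half = n_train + n_guard
    det = np.zeros(len(power), dtype=bool)
    for i in range(half, len(power) - half):
        lead = power[i - half:i - n_guard]           # training cells before
        lag = power[i + n_guard + 1:i + half + 1]    # training cells after
        noise_est = (lead.sum() + lag.sum()) / N
        det[i] = power[i] > scale * noise_est
    return det

# Exponential noise with one strong target injected at index 100.
samples = rng.exponential(1.0, 200)
samples[100] += 50.0
det = ca_cfar(samples)
```

Because the threshold scales with the local noise estimate, the false-alarm rate stays near the design value even when the noise level drifts, which is the property the fixed-threshold acquisition scheme lacks.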