Abstract: Edgeworth approximation, the bootstrap, and Monte Carlo simulation have a considerable impact on achieving certain results for a range of problems under study. In this paper, we treat a financial case concerning the effect that the components of the cash flow of one of the most successful businesses in the world, namely the financing, operating, and investing activities, have on cash and cash equivalents at the end of each three-month period. To obtain a better view of this case, we created a Vector Autoregression (VAR) model and then generated the impulse responses in terms of asymptotic analysis (Edgeworth approximation), Monte Carlo simulation, and the residual bootstrap, based on the standard errors of each series created. The generated results showed common tendencies across the three methods, which in turn confirms the value of these methods for optimizing a model that contains many variables.
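As a concrete illustration of the residual-bootstrap step, the following minimal Python sketch resamples VAR residuals and rebuilds impulse responses; it assumes a pandas DataFrame `cf` of quarterly cash-flow components (a hypothetical name) and uses statsmodels rather than the paper's own code. The lag order and horizon are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

def simulate_var(fit, init, shocks):
    """Feed resampled residuals back through the estimated VAR recursion."""
    y = [row for row in init]                      # first p observations
    for e in shocks:
        nxt = fit.intercept.copy()
        for i, A in enumerate(fit.coefs):          # A_1 ... A_p
            nxt = nxt + A @ y[-1 - i]
        y.append(nxt + e)
    return np.asarray(y)

def bootstrap_irf(data, lags=2, horizon=8, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    fit = VAR(data).fit(lags)
    resid = fit.resid.values - fit.resid.values.mean(axis=0)   # centred residuals
    draws = []
    for _ in range(n_boot):
        shocks = resid[rng.integers(0, len(resid), len(resid))]
        sim = simulate_var(fit, data.values[:lags], shocks)
        refit = VAR(pd.DataFrame(sim, columns=data.columns)).fit(lags)
        draws.append(refit.irf(horizon).irfs)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    return fit.irf(horizon).irfs, lo, hi           # point IRFs and a 95% band

# usage on the hypothetical DataFrame: point, lo, hi = bootstrap_irf(cf)
```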
Abstract: Edgeworth approximation is one of the most important statistical methods, making a considerable contribution to reducing the sum of the standard deviations of the independent variables' coefficients in a quantile regression model, a model that estimates the conditional median or other quantiles. In this paper, we apply approximating statistical methods to an economic problem. We created and estimated a quantile regression model to see how the profit gained is connected with the realized sales of cosmetic products, using real data taken from a local business. The linear regression of the generated profit on the realized sales was not free of autocorrelation and heteroscedasticity, which is why we used this model instead of linear regression. Our aim is to analyze in more detail the relation between the variables under study, the profit and the finalized sales, and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods applied in our work are the Edgeworth approximation for independent and identically distributed (IID) cases, a bootstrap version of the model, and the Edgeworth approximation for the bootstrap quantile regression model. The graphics and results presented here identify the best approximating model of our study.
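A minimal sketch of the median regression and a paired bootstrap of its slope is given below, assuming two arrays `sales` and `profit` (hypothetical names for the realized-sales and profit series) and statsmodels' QuantReg; the paper's Edgeworth refinements are not reproduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"sales": sales, "profit": profit})    # hypothetical data
fit = smf.quantreg("profit ~ sales", df).fit(q=0.5)      # median regression
print(fit.params["sales"], fit.bse["sales"])             # slope and asymptotic SE

# paired bootstrap of the slope coefficient
rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    bs = df.sample(len(df), replace=True,
                   random_state=int(rng.integers(1 << 31)))
    boot.append(smf.quantreg("profit ~ sales", bs).fit(q=0.5).params["sales"])
print(np.std(boot, ddof=1))                              # bootstrap SE of the slope
```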
Abstract: Water lily (Nymphaea L.) is the largest genus of Nymphaeaceae. This family comprises six genera (Nuphar, Ondinea, Euryale, Victoria, Barclaya, Nymphaea), whose members occur nearly worldwide in tropical and temperate regions. The classification of some species in Nymphaea is ambiguous due to high variation in leaf and flower parts, such as the leaf margin and stamen appendages. Therefore, phylogenetic relationships based on 18S rDNA were constructed to delimit this genus. DNA of 52 specimens belonging to the water lily family was extracted using a modified conventional method based on cetyltrimethyl ammonium bromide (CTAB). The results showed that the amplified fragment is about 1600 base pairs in size. After analysis, the aligned sequences presented 9.36% variable characters, comprising 2.66% parsimony-informative sites and 6.70% singleton sites. Moreover, there are six insertion/deletion regions of one to two bases. The phylogenetic trees based on maximum parsimony and maximum likelihood, with high bootstrap support, indicated that the genus Nymphaea is paraphyletic because of the placement of Ondinea, Victoria and Euryale within it. Within Nymphaea, subgenus Nymphaea is a basal lineage that groups with Euryale and Victoria. The other four subgenera, namely Lotos, Hydrocallis, Brachyceras and Anecphya, fell within the same large clade, in which Ondinea was placed inside the Anecphya clade, consistent with their shared geographical distribution.
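For readers unfamiliar with bootstrap support values, the idea can be sketched in a few lines of Python: alignment columns are resampled with replacement and a tree is rebuilt from each pseudo-alignment. `build_tree` below is a hypothetical stand-in for the maximum-parsimony or maximum-likelihood search actually used.

```python
import numpy as np

def bootstrap_alignments(alignment, n_boot=1000, seed=0):
    """alignment: 2-D character array of shape (n_taxa, n_sites)."""
    rng = np.random.default_rng(seed)
    n_sites = alignment.shape[1]
    for _ in range(n_boot):
        cols = rng.integers(0, n_sites, n_sites)   # resample sites with replacement
        yield alignment[:, cols]

# The support of a clade is the fraction of bootstrap trees containing it:
# trees = [build_tree(a) for a in bootstrap_alignments(aln)]
```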
Abstract: In any production process, every product is intended to attain a certain standard, but the presence of assignable causes of variability affects the process, leading to low-quality products. The ability to identify and remove this type of variability reduces its overall effect and thereby improves product quality. In the case of a univariate control chart signal, it is easy to detect the problem and provide a solution, since it relates to a single quality characteristic. However, the problems involved in the use of a multivariate control chart are the violation of the multivariate normality assumption and the difficulty of identifying the quality characteristic(s) that produced the out-of-control signals. The purpose of this paper is to examine the use of a non-parametric control chart (the bootstrap approach) for obtaining the control limit, to overcome the problem of the multivariate distributional assumption, together with the p-value method for detecting out-of-control signals. Results from a performance study show that the proposed bootstrap method enables setting a control limit that enhances the detection of out-of-control signals compared with the conventional approach, while the p-value method likewise improves the identification of out-of-control variables.
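A minimal sketch of one percentile-bootstrap variant is given below; the abstract does not name the charting statistic, so the Hotelling T^2 statistic is assumed, and the 0.27% tail mimics a 3-sigma limit.

```python
import numpy as np

def t2_stats(X, mean, inv_cov):
    """Hotelling T^2 statistic for each row of X."""
    d = X - mean
    return np.einsum("ij,jk,ik->i", d, inv_cov, d)

def bootstrap_ucl(X, n_boot=2000, alpha=0.0027, seed=0):
    """Upper control limit: average bootstrap (1 - alpha) quantile of T^2."""
    rng = np.random.default_rng(seed)
    mean, inv_cov = X.mean(axis=0), np.linalg.inv(np.cov(X, rowvar=False))
    q = [np.quantile(t2_stats(X[rng.integers(0, len(X), len(X))], mean, inv_cov),
                     1 - alpha)
         for _ in range(n_boot)]
    return float(np.mean(q))

def p_value(x_new, X, mean, inv_cov):
    """Empirical p-value: share of reference T^2 values at or above the new one."""
    t_new = t2_stats(x_new[None, :], mean, inv_cov)[0]
    return float(np.mean(t2_stats(X, mean, inv_cov) >= t_new))
```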
Abstract: This paper applies the concept of asymmetric information to the estimation of a macroeconomic model of the tourism sector in Thailand. The variables analyzed are Thailand's international and domestic tourism revenues, the expenditures of foreign and domestic tourists, service investments by the private sector, service investments by the government of Thailand, Thailand's service imports and exports, and net service income transfers. All data are time-series indices observed between 2002 and 2015. Empirically, the tourism multiplier and accelerator were estimated by two statistical approaches. The first was the Generalized Method of Moments (GMM) model, based on the assumption that the tourism market in Thailand had perfect information (symmetrical data). The second was the Maximum Entropy Bootstrapping (MEboot) approach, based on a process that attempts to deal with imperfect information and to reduce uncertainty in the data observations (asymmetrical data). In addition, tourism leakages were investigated by a simple model based on the injections-and-leakages concept. The empirical findings show that the parameters computed from the MEboot approach differ from those of the GMM method. However, both the MEboot estimation and the GMM model suggest that Thailand's tourism sector is in a period capable of stimulating the economy.
Abstract: This work contributes a statistical model and simulation
framework yielding the best estimate possible for the potential
herbicide reduction when using the MoDiCoVi algorithm all the
while requiring an efficacy comparable to conventional spraying. In June 2013, a maize field located in Denmark was sown. The field was divided into parcels, which were assigned to one of two main groups: 1) control, consisting of subgroups of no spray and full-dose spray; 2) MoDiCoVi algorithm, subdivided into five different leaf cover thresholds for spray activation. In addition, approximately 25% of the parcels were seeded with additional weeds perpendicular to the maize rows. In total, 299 parcels were randomly assigned to
the 28 different treatment combinations. In the statistical analysis,
bootstrapping was used for balancing the number of replicates. The
potential herbicide savings were found to be 70% to 95%, depending on the initial weed coverage. However, additional field trials covering more seasons and locations are needed to verify the generalisation of these results. There is potential for further herbicide savings, as the time interval between the first and second spraying sessions was not long enough for the weeds to turn yellow; instead, they only stagnated in growth.
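The balancing step can be sketched as follows: each treatment group is resampled with replacement up to a common number of replicates. The DataFrame and column names are hypothetical, not the trial's actual data layout.

```python
import numpy as np
import pandas as pd

def balance_by_bootstrap(parcels, group_col="treatment", n_target=None, seed=0):
    """Resample every treatment group to the same number of replicates."""
    rng = np.random.default_rng(seed)
    n_target = n_target or parcels[group_col].value_counts().max()
    balanced = [g.sample(n_target, replace=True,
                         random_state=int(rng.integers(1 << 31)))
                for _, g in parcels.groupby(group_col)]
    return pd.concat(balanced, ignore_index=True)
```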
Abstract: This paper presents confidence intervals for the effect size based on the bootstrap resampling method. A meta-analytic confidence interval for the effect size that is easy to compute is proposed. A Monte Carlo simulation study was conducted to compare
the performance of the proposed confidence intervals with the
existing confidence intervals. The best confidence interval method
will have a coverage probability close to 0.95. Simulation results
have shown that our proposed confidence intervals perform well in
terms of coverage probability and expected length.
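A minimal sketch of such a percentile-bootstrap interval and a coverage check follows; the use of Cohen's d as the effect size is an assumption, since the abstract does not specify the measure.

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference with a pooled standard deviation."""
    nx, ny = len(x), len(y)
    sp = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                 / (nx + ny - 2))
    return (x.mean() - y.mean()) / sp

def boot_ci(x, y, n_boot=2000, rng=None):
    """Percentile bootstrap 95% interval for the effect size."""
    rng = rng or np.random.default_rng()
    ds = [cohens_d(rng.choice(x, len(x)), rng.choice(y, len(y)))
          for _ in range(n_boot)]
    return np.percentile(ds, [2.5, 97.5])

# Monte Carlo coverage check: the true d is 0.5 for these populations
rng = np.random.default_rng(0)
hits = 0
for _ in range(500):
    x, y = rng.normal(0.5, 1, 30), rng.normal(0.0, 1, 30)
    lo, hi = boot_ci(x, y, rng=rng)
    hits += lo <= 0.5 <= hi
print(hits / 500)   # coverage should be close to the nominal 0.95
```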
Abstract: Recently, job recommender systems have gained much attention in industry, since they address the problem of information overload on recruiting websites. We therefore propose an Extended Personalized Job System that is capable of providing appropriate jobs for job seekers and recommending suitable information to them using data mining techniques and a dynamic user profile. In addition, companies can interact with the system to publish and update job information. The system supports multiple platforms, such as a web application and an Android mobile application. In this paper, user profiles, implicit user actions, user feedback, and clustering techniques from the WEKA libraries were applied and implemented. In addition, open-source tools such as the Yii web application framework, the Bootstrap front-end framework, and Android mobile technology were also applied.
Abstract: The problems arising from unbalanced data sets
generally appear in real world applications. Due to unequal class
distribution, many researchers have found that the performance of
existing classifiers tends to be biased towards the majority class. The
k-nearest neighbors nonparametric discriminant analysis is a method that has been proposed for classifying unbalanced classes with good performance. In this study, discriminant analysis methods are of interest for investigating misclassification error rates in class-imbalanced data of three diabetes risk groups. The purpose of this
study was to compare the classification performance between
parametric discriminant analysis and nonparametric discriminant
analysis in a three-class classification of class-imbalanced data of
diabetes risk groups. Data from a project maintaining healthy
conditions for 599 employees of a government hospital in Bangkok
were obtained for the classification problem. The employees were
divided into three diabetes risk groups: non-risk (90%), risk (5%),
and diabetic (5%). The original data including the variables of
diabetes risk group, age, gender, blood glucose, and BMI were
analyzed and bootstrapped for 50 and 100 samples, 599 observations
per sample, for additional estimation of the misclassification error
rate. Each data set was explored for the departure of multivariate
normality and the equality of covariance matrices of the three risk
groups. Both the original data and the bootstrap samples showed non-normality and unequal covariance matrices. The parametric linear
discriminant function, quadratic discriminant function, and the
nonparametric k-nearest neighbors’ discriminant function were
performed over 50 and 100 bootstrap samples and applied to the
original data. In searching for the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions (0.90:0.05:0.05), (0.80:0.10:0.10), and (0.70:0.15:0.15). The results from 50 and 100 bootstrap samples
indicated that the k-nearest neighbors approach with k=3 or k=4 and prior probabilities for non-risk:risk:diabetic of 0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest misclassification error rate. The k-nearest neighbors approach would thus be suggested for classifying three-class-imbalanced data of diabetes risk groups.
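A minimal sketch of the comparison, using scikit-learn as an assumed stand-in for the study's software: `X` holds age, gender, blood glucose, and BMI, and `y` holds the three risk groups coded 0-2. Note that scikit-learn's k-NN classifier does not accept prior probabilities, so the priors are applied only to the parametric rules here.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier

priors = [0.90, 0.05, 0.05]                 # one of the prior choices examined
models = {
    "LDA": LinearDiscriminantAnalysis(priors=priors),
    "QDA": QuadraticDiscriminantAnalysis(priors=priors),
    "3-NN": KNeighborsClassifier(n_neighbors=3),
}

rng = np.random.default_rng(0)
for name, model in models.items():
    errors = []
    for _ in range(50):                      # 50 bootstrap samples
        idx = rng.integers(0, len(X), len(X))
        model.fit(X[idx], y[idx])
        errors.append(1 - model.score(X, y))    # error on the original data
    print(name, np.mean(errors))
```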
Abstract: Previous studies on financial distress prediction adopt the conventional failing/non-failing dichotomy; however, the extent of distress differs substantially among different financial distress events. To address this, the categories “non-distressed”, “slightly-distressed” and “reorganization and bankruptcy” are used in our article
to approximate the continuum of corporate financial health. This paper
explains different financial distress events using the two-stage method.
First, this investigation adopts firm-specific financial ratios, corporate
governance and market factors to measure the probability of various
financial distress events based on multinomial logit models.
Specifically, a bootstrapping simulation is performed to examine the difference in estimated misclassification cost (EMC). Second, this work
further applies macroeconomic factors to establish the credit cycle
index and determines the distressed cut-off indicator of the two-stage
models using this index. Two different models, one-stage and two-stage prediction models, are developed to forecast financial
distress, and the results acquired from different models are compared
with each other and with the collected data. The findings show that the one-stage model has a lower misclassification error rate than the two-stage model and is therefore the more accurate of the two.
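The first stage and the bootstrapped EMC can be sketched as follows, assuming a feature matrix `X`, distress states `y` coded 0-2, and a purely hypothetical cost matrix (the paper's costs are not given in the abstract).

```python
import numpy as np
import statsmodels.api as sm

cost = np.array([[0, 1, 2],      # cost[i, j]: classifying true state i as j
                 [5, 0, 1],      # (hypothetical placeholder values)
                 [10, 5, 0]])

def misclass_cost(fit, exog, y):
    pred = fit.predict(exog).argmax(axis=1)
    return cost[y, pred].mean()

exog = sm.add_constant(X)
fit = sm.MNLogit(y, exog).fit(disp=0)       # multinomial logit, three states

rng = np.random.default_rng(0)
emc = []
for _ in range(500):                         # bootstrap distribution of the EMC
    idx = rng.integers(0, len(y), len(y))
    bfit = sm.MNLogit(y[idx], exog[idx]).fit(disp=0)
    emc.append(misclass_cost(bfit, exog, y))
print(np.mean(emc), np.percentile(emc, [2.5, 97.5]))
```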
Abstract: This paper describes the problem of building secure
computational services for encrypted information in the Cloud
Computing without decrypting the encrypted data; it thereby addresses the aspiration for a computational encryption model that can enhance the security of big data with respect to the privacy, confidentiality, and availability of users' data. The cryptographic model
applied for the computational process of the encrypted data is the
Fully Homomorphic Encryption Scheme. We contribute a theoretical presentation of high-level computational processes, based on number theory and algebra, that can easily be integrated and leveraged in cloud computing, together with detailed mathematical concepts underlying fully homomorphic encryption models. This
contribution enhances the full implementation of big data analytics
based cryptographic security algorithm.
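The principle of computing on ciphertexts can be illustrated with a toy example. The `phe` Paillier library used below is only additively (partially) homomorphic, not fully homomorphic; it serves purely to demonstrate operating on encrypted data without decryption.

```python
from phe import paillier   # pip install phe

public_key, private_key = paillier.generate_paillier_keypair()
enc_a, enc_b = public_key.encrypt(5), public_key.encrypt(3)

enc_sum = enc_a + enc_b            # addition carried out on ciphertexts
enc_scaled = enc_a * 4             # multiplication by a plaintext constant

assert private_key.decrypt(enc_sum) == 8
assert private_key.decrypt(enc_scaled) == 20
```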
Abstract: This paper focuses on the assessment of the air
pollution and morbidity relationship in Tunisia. Air pollution is
measured by ozone air concentration and the morbidity is measured
by the number of respiratory-related restricted activity days during
the 2-week period prior to the interview. Socioeconomic data are also
collected in order to adjust for any confounding covariates. Our sample is composed of 407 Tunisian respondents: 44.7% are women, the average age is 35.2 years, nearly 69% live in a house built after 1980, and 27.8% reported at least one day of respiratory-related restricted activity. The model consists of the regression of the number of respiratory-related restricted activity days on the air
quality measure and the socioeconomic covariates. In order to correct
for zero-inflation and heterogeneity, we estimate several models
(Poisson, negative binomial, zero inflated Poisson, Poisson hurdle,
negative binomial hurdle and finite mixture Poisson models).
Bootstrapping and post-stratification techniques are used in order to
correct for any sample bias. According to the Akaike information
criterion, the hurdle negative binomial model has the greatest goodness
of fit. The main result indicates that, after adjusting for
socioeconomic data, the ozone concentration increases the probability
of positive number of restricted activity days.
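The AIC comparison can be sketched with statsmodels, assuming a covariate matrix `X` and the count response `y`; a hurdle negative binomial estimator is only available in recent statsmodels releases, so three of the simpler candidates illustrate the workflow.

```python
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

exog = sm.add_constant(X)
fits = {
    "Poisson": sm.Poisson(y, exog).fit(disp=0),
    "NegBin": sm.NegativeBinomial(y, exog).fit(disp=0),
    "ZIP": ZeroInflatedPoisson(y, exog).fit(disp=0),
}
for name, fit in fits.items():
    print(name, fit.aic)           # the smallest AIC indicates the best fit
```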
Abstract: With short product development times, there is an increased need to demonstrate product reliability relatively quickly with minimal testing. In such cases, there may be few, if any, observed failures. Thus, it may be difficult to assess reliability using traditional reliability test plans that measure only time (or cycles) to failure. For many components, degradation measures contain important information about performance and reliability. These measures can be used to design a minimal test plan, in terms of the number of units placed on test and the duration of the test, necessary to demonstrate a reliability goal. In this work we present a case study involving an electronic component subject to degradation. The data, consisting of 42 degradation paths of cycles to failure, are first used to estimate a reliability function. Bootstrapping techniques are then used to perform power studies and develop a minimal reliability test plan for future production of this component.
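A minimal sketch of such a bootstrap power study, assuming an array `cycles` of the observed cycles-to-failure, a Weibull lifetime model, and a demonstration target of R(t0) >= 0.90 at 90% confidence (all of these specifics are assumptions, not the case study's actual values):

```python
import numpy as np
from scipy import stats

def demonstrates(sample, t0, target=0.90, conf=0.90, n_boot=200, rng=None):
    """Does a test of this size demonstrate R(t0) >= target at level conf?"""
    rng = rng or np.random.default_rng()
    rel = []
    for _ in range(n_boot):
        b = rng.choice(sample, len(sample))          # bootstrap resample
        c, loc, scale = stats.weibull_min.fit(b, floc=0)
        rel.append(stats.weibull_min.sf(t0, c, loc, scale))
    return np.quantile(rel, 1 - conf) >= target      # lower bound on R(t0)

# power of an n-unit test = share of simulated tests that pass
rng = np.random.default_rng(0)
power = np.mean([demonstrates(rng.choice(cycles, 20), t0=1e4, rng=rng)
                 for _ in range(200)])
```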
Abstract: Over the past few decades, the power system industry in many developing and developed countries has gone through a restructuring process, moving towards a deregulated power industry. This situation leads to competition among generation and distribution companies to provide quality and efficient production of electric energy, which reduces the price of electricity. Therefore, it is important to obtain accurate values of the available transfer capability (ATC) and the transmission reliability margin (TRM) in order to ensure effective power transfer between areas during the occurrence of uncertainties in the system. In this paper, the TRM and ATC are determined by taking into consideration the uncertainties of the system operating condition and system cascading collapse, by applying the bootstrap technique. A case study of the IEEE RTS-79 is employed to verify the robustness of the proposed technique in the determination of TRM and ATC.
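One simple way to express a bootstrap-based margin is sketched below: the TRM is taken as the gap between the nominal total transfer capability (TTC) and a low percentile of its bootstrap distribution. This is a generic formulation for illustration, not the paper's exact procedure; `ttc_samples` is a hypothetical array of TTC values computed under resampled operating conditions.

```python
import numpy as np

def trm_atc(ttc_samples, existing_commitments, confidence=0.95):
    """ttc_samples: bootstrapped total transfer capability values (MW)."""
    ttc = np.median(ttc_samples)                          # nominal TTC
    trm = ttc - np.quantile(ttc_samples, 1 - confidence)  # uncertainty margin
    atc = ttc - trm - existing_commitments
    return trm, atc
```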
Abstract: The log periodogram regression is widely used in empirical applications because of its simplicity (only a least squares regression is required to estimate the memory parameter d), its good asymptotic properties, and its robustness to misspecification of the short-term behavior of the series. However, the asymptotic distribution is a poor approximation of the (unknown) finite sample distribution if the sample size is small. Here, the finite sample performance of different
nonparametric residual bootstrap procedures is analyzed when
applied to construct confidence intervals. In particular, in addition to
the basic residual bootstrap, the local and block bootstrap that might
adequately replicate the structure that may arise in the errors of the
regression are considered when the series shows weak dependence in
addition to the long memory component. A bias-correcting bootstrap to adjust for the bias caused by that structure is also considered. Finally, the performance of bootstrap-based confidence intervals for the log periodogram regression is assessed for different types of models, together with how that performance changes as the sample size increases.
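A minimal sketch of the log periodogram (GPH) regression and the basic residual bootstrap for the memory parameter d:

```python
import numpy as np

def gph(x, m=None):
    """Log periodogram (GPH) regression; returns coefficients, design, residuals."""
    n = len(x)
    m = m or int(np.sqrt(n))                     # number of Fourier frequencies used
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    I = np.abs(np.fft.fft(x - np.mean(x))[1:m + 1]) ** 2 / (2 * np.pi * n)
    X = np.column_stack([np.ones(m), -np.log(4 * np.sin(lam / 2) ** 2)])
    beta, *_ = np.linalg.lstsq(X, np.log(I), rcond=None)
    return beta, X, np.log(I) - X @ beta         # slope beta[1] estimates d

def residual_bootstrap_ci(x, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    beta, X, resid = gph(x)
    resid = resid - resid.mean()                 # centred residuals
    d_star = []
    for _ in range(n_boot):
        y_star = X @ beta + rng.choice(resid, len(resid))
        b, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        d_star.append(b[1])
    return beta[1], np.quantile(d_star, [0.025, 0.975])   # d-hat and 95% CI
```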
Abstract: In this work we study the effect of several covariates X on a censored response variable T with unknown probability distribution. In this context, most studies in the literature can be located in two general classes of regression models: models that study the effect the covariates have on the hazard function, and models that study the effect the covariates have on the censored response variable. The proposals in this paper fall in the second class and, more specifically, follow a least-squares-based model approach. Thus, using the bootstrap estimate of the bias, we try to improve the estimation of the regression parameters by reducing their bias for small sample sizes. Simulation results presented in the paper show that, for reasonable sample sizes and censoring levels, the bias is always smaller for the new proposals.
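The bias-correction device itself is generic and easy to sketch: the bootstrap bias estimate, the mean of the bootstrap replicates minus the original estimate, is subtracted from the estimate. Plain OLS is used below as a runnable stand-in for the paper's censored-data least squares estimator.

```python
import numpy as np

def bias_corrected(estimator, X, y, n_boot=1000, seed=0):
    """Return theta - bias_hat, where bias_hat = mean(boot) - theta."""
    rng = np.random.default_rng(seed)
    theta = estimator(X, y)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        boots.append(estimator(X[idx], y[idx]))
    return 2 * theta - np.mean(boots, axis=0)

# stand-in estimator: OLS coefficients with an intercept column
ols = lambda X, y: np.linalg.lstsq(
    np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0]
```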
Abstract: An effective visual error concealment method has been presented by employing a robust rotation, scale, and translation (RST) invariant partial patch matching model (RSTI-PPMM) and
exemplar-based inpainting. While the proposed robust and inherently
feature-enhanced texture synthesis approach ensures the generation
of excellent and perceptually plausible visual error concealment results, the outlier pruning property guarantees significant quality improvements, both quantitative and qualitative. No intermediate
user-interaction is required for the pre-segmented media and the
presented method follows a bootstrapping approach for automatic visual loss recovery and image and video error concealment.
Abstract: The zero-inflated strict arcsine model is a newly developed model that has been found appropriate for modeling overdispersed count data. In this study, the maximum likelihood estimation method is used to estimate the parameters of the zero-inflated strict arcsine model. Bootstrapping is then employed to compute confidence intervals for the estimated parameters.
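The workflow can be sketched generically: maximize the likelihood, then resample the data to obtain percentile intervals. Because the strict arcsine pmf is not reproduced in the abstract, a zero-inflated Poisson likelihood is used below purely as a runnable stand-in.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def zip_nll(params, y):
    """Zero-inflated Poisson negative log-likelihood (stand-in for the
    zero-inflated strict arcsine pmf, which would be substituted here)."""
    pi = 1.0 / (1.0 + np.exp(-params[0]))        # inflation probability
    lam = np.exp(params[1])                      # count-part mean
    p = np.where(y == 0, pi + (1 - pi) * np.exp(-lam),
                 (1 - pi) * poisson.pmf(y, lam))
    return -np.sum(np.log(p))

def mle(y, nll=zip_nll, x0=(0.0, 0.0)):
    return minimize(nll, x0, args=(y,), method="Nelder-Mead").x

def bootstrap_ci(y, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    est = np.array([mle(rng.choice(y, len(y))) for _ in range(n_boot)])
    return np.percentile(est, [2.5, 97.5], axis=0)   # per-parameter 95% CI
```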
Abstract: This paper presents a general trainable framework
for fast and robust upright human face and non-human object
detection and verification in static images. To enhance the
performance of the detection process, the technique we develop is
based on the combination of a fast neural network (FNN) and a classical neural network (CNN). In the FNN, a useful correlation between the input image and the weights of the hidden neurons is exploited to sustain a high level of detection accuracy; this enables the use of the Fourier transform, which significantly speeds up detection. The CNN is responsible for verifying the face region. A bootstrap algorithm is used to collect non-human objects, adding the false detections to the training process for human and non-human objects. Experimental results on test images with both simple and complex backgrounds demonstrate that the proposed method achieves a high detection rate and a low false positive rate in detecting both human faces and non-human objects.
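The bootstrap (hard negative mining) loop the abstract refers to can be sketched generically: false detections on object-free data are fed back into the training set. The classifier below is a generic scikit-learn stand-in, not the paper's FNN/CNN combination.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def bootstrap_negatives(clf, X_pos, X_neg_pool, rounds=3, seed=0):
    """Iteratively add false detections as negative training examples."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(X_neg_pool), len(X_pos), replace=False)
    X_neg = X_neg_pool[start]                  # assumes a large negative pool
    for _ in range(rounds):
        X = np.vstack([X_pos, X_neg])
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]
        clf.fit(X, y)
        false_pos = X_neg_pool[clf.predict(X_neg_pool) == 1]   # misfires
        if len(false_pos) == 0:
            break
        X_neg = np.vstack([X_neg, false_pos])  # add the false detections
    return clf

# usage: clf = bootstrap_negatives(MLPClassifier(max_iter=500), faces, nonfaces)
```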
Abstract: In this work, we present a novel active learning approach for learning a visual object detection system. Our system is composed of an active learning mechanism acting as a wrapper around a sub-algorithm that implements an online boosting-based object detector. At its core is a combination of a bootstrap procedure and a semi-automatic learning process based on online boosting. The idea is to exploit the availability of the classifier during learning to automatically label training samples and incrementally improve the classifier. This addresses the issue of reducing labeling effort while obtaining better performance. In addition, we propose a verification process to further improve the classifier; the idea is to allow re-updating on previously seen data during learning to stabilize the detector. The main contribution of this empirical study is a demonstration that active learning based on an online boosting approach trained in this manner can achieve results comparable to, or even better than, a framework trained in the conventional manner using much more labeling effort. Empirical experiments on challenging data sets for specific object detection problems show the effectiveness of our approach.
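The self-labeling core of the approach can be sketched as follows; an offline gradient-boosting classifier stands in for the paper's online boosting detector, and the confidence threshold is an assumed parameter.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def semi_automatic_learning(X_lab, y_lab, X_unlab, threshold=0.95, rounds=5):
    """Let the current classifier label samples it is confident about,
    add them to the training set, and retrain."""
    clf = GradientBoostingClassifier()
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        take = proba.max(axis=1) >= threshold      # confident self-labels
        if not take.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[take]])
        y_lab = np.r_[y_lab, clf.classes_[proba[take].argmax(axis=1)]]
        X_unlab = X_unlab[~take]
    return clf
```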