Abstract: One of the biggest challenges in nonparametric
regression is the curse of dimensionality. Additive models are known
to overcome this problem by estimating only the individual additive
effects of each covariate. However, if the model is misspecified, the
accuracy of the estimator compared to the fully nonparametric one
is unknown. In this work, the efficiency of fully nonparametric
regression estimators such as Loess is compared to that of estimators
that assume additivity in several situations, including additive and
non-additive regression scenarios. The comparison is done by
computing the oracle mean squared error of the estimators with respect
to the true nonparametric regression function. Then, a backward
elimination selection procedure based on the Akaike Information
Criterion is proposed, which is computed from either the additive or
the nonparametric model. Simulations show that if the additive model
is misspecified, the percentage of time it fails to select important
variables can be higher than that of the fully nonparametric approach.
A dimension reduction step is included for cases where the
nonparametric estimator cannot be computed due to the curse of
dimensionality. Finally, the
Boston housing dataset is analyzed using the proposed backward
elimination procedure and the selected variables are identified.
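As a rough illustration of such a backward elimination procedure, the following sketch applies AIC-based elimination to an ordinary least-squares fit on synthetic data; the OLS stand-in, the data, and the variable layout are assumptions for illustration only (the paper's procedure uses additive and fully nonparametric estimators):

```python
import numpy as np

def aic_linear(X, y):
    """AIC of an OLS fit with Gaussian errors (up to an additive constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * (k + 1)

def backward_eliminate(X, y):
    """Greedy backward elimination: drop a covariate whenever its removal
    lowers the AIC, until no single removal improves the criterion."""
    keep = list(range(X.shape[1]))
    best = aic_linear(X[:, keep], y)
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for j in list(keep):
            trial = [c for c in keep if c != j]
            a = aic_linear(X[:, trial], y)
            if a < best:
                best, keep, improved = a, trial, True
                break
    return keep

# Synthetic data: only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = backward_eliminate(X, y)
```

In this toy setting the two informative columns survive elimination, while the pure-noise columns are usually dropped because each removal saves the AIC penalty of 2 per parameter.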
Abstract: The piecewise polynomial regression model is a very flexible model for fitting data. When this model is fitted to data, its parameters are generally unknown. This paper studies the parameter estimation problem for the piecewise polynomial regression model using a Bayesian method. Unfortunately, the Bayes estimator cannot be found analytically, so a reversible jump MCMC algorithm is proposed to solve this problem. The reversible jump MCMC algorithm generates a Markov chain that converges to the posterior distribution of the piecewise polynomial regression model parameters. The resulting Markov chain is then used to compute the Bayes estimator for the parameters of the piecewise polynomial regression model.
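The core Markov-chain machinery can be sketched in a fixed-dimension setting: a random-walk Metropolis sampler for the coefficients of a single polynomial piece, whose samples approximate the posterior mean (the Bayes estimator under squared-error loss). The reversible-jump algorithm of the abstract additionally proposes moves between models of different dimension, which this minimal sketch omits; the data, noise level, and prior below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from a single quadratic piece with Gaussian noise.
x = np.linspace(-1, 1, 100)
true_beta = np.array([0.5, -1.0, 2.0])          # intercept, linear, quadratic
X = np.vander(x, 3, increasing=True)
y = X @ true_beta + rng.normal(scale=0.1, size=x.size)

def log_post(beta, sigma=0.1, tau=10.0):
    """Log posterior: Gaussian likelihood times a N(0, tau^2 I) prior."""
    resid = y - X @ beta
    return -0.5 * np.sum(resid**2) / sigma**2 - 0.5 * np.sum(beta**2) / tau**2

# Random-walk Metropolis over the coefficient vector.
beta = np.zeros(3)
lp = log_post(beta)
samples = []
for _ in range(20000):
    prop = beta + rng.normal(scale=0.05, size=3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
        beta, lp = prop, lp_prop
    samples.append(beta)

# Bayes estimator under squared error: the posterior mean (after burn-in).
bayes_est = np.mean(samples[5000:], axis=0)
```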
Abstract: The purpose of this paper is to estimate the US small
wind turbine market potential and forecast small wind turbine sales
in the US. The forecasting method is based on the application of
the Bass model and the generalized Bass model of innovations
diffusion under replacement purchases. In this work, an exponential
distribution is used to model replacement purchases; the single
parameter of this distribution is determined by the average lifetime
of small wind turbines. The identification of the model parameters is
based on nonlinear regression analysis of the annual sales statistics
published by the American Wind Energy Association (AWEA) from 2001 to
2012. The estimation
of the US average market potential of small wind turbines (for
adoption purchases) without accounting for price changes is 57,080
(95% confidence interval: 49,294 to 64,866) for an average wind
turbine lifetime of 15 years, and 62,402 (95% confidence interval:
54,154 to 70,648) for an average lifetime of 20 years. In the first
case the explained variance is 90.7%, and in the second, 91.8%. The
effect of wind turbine price changes on sales was estimated using the
generalized Bass model.
This required a price forecast, for which a polynomial regression
function based on the Berkeley Lab statistics was used. The
estimation of the US average market potential of small wind turbines
(for adoption purchases) in that case is 42,542 (95% confidence
interval: 32,863 to 52,221) for an average wind turbine lifetime of
15 years, and 47,426 (95% confidence interval: 36,092 to 58,760) for
an average lifetime of 20 years. In the first case the explained
variance is 95.3%, and in the second, 95.3%.
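A minimal sketch of identifying Bass-model parameters by nonlinear least squares, using a coarse grid search on synthetic data (the AWEA series is not reproduced here; the parameter values and grids below are assumptions for illustration):

```python
import numpy as np

def bass_F(t, p, q):
    """Bass cumulative adoption fraction:
    F(t) = (1 - exp(-(p+q) t)) / (1 + (q/p) exp(-(p+q) t))."""
    e = np.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

# Hypothetical cumulative adoption series over 12 annual observations.
t = np.arange(1, 13)
true_m, true_p, true_q = 57000.0, 0.01, 0.4
y = true_m * bass_F(t, true_p, true_q)

# Crude nonlinear fit: grid-search p and q; for each pair, the market
# potential m has the closed-form least-squares value m = <F, y> / <F, F>.
best = (np.inf, (0.0, 0.0, 0.0))
for p in np.linspace(0.002, 0.05, 25):
    for q in np.linspace(0.1, 0.8, 36):
        F = bass_F(t, p, q)
        m = float(F @ y) / float(F @ F)
        sse = float(np.sum((y - m * F) ** 2))
        if sse < best[0]:
            best = (sse, (m, p, q))
m_hat, p_hat, q_hat = best[1]
```

In practice one would replace the grid search with a proper nonlinear least-squares solver and fit the generalized Bass model with a price term, as the abstract describes.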
Abstract: One of the essential sectors of Myanmar's economy is
agriculture, which is sensitive to climate variation. The most
important climatic element affecting the agriculture sector is
rainfall, so rainfall prediction is an important issue in an
agricultural country. Multivariate polynomial regression (MPR)
provides an effective way to describe complex nonlinear input-output
relationships so that an outcome variable can be predicted from the
others. In this paper, the modeling of monthly rainfall
prediction over Myanmar is described in detail by applying the
polynomial regression equation. The results of the proposed model
are compared to those produced by a multiple linear regression (MLR)
model. Experiments indicate that the prediction model based on MPR
achieves higher accuracy than MLR.
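The gain from adding polynomial terms can be sketched on synthetic data: a degree-2 multivariate polynomial design fits a nonlinear response far better than a plain linear (MLR-style) design. The predictors and response below are illustrative stand-ins, not Myanmar rainfall data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic response with interaction and quadratic structure.
n = 300
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = (3.0 + 2.0 * x1 - 1.0 * x2 + 4.0 * x1 * x2 + 2.5 * x2**2
     + rng.normal(scale=0.2, size=n))

def fit_rmse(X):
    """In-sample RMSE of an OLS fit with the given design matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((y - X @ beta) ** 2)))

ones = np.ones(n)
X_lin = np.column_stack([ones, x1, x2])                          # MLR design
X_poly = np.column_stack([ones, x1, x2, x1 * x2, x1**2, x2**2])  # degree-2 MPR

rmse_lin, rmse_poly = fit_rmse(X_lin), fit_rmse(X_poly)
```

Because the true response contains an interaction and a squared term, the polynomial design recovers it almost exactly while the linear design cannot.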
Abstract: In this article, we propose a methodology for the
characterization of suspended matter along Algiers' Bay. An approach
based on a multilayer perceptron (MLP), trained by backpropagation of
the gradient and optimized with the Levenberg-Marquardt (LM)
algorithm, is used. Emphasis was placed on the choice of the
training set, for which a comparative study covered four methods:
random selection and three variants of classification by K-Means.
The samples are taken from a suspended matter image obtained by an
analytical model based on polynomial regression that takes in situ
measurements into account. The mask that selects the zone of
interest (water, in our case) was produced using a multispectral
classification by the ISODATA algorithm. To improve the
classification result, this mask was cleaned using the tools of
mathematical morphology. The results of this study, presented in the
form of curves, tables, and images, demonstrate the soundness of our
methodology.
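The Levenberg-Marquardt optimizer mentioned above can be sketched in its generic nonlinear least-squares form, solving (JᵀJ + λI)δ = −Jᵀr with adaptive damping; here it fits a toy exponential model rather than MLP weights (the model, data, and damping schedule are assumptions for illustration):

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, theta, n_iter=50, lam=1e-2):
    """Minimal LM loop: solve (J^T J + lam I) delta = -J^T r, accept the
    step if the cost decreases, and adapt the damping factor lam."""
    theta = np.asarray(theta, dtype=float)
    cost = float(np.sum(residual(theta) ** 2))
    for _ in range(n_iter):
        r, J = residual(theta), jacobian(theta)
        A = J.T @ J + lam * np.eye(theta.size)
        delta = np.linalg.solve(A, -J.T @ r)
        trial = theta + delta
        trial_cost = float(np.sum(residual(trial) ** 2))
        if trial_cost < cost:            # accept: move toward Gauss-Newton
            theta, cost, lam = trial, trial_cost, lam * 0.5
        else:                            # reject: increase damping
            lam *= 2.0
    return theta

# Toy check: fit y = a * exp(b * x) to noiseless data.
x = np.linspace(0, 1, 50)
y = 2.0 * np.exp(-1.5 * x)
resid = lambda th: th[0] * np.exp(th[1] * x) - y
jac = lambda th: np.column_stack([np.exp(th[1] * x),
                                  th[0] * x * np.exp(th[1] * x)])
theta_hat = levenberg_marquardt(resid, jac, [1.0, 0.0])
```

For an MLP, r would be the vector of network output errors and J the Jacobian of outputs with respect to all weights, which is what makes LM training memory-hungry but fast on small networks.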
Abstract: Uncertainties in a serial production line affect
production throughput. These uncertainties cannot be prevented in a
real production line, but the uncertain conditions can be controlled
by a robust prediction model. Thus, a hybrid model
combining autoregressive integrated moving average (ARIMA) and
multiple polynomial regression is proposed to model the nonlinear
relationship of production uncertainties with throughput. The
uncertainties considered in this study are demand, break time,
scrap, and lead time. The nonlinear relationship of production
uncertainties with throughput is examined in the form of quadratic
and cubic regression models, for which the adjusted R-squared values
were 98.3% and 98.2%, respectively. We optimized
the multiple quadratic regression (MQR) by considering the time
series trend of the uncertainties using an ARIMA model. Finally, the
hybrid model of ARIMA and MQR achieves a better adjusted R-squared of
98.9%.
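The multiple quadratic regression (MQR) component can be sketched as an ordinary least-squares fit on a full quadratic design, with the adjusted R-squared computed as in the abstract. The four "uncertainty" inputs below are synthetic stand-ins, not real production data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins for demand, break time, scrap, and lead time
# (columns of U), driving a nonlinear throughput response.
n = 200
U = rng.uniform(0.0, 1.0, size=(n, 4))
throughput = (100.0 - 20.0 * U[:, 0] + 15.0 * U[:, 1] ** 2
              - 10.0 * U[:, 2] * U[:, 3] + rng.normal(scale=1.0, size=n))

def quadratic_design(U):
    """Full quadratic design: intercept, linear, squared, and pairwise terms."""
    d = U.shape[1]
    cols = [np.ones(len(U))]
    cols += [U[:, j] for j in range(d)]
    cols += [U[:, j] ** 2 for j in range(d)]
    cols += [U[:, i] * U[:, j] for i in range(d) for j in range(i + 1, d)]
    return np.column_stack(cols)

X = quadratic_design(U)
beta, *_ = np.linalg.lstsq(X, throughput, rcond=None)
rss = float(np.sum((throughput - X @ beta) ** 2))
tss = float(np.sum((throughput - throughput.mean()) ** 2))
n_obs, k = X.shape
adj_r2 = 1.0 - (rss / (n_obs - k)) / (tss / (n_obs - 1))
```

In the hybrid scheme described above, ARIMA forecasts of each uncertainty series would then be fed through this fitted quadratic surface to predict future throughput.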
Abstract: Smoothing, or filtering, of data is the first
preprocessing step for noise suppression in many applications
involving data analysis. The moving average is the most popular
method of smoothing data; a generalization of it led to the
development of the Savitzky-Golay filter.
Many window-based smoothing methods were developed by convolving
the data with different window functions for different applications;
the most widely used window functions are the Gaussian and Kaiser
windows. Function approximation of the data by polynomial regression,
Fourier expansion, or wavelet expansion also yields smoothed data.
Wavelets also smooth the data to a great extent by thresholding the
wavelet coefficients. Almost all smoothing methods destroy peaks and
flatten them as the support of the window is increased. In certain
applications it is desirable to retain peaks while smoothing the data
as much as possible. In this paper we present a methodology called
peak-wise smoothing that smooths the data to any desired level
without losing the major peak features.
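The peak-flattening effect described above can be sketched by comparing a plain moving average with a Savitzky-Golay filter, which fits a local polynomial in each window and therefore preserves a narrow peak much better at the same window width (the signal, noise level, and window sizes are illustrative assumptions):

```python
import numpy as np

def moving_average(y, window):
    """Centered moving average via convolution (window should be odd)."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

def savitzky_golay(y, window, order):
    """Savitzky-Golay smoothing: least-squares fit a degree-`order`
    polynomial in each centered window and keep its value at the
    midpoint. This is a convolution with a fixed kernel, derived here
    as the first row of the pseudoinverse of the local Vandermonde."""
    half = window // 2
    i = np.arange(-half, half + 1)
    A = np.vander(i, order + 1, increasing=True)
    kernel = np.linalg.pinv(A)[0]        # evaluates the fitted poly at i = 0
    return np.convolve(y, kernel[::-1], mode="same")

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 201)
signal = np.exp(-((x - 0.5) ** 2) / 0.002)      # one narrow peak at x = 0.5
noisy = signal + rng.normal(scale=0.05, size=x.size)
ma = moving_average(noisy, 21)
sg = savitzky_golay(noisy, 21, 3)
```

With a 21-sample window, the moving average visibly flattens the peak at index 100, while the cubic Savitzky-Golay filter retains most of its height.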
Abstract: In this paper, the implementation of a rule-based
intuitive reasoner is presented. The implementation included two
parts: the rule induction module and the intuitive reasoner. A large
weather database was acquired as the data source. Twelve weather
variables from those data were chosen as the "target variables"
whose values were predicted by the intuitive reasoner. A "complex"
situation was simulated by making only subsets of the data available
to the rule induction module. As a result, the rules induced were
based on incomplete information with variable levels of certainty.
The certainty level was modeled by a metric called "Strength of
Belief", which was assigned to each rule or datum as ancillary
information about the confidence in its accuracy. Two techniques
were employed to induce rules from the data subsets: decision tree
and multi-polynomial regression, respectively for the discrete and the
continuous type of target variables. The intuitive reasoner was tested
for its ability to use the induced rules to predict the classes of the
discrete target variables and the values of the continuous target
variables. The intuitive reasoner implemented two types of
reasoning: fast and broad, where, by analogy to human thought, the
former corresponds to fast decision making and the latter to deeper
contemplation. For reference, a weather data analysis approach
which had been applied on similar tasks was adopted to analyze the
complete database and create predictive models for the same 12
target variables. The values predicted by the intuitive reasoner and
the reference approach were compared with actual data. The intuitive
reasoner reached near-100% accuracy for two continuous target
variables. For the discrete target variables, the intuitive reasoner
predicted at least 70% as accurately as the reference reasoner. Since
the intuitive reasoner operated on rules derived from only about 10%
of the total data, it demonstrated the potential advantages in dealing
with sparse data sets as compared with conventional methods.
Abstract: Based on assumptions of neo-classical economics and
rational choice / public choice theory, this paper investigates the
regulation of industrial land use in Taiwan by homeowners
associations (HOAs) as opposed to traditional government
administration. The comparison, which applies transaction cost
theory and a polynomial regression analysis, showed that HOAs
are superior to conventional government administration in terms of
transaction costs and overall efficiency. A case study comparing
Taiwan's commonhold industrial park, NangKang Software Park, to
traditional government counterparts using limited data on costs
and returns was analyzed. This empirical study on the relative
efficiency of governmental and private institutions supports the
underlying theoretical proposition. Numerical results demonstrate
the efficiency of the established model.
Abstract: The adsorption of a simulated aqueous solution containing the textile remazol reactive dye Red 3BS by palm shell activated carbon (PSAC) as adsorbent was studied using Response Surface Methodology (RSM). A Box-Behnken design in the three most important operating variables (initial dye concentration, adsorbent dosage, and impeller speed) was employed for experimental design and optimization of the results. The significance of the independent variables and their interactions was tested by means of analysis of variance (ANOVA) with 95% confidence limits. The model indicated that increasing the dosage and impeller speed yields dye removal of up to 90%, with an uptake capacity of more than 7 mg/g. A high regression coefficient between the variables and the response (R-Sq = 93.9%) showed that the polynomial regression model evaluates the experimental data well.
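For reference, a three-factor Box-Behnken design in coded units can be constructed directly: each pair of factors takes a 2x2 factorial at +/-1 while the remaining factor is held at 0, plus replicated center runs (the number of center points below is an assumption; designs in practice typically come from DOE software):

```python
import numpy as np
from itertools import combinations

def box_behnken(k, center_points=3):
    """Box-Behnken design in k coded factors (-1, 0, +1): for every pair
    of factors, a 2^2 factorial at +/-1 with all other factors at 0,
    plus replicated center runs."""
    runs = []
    for i, j in combinations(range(k), 2):
        for a in (-1, 1):
            for b in (-1, 1):
                row = [0] * k
                row[i], row[j] = a, b
                runs.append(row)
    runs += [[0] * k] * center_points
    return np.array(runs)

design = box_behnken(3)   # 3 factors: 12 edge runs + 3 center runs
```

Fitting a second-order polynomial response surface to measurements at these 15 runs is what yields the quadratic model and R-Sq reported in the abstract.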
Abstract: The approach of subset selection in polynomial
regression model building assumes that the chosen fixed full set of
predefined basis functions contains a subset sufficient to describe
the target relation well. However, in most cases the necessary set of
basis functions is not known and needs to be guessed, a potentially
non-trivial (and long) trial-and-error process.
In our research we consider a potentially more efficient approach –
Adaptive Basis Function Construction (ABFC). It lets the model
building method itself construct the basis functions necessary for
creating a model of arbitrary complexity with adequate predictive
performance. However, two issues to some extent plague both subset
selection methods and the ABFC, especially when working with
relatively small data samples: selection bias and selection
instability. We try to correct these
issues by model post-evaluation using Cross-Validation and model
ensembling. To evaluate the proposed method, we empirically
compare it to ABFC methods without ensembling, to a widely used
method of subset selection, as well as to some other well-known
regression modeling methods, using publicly available data sets.
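A minimal sketch of the general idea: construct a polynomial basis greedily and accept a candidate basis function only if it lowers the cross-validated error. This is a simplified stand-in for ABFC, not the paper's method; the candidate set, data, and fold scheme are assumptions for illustration:

```python
import numpy as np

def cv_rss(X, y, folds=5):
    """K-fold cross-validated residual sum of squares for an OLS fit."""
    idx = np.arange(len(y))
    total = 0.0
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        total += float(np.sum((y[test] - X[test] @ beta) ** 2))
    return total

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 150)
y = 1.0 - 2.0 * x + 3.0 * x**3 + rng.normal(scale=0.1, size=150)

# Candidate basis functions x^0 .. x^6; grow the model greedily,
# accepting a candidate only if it lowers the cross-validated error.
candidates = [x**d for d in range(7)]
chosen = [0]                                  # start from the intercept
score = cv_rss(np.column_stack([candidates[j] for j in chosen]), y)
improved = True
while improved:
    improved = False
    for j in range(7):
        if j in chosen:
            continue
        trial = chosen + [j]
        s = cv_rss(np.column_stack([candidates[k] for k in trial]), y)
        if s < score:
            chosen, score, improved = trial, s, True
            break
```

Using cross-validated rather than in-sample error for the accept/reject decision is one simple counter to the selection bias that the abstract discusses; model ensembling would further address the selection instability.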