Abstract: According to statistics, the prevalence of congenital hearing loss in Taiwan is approximately six per thousand, and about one infant per thousand has severe hearing impairment. Hearing ability during infancy has a significant impact on the later development of children's oral expression, language maturity, cognitive performance, educational ability, and social behavior. Although most children born with hearing impairment have sensorineural hearing loss, almost every child retains some residual hearing. If fitted with a hearing aid or cochlear implant (a bionic ear) in time and given hearing and speech training, even severely hearing-impaired children can still learn to talk. Conversely, those who are not diagnosed, and therefore cannot begin hearing and speech rehabilitation in time, may lose an important opportunity to live a complete and healthy life. Eventually, the lack of hearing and speaking ability affects the development of mental and physical functions, intelligence, and social adaptability. This not only causes lifelong, irreparable harm to the hearing-impaired child but also places a heavy burden on the family and society. It is therefore necessary to establish a computer-assisted predictive model that can accurately detect and help diagnose newborn hearing loss, so that early intervention can be provided in time and waste of medical resources avoided. This study uses records from the neonatal database of the case hospital and compares two analysis methods: prediction with a support vector machine (SVM) alone, and prediction with an SVM after factor screening by logistic regression. The results indicate that prediction accuracy reaches 96.43% when the factors are screened and selected through logistic regression.
Hence, the model constructed in this study can offer practical help to physicians in clinical diagnosis and benefit early intervention for newborn hearing impairment.
Abstract: This paper shows that applying probabilistic-statistical methods is unfounded, especially at the early stage of diagnosing the technical condition of an aviation gas turbine engine (GTE), when the flight information is fuzzy, limited, and uncertain. The efficiency of applying Soft Computing technology at these diagnostic stages, using Fuzzy Logic and Neural Network methods, is therefore considered. Fuzzy multiple linear and non-linear models (fuzzy regression equations) obtained from fuzzy statistical data are trained with high accuracy. To build a more adequate model of the GTE technical condition, the dynamics of changes in the skewness and kurtosis coefficients are analysed. Studies of the changes in skewness and kurtosis coefficient values show that the distributions of GTE work parameters have a fuzzy character; hence it is expedient to consider fuzzy skewness and kurtosis coefficients. Investigation of the dynamics of changes in the basic characteristics of GTE work parameters leads to the conclusion that Fuzzy Statistical Analysis is necessary for preliminary identification of the engines' technical condition. Studies of the changes in correlation coefficient values also show their fuzzy character. Therefore, the results of Fuzzy Correlation Analysis are proposed for model selection. The Fuzzy Multiple Correlation Coefficient of Fuzzy Multiple Regression is considered for checking model adequacy. When sufficient information is available, a recurrent algorithm for identifying the aviation GTE technical condition is proposed (Hard Computing technology is used), based on measurements of the input and output parameters of the multiple linear and non-linear generalised models in the presence of measurement noise (the new recursive Least Squares Method (LSM)). The developed GTE condition monitoring system provides stage-by-stage estimation of the engine's technical condition.
As an application of this technique, the temperature condition of a new operating aviation engine was estimated.
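As a rough illustration of the skewness and kurtosis coefficients mentioned above, the following Python sketch computes their ordinary (crisp) moment-based versions for a short series of readings. The fuzzy variants used in the paper are not reproduced here; the function name and the sample temperature values are assumptions made purely for illustration.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based (population) skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical exhaust gas temperature readings (invented values).
readings = [612.0, 615.5, 611.2, 640.8, 613.9, 614.4, 612.7]
skew, kurt = skewness_and_excess_kurtosis(readings)
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

Tracking how these two numbers drift over successive flights is one simple way to see whether a parameter's distribution is changing shape.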
Abstract: Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages at the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified a linear correlation between the number of exons in a gene and the length of the protein coded by the gene: the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and on the parameters of the linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.
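The kind of per-gene correlation analysis described above can be sketched in a few lines of Python. The gene values below are invented, not taken from any genome, and the helper name is hypothetical. In this made-up data the net exon length follows a straight line with a positive intercept, which is one simple way the net length can grow linearly with the exon count while the average exon length shrinks; whether this explains the authors' data is not stated in the abstract.

```python
import math


def pearson_and_fit(xs, ys):
    """Return (Pearson r, slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Invented per-gene values: number of exons and net exon length in base pairs.
exon_counts = [1, 2, 3, 4, 5]
net_lengths = [300, 450, 600, 750, 900]

r, slope, intercept = pearson_and_fit(exon_counts, net_lengths)
# Average exon length per gene = net exon length / number of exons.
avg_exon_lengths = [length / count for length, count in zip(net_lengths, exon_counts)]
print(r, slope, intercept)
print(avg_exon_lengths)
```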
Abstract: Medical studies often require methods for parameter
selection as a second processing step, after the database has been
designed and filled with information. One common task is selecting
the fields that act as risk factors using well-known methods, in
order to find the most relevant risk factors and to establish a
possible hierarchy among them. Several methods are available for
this purpose, one of the best known being binary logistic
regression. We present the mathematical principles of this method
and a practical example of its use in analysing the influence of 10
different psychiatric diagnoses on 4 different types of offences (in
a database of 289 psychiatric patients involved in different types
of offences). Finally, we make some observations about the relation
between the risk factor hierarchy established through binary
logistic regression and the individual risks, as well as the results
of the Chi-squared test. We show that the hierarchy built using
binary logistic regression does not agree with the direct order of
the risk factors, even though it would be natural to assume that
it always does.
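To make the notion of an individual risk concrete, the sketch below summarises a single 2x2 table in Python. For one binary risk factor taken alone, the odds ratio computed here equals the exponential of the coefficient that a binary logistic regression with only that factor would estimate; in a model with several factors the coefficients are adjusted for one another, which is one possible reason the two orderings can differ. The counts are invented and are not drawn from the 289-patient database; the function name is hypothetical.

```python
def two_by_two_summary(a, b, c, d):
    """Summarise a 2x2 table.

    a: factor present, outcome present   b: factor present, outcome absent
    c: factor absent, outcome present    d: factor absent, outcome absent
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))
    # Pearson chi-squared statistic (1 degree of freedom, no continuity correction).
    chi_squared = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, relative_risk, chi_squared


# Invented counts for one diagnosis versus one offence type.
odds_ratio, relative_risk, chi_squared = two_by_two_summary(30, 70, 10, 90)
print(f"OR={odds_ratio:.2f}, RR={relative_risk:.2f}, chi2={chi_squared:.2f}")
```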
Abstract: Soil properties such as field capacity (F.C.) and permanent wilting point (P.W.P.) play important roles in the study of the soil moisture retention curve. Although these parameters can be measured directly, their measurement is difficult and expensive. Pedotransfer functions (PTFs) provide an alternative by estimating soil parameters from more readily available soil data. In this investigation, 70 soil samples were collected from different horizons of 15 soil profiles located in the Ziaran region, Qazvin province, Iran. The data set was divided into two subsets for calibration (80%) and testing (20%) of the models, and their normality was tested by the Kolmogorov-Smirnov method. Both multivariate regression and artificial neural network (ANN) techniques were employed to develop appropriate PTFs for predicting soil parameters from the easily measurable characteristics clay, silt, O.C, S.P, B.D and CaCO3. The performance of the multivariate regression and ANN models was evaluated using an independent test data set, with root mean square error (RMSE) and R2 as the evaluation criteria. Comparison of the RMSE of the two models showed that the ANN model gives better estimates of F.C. and P.W.P. than the multivariate regression model. The values of RMSE and R2 obtained by the ANN model for F.C. and P.W.P. were (2.35, 0.77) and (2.83, 0.72), respectively. The corresponding values for the multivariate regression model were (4.46, 0.68) and (5.21, 0.64), respectively. The results showed that an ANN with five neurons in the hidden layer performed better in predicting soil properties than multivariate regression.
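The two evaluation criteria named above can be computed as in the following Python sketch. It assumes R2 is defined as one minus the ratio of residual to total sum of squares; the abstract does not say which definition the authors used (the squared correlation coefficient is another common choice). The observed and predicted values are invented for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented field-capacity values (percent) on a held-out test set.
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]
print(f"RMSE={rmse(observed, predicted):.3f}, R2={r_squared(observed, predicted):.3f}")
```

A lower RMSE and a higher R2 on the test subset both point to the better model, which is how the ANN and regression results are compared in the abstract.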
Abstract: This paper studies the use of nonparametric models
for Gross National Product data in Turkey and Stanford heart
transplant data. Two nonparametric techniques, smoothing spline
and kernel regression, are discussed. The main goal is to compare
the techniques used for prediction with nonparametric regression
models. According to the results of numerical studies, it is concluded
that smoothing spline regression estimators are better than those of
kernel regression.
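For readers unfamiliar with kernel regression, a minimal Python sketch of the Nadaraya-Watson estimator with a Gaussian kernel follows. The abstract does not state which kernel or bandwidth the authors used, so both are assumptions here, and the data points are invented.

```python
import math


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel regression estimate at x0: a weighted average of ys,
    with Gaussian weights that decay with the distance |x0 - x|."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Invented observations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
estimate = nadaraya_watson(2.0, xs, ys, bandwidth=1.0)
print(f"estimate at x=2.0: {estimate:.3f}")
```

A smaller bandwidth makes the fit follow the data more closely, while a larger one smooths more heavily; a smoothing spline controls the same trade-off through its penalty parameter.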
Abstract: Kernel functions, which allow the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, have enabled Support Vector Machines (SVM) to be applied successfully in many fields, e.g. classification and regression. The importance of kernels has motivated many studies on their composition. It is well known that the reproducing kernel (R.K) is a useful kernel function with many properties, e.g. positive definiteness, the reproducing property, and the ability to compose complex R.Ks by simple operations. There are two popular ways to compute an R.K in explicit form. One is to construct and solve a specific boundary value problem for a differential equation; its drawback is that it cannot yield a unified form of the R.K. The other is to use a piecewise integral of the Green function associated with a differential operator L. The latter facilitates both theoretical analysis and the computation of an R.K in a unified explicit form, but it has been studied more recently and has seen fewer practical computations. In this paper, a new algorithm for computing an R.K is presented. It can obtain the unified explicit form of the R.K in a general reproducing kernel Hilbert space (RKHS). It avoids constructing and solving complex differential equations manually and allows automatic, flexible and rigorous computation for more general RKHS. To validate that the R.K computed by the algorithm works well in SVM, some illustrative examples and a comparison between the R.K and the Gaussian kernel (RBF) in support vector regression are presented. The results show that the performance of the R.K is close to or slightly superior to that of the RBF.
Abstract: This paper develops a fiscal health index for 21 local
governments in Taiwan over the period 1984 to 2010. Quantile
regression analysis is used to explore the extent to which economic
variables, political budget cycles, and legislative checks and
balances affect different quantiles of the fiscal health index over
the sample period. Our findings suggest that local governments at
the lower quantile benefit significantly from political budget
cycles and increases in central government revenues, while
effective legislative checks and balances and increases in central
government expenditures have a significantly negative effect on local
fiscal health. When local governments are in the upper tail of the
distribution, legislative checks and balances and macroeconomic
growth have significant and adverse effects on the fiscal
health of local governments. However, increases in central
government revenues have significant and positive effects on the
fiscal health of local governments in Taiwan.
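Quantile regression fits a chosen quantile by minimising the check (pinball) loss instead of the squared error. The Python sketch below shows only the simplest, intercept-only case, where the minimiser is a sample quantile; the full model in the paper also includes explanatory variables, which are not reproduced here. The index values and function names are invented for illustration.

```python
def pinball_loss(q, tau, ys):
    """Average check (pinball) loss of predicting the constant q for all ys."""
    total = 0.0
    for y in ys:
        residual = y - q
        # Under-predictions cost tau per unit, over-predictions cost (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total / len(ys)


def best_constant(tau, ys):
    """Data value minimising the pinball loss, i.e. a sample tau-quantile."""
    return min(ys, key=lambda q: pinball_loss(q, tau, ys))


# Invented fiscal health index values.
index_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lower = best_constant(0.25, index_values)
upper = best_constant(0.75, index_values)
print(lower, upper)
```

Changing tau shifts which part of the distribution the fit targets, which is why the estimated effects can differ between the lower and upper quantiles.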
Abstract: The aim of this paper is to identify the most suitable
model for churn prediction based on three different techniques. The
paper identifies the variables that affect churn with reference to
customer complaints data and provides a comparative analysis of
neural networks, regression trees and regression in their ability
to predict customer churn.
Abstract: Instead of traditional (nominal) classification we investigate
the subject of ordinal classification or ranking. An enhanced
method based on an ensemble of Support Vector Machines (SVMs)
is proposed. Each binary classifier is trained with specific weights
for each object in the training data set. Experiments on benchmark
datasets and synthetic data indicate that the performance of our
approach is comparable to state-of-the-art kernel methods for
ordinal regression. The ensemble method, which is straightforward
to implement, provides a very good sensitivity-specificity trade-off
for the highest and lowest rank.
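One common way to reduce ordinal classification to an ensemble of binary classifiers is to ask, for each threshold k, whether the rank exceeds k. The Python sketch below shows only this label encoding and the recombination step. The abstract does not specify the decomposition or the per-object weights the authors use, so this is an assumed scheme with hypothetical function names, and no SVM training is shown.

```python
def to_binary_targets(ranks, num_ranks):
    """For each threshold k = 1 .. num_ranks - 1, build the binary labels
    'is the rank greater than k?' (1 for yes, 0 for no)."""
    return [[1 if r > k else 0 for r in ranks] for k in range(1, num_ranks)]


def rank_from_binary(outputs):
    """Combine the binary answers for one object into a rank by counting
    how many thresholds it exceeds."""
    return 1 + sum(outputs)


# Invented ranks on a 3-level ordinal scale.
ranks = [1, 3, 2, 2]
targets = to_binary_targets(ranks, num_ranks=3)
# Read the answers back per object and rebuild the original ranks.
recovered = [rank_from_binary([t[i] for t in targets]) for i in range(len(ranks))]
print(targets)
print(recovered)
```

Each row of `targets` would serve as the training labels for one binary classifier in the ensemble.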
Abstract: The approach of subset selection in polynomial
regression model building assumes that the chosen fixed full set of
predefined basis functions contains a subset that is able to
describe the target relation sufficiently well. However, in most cases
the necessary set of basis functions is not known and needs to be
guessed – a potentially non-trivial (and long) trial and error process.
In our research we consider a potentially more efficient approach –
Adaptive Basis Function Construction (ABFC). It lets the model
building method itself construct the basis functions necessary for
creating a model of arbitrary complexity with adequate predictive
performance. However, there are two issues that to some extent
plague the methods of both the subset selection and the ABFC,
especially when working with relatively small data samples: the
selection bias and the selection instability. We try to correct these
issues by model post-evaluation using Cross-Validation and model
ensembling. To evaluate the proposed method, we empirically
compare it to ABFC methods without ensembling, to a widely used
method of subset selection, as well as to some other well-known
regression modeling methods, using publicly available data sets.
Abstract: This paper presents a stochastic scenario-based model predictive control scheme applied to molten salt storage systems in a concentrated solar tower power plant. The main goal of this study is to build a tool that analyzes current and expected future resources in order to evaluate the weekly power to be advertised on the secondary electricity market. This tool will allow the plant operator to maximize profits while hedging against the impact on the system of stochastic variables such as resource or sunlight shortages.
Solving the problem first requires a mixed logical dynamical model of the plant. The two stochastic variables, the incoming solar energy and the electricity demand from the secondary market, are modeled by least squares regression. Robustness is achieved by drawing a certain number of realizations of the random variables and applying the most restrictive one to the system. This scenario-based control technique gives the plant operator a confidence interval containing a given percentage of the possible realizations of the stochastic variables, such that robust control is always achieved within its bounds. The results obtained from many trajectory simulations show the existence of a "reliable" interval, which experimentally confirms the robustness of the algorithm.
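The scenario step described above (draw several realizations, then plan against the most restrictive one) can be sketched in Python as follows. The Gaussian distribution, its parameters, the number of scenarios and the function names are all assumptions made for illustration; the paper's regression-based models of sunlight and demand are not reproduced.

```python
import random


def draw_scenarios(num_scenarios, rng):
    """Draw hypothetical weekly solar energy realizations (MWh), clipped at zero."""
    return [max(0.0, rng.gauss(100.0, 15.0)) for _ in range(num_scenarios)]


def most_restrictive(scenarios):
    """For an energy supply, the most restrictive realization is the smallest one."""
    return min(scenarios)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
scenarios = draw_scenarios(50, rng)
committed_energy = most_restrictive(scenarios)
# A commitment no larger than the worst draw is feasible in every drawn scenario.
feasible = all(committed_energy <= s for s in scenarios)
print(f"commit to at most {committed_energy:.1f} MWh; feasible in all draws: {feasible}")
```

Drawing more scenarios makes the commitment more conservative but raises the share of future realizations it is expected to cover.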
Abstract: The gold passbook is an investment tool that is especially
suitable for investors making small investments in physical gold. The
gold passbook carries lower risk than other ways of investing in gold,
but its price is still affected by the gold price, which in turn is
influenced by many factors. Therefore, building a model to
predict the gold passbook price can both reduce investment
risk and increase returns. This study investigates the
important factors that influence the gold passbook price and uses
the Group Method of Data Handling (GMDH) to build the predictive
model. This method not only identifies the significant variables but
also performs well in prediction. Finally, the significant variables for
the gold passbook price, as identified by GMDH, are the US dollar
exchange rate, international petroleum price, unemployment rate,
wholesale price index, rediscount rate, foreign exchange reserves,
misery index, prosperity coincident index, and industrial index.