Abstract: Ambient air pollution with fine particulate matter (PM10) is a persistent problem in many countries around the world. The accumulation of a large number of measurements of both PM10 concentrations and the accompanying atmospheric factors allows for statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method to build and analyze PM10 models. The empirical study uses average daily air data for the city of Pleven, Bulgaria, over a period of 5 years. The predictors in the models are seven meteorological variables, time variables, lagged PM10 variables, and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series. The degree of influence of each predictor in the models is determined. The selected best CART models are used to forecast PM10 concentrations two days ahead of the last date in the modeling procedure and show very accurate results.
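As an illustration of the lagged predictors described in the preceding abstract, the following minimal sketch builds 1-day and 2-day lagged copies of a daily series. The function name `make_lagged` and the PM10 values are hypothetical and are not taken from the study; the resulting rows could be passed to any regression tree implementation.

```python
def make_lagged(series, lags=(1, 2)):
    """Return (rows, targets): each row holds the values `lag` days before the target day."""
    max_lag = max(lags)
    rows, targets = [], []
    # Start at max_lag so that every requested lag is available.
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets


# Made-up daily PM10 averages, for illustration only.
pm10 = [31.0, 28.5, 40.2, 55.1, 47.3, 38.9]
rows, targets = make_lagged(pm10)
for row, target in zip(rows, targets):
    print(row, "->", target)
```

The first two days are dropped because they have no complete lag history, so a series of n days yields n - 2 training rows.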
Abstract: Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or by an early onset of developmental stuttering in childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses, and interjections in a speech corpus of bilingual spontaneous utterances, in English and Tamil, from school-going children. Two classifiers, Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment (CLI). The ability of the models to classify the disfluencies was measured in terms of F-measure, recall, and precision.
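The evaluation metrics named in the preceding abstract can be sketched as follows for a single disfluency class. The labels below are invented for illustration and do not come from the study's corpus; the function name `precision_recall_f` is likewise hypothetical.

```python
def precision_recall_f(true_labels, predicted_labels, positive):
    """Precision, recall and F-measure (F1) for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented reference and classifier labels, for illustration only.
truth = ["repetition", "pause", "repetition", "prolongation", "repetition", "pause"]
guess = ["repetition", "repetition", "repetition", "prolongation", "pause", "pause"]
precision, recall, f_measure = precision_recall_f(truth, guess, "repetition")
print(precision, recall, f_measure)
```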
Abstract: Crank shaft length, connecting rod length, crank angle, engine rpm, cylinder bore, piston mass, and compression ratio are the inputs that control the performance of the slider-crank mechanism and hence its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two regression models, with and without interaction terms, developed by multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in the seven inputs, including their interaction terms, lowered the polynomial degree from 3rd to 1st degree and gave valid predictions and stable explanations.
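The regression approach in the preceding abstract can be illustrated with a small sketch. It assumes the regression coefficients are obtained from the normal equations, which the abstract does not state, and it uses only two hypothetical inputs (`x1`, `x2`) with synthetic data rather than the seven engine inputs. The linear system is solved by LU decomposition with partial pivoting.

```python
def lu_solve(a, b):
    """Solve a x = b by in-place LU decomposition with partial pivoting."""
    n = len(a)
    lu = [row[:] for row in a]
    rhs = b[:]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if abs(lu[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            rhs[k], rhs[pivot] = rhs[pivot], rhs[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    # Forward substitution with unit-diagonal L, then back substitution with U.
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def fit(inputs, targets, with_interaction):
    """Least-squares coefficients for y = b0 + b1*x1 + b2*x2 (+ b3*x1*x2)."""
    design = [[1.0, x1, x2] + ([x1 * x2] if with_interaction else [])
              for x1, x2 in inputs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(p)]
    return lu_solve(xtx, xty)


# Synthetic data generated from y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2 (illustrative only).
inputs = [(float(x1), float(x2)) for x1 in range(4) for x2 in range(3)]
targets = [1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 for x1, x2 in inputs]
coef_with = fit(inputs, targets, with_interaction=True)
coef_without = fit(inputs, targets, with_interaction=False)
print(coef_with)
print(coef_without)
```

With the interaction column included, the fit recovers the generating coefficients up to rounding; without it, the main-effect coefficients absorb part of the interaction.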
Abstract: This study statistically models the surface f0 contour and the underlying pitch target of a well-studied third sandhi tone of Mandarin Chinese. Although the growth curve analysis of the surface f0 contours indicates non-neutralization of this sandhi tone (T3) and the base T2, their underlying pitch targets do show neutralization. These results in Mandarin are also consistent with the perception of native speakers, who cannot distinguish the sandhi T3 from the base T2 when compensating for contextual variation. The proposed statistical procedure of testing underlying pitch targets can be used to verify tone sandhi processes in other tonal languages.
Abstract: Understanding the statistics of non-isotropic scattering multipath channels that fade randomly with respect to time, frequency, and space in a mobile environment is crucial for the accurate detection of received signals in wireless and cellular communication systems. In this paper, we derive stochastic models for the probability density function (PDF) of the shift in the carrier frequency caused by the Doppler effect on the received illuminating signal in the presence of a dominant line of sight. Our derivation is based on a generalized Clarke's model and a two-wave partially developed scattering model, where the statistical distribution of the frequency shift is shown to be consistent with the power spectral density of the Doppler-shifted signal.
Abstract: Fading noise degrades the performance of cellular
communication, most notably in femto- and pico-cells in 3G and 4G
systems. When the wireless channel consists of a small number of
scattering paths, the statistics of fading noise are not analytically
tractable and pose a serious challenge to developing closed
canonical forms that can be analysed and used in the design of
efficient and optimal receivers. In this context, noise is multiplicative
and is referred to as stochastically local fading. In many analytical
investigations of multiplicative noise, exponential or Gamma
statistics are invoked. More recent advances by the author of this
paper utilized Poisson modulated-weighted generalized Laguerre
polynomials with controlling parameters and uncorrelated noise
assumptions. In this paper, we investigate the statistics of a multidiversity
stochastically local area fading channel when the channel
consists of randomly distributed Rayleigh and Rician scattering
centers with a coherent Nakagami-distributed line of sight component
and an underlying doubly stochastic Poisson process driven by a
lognormal intensity. These combined statistics form a unifying triply
stochastic filtered marked Poisson point process model.
Abstract: Speaker Identification (SI) is the task of establishing the
identity of an individual based on his/her voice characteristics. The SI
task is typically achieved by two-stage signal processing: training and
testing. The training process calculates speaker specific feature
parameters from the speech and generates speaker models
accordingly. In the testing phase, speech samples from unknown
speakers are compared with the models and classified. Even though
the performance of speaker identification systems has improved due to
recent advances in speech processing techniques, there is still room for
improvement. In this paper, a Closed-Set Text-Independent Speaker
Identification System (CISI) based on a Multiple Classifier System
(MCS) is proposed, using Mel Frequency Cepstral Coefficients
(MFCC) for feature extraction and a suitable combination of vector
quantization (VQ) and a Gaussian Mixture Model (GMM) together
with the Expectation Maximization (EM) algorithm for speaker
modeling. The use of a Voice Activity Detector (VAD) with a hybrid
approach based on Short Time Energy (STE) and Statistical
Modeling of Background Noise in the pre-processing step of the
feature extraction yields a better and more robust automatic speaker
identification system. In addition, using the Linde-Buzo-Gray (LBG)
clustering algorithm to initialize the GMM, for estimating the
underlying parameters in the EM step, improved the convergence rate
and system performance. The system also uses a relative index as a
confidence measure when the identifications by GMM and VQ
contradict each other. Simulation results obtained on the voxforge.org
speech database using MATLAB highlight the efficacy of the
proposed method compared to earlier work.
Abstract: This paper analyzes the conceptual framework of three
statistical methods: multiple regression, path analysis, and structural
equation modeling. When building a research model for the statistical
modeling of a complex social phenomenon, it is important to know the
strengths and limitations of these three statistical models. This study
explores the character, strengths, and limitations of each method and
suggests some strategies for accurately explaining or predicting the
causal relationships among variables. In particular, common mistakes
in research modeling in studies of depression and mental health
are discussed.
Abstract: The hydrolysis of lactose using β-galactosidase is one of the most promising biotechnological processes, with a wide range of potential applications in the food processing industries. However, the intracellular location of the yeast enzyme and expensive extraction methods hamper the industrial application of enzymatic hydrolysis. Permeabilization of yeast cells can help to overcome the problems associated with enzyme extraction and purification and to develop an economically viable process for using whole-cell biocatalysts in the food industries. In the present investigation, the permeabilization process for a novel yeast isolate was standardized using a statistical modeling approach known as Response Surface Methodology (RSM) to achieve maximal β-galactosidase activity. The optimum operating conditions for permeabilization obtained by RSM were a 1:1 ratio of toluene (25%, v/v) and ethanol (50%, v/v), a temperature of 25.0 °C, and a treatment time of 12 min, which gave an enzyme activity of 1.71 IU/mg DW.
Abstract: Industries using conventional fossil fuels have an
interest in better understanding the mechanism of particulate
formation during combustion, since it is responsible for the emission
of undesired inorganic elements that directly impact the atmospheric
pollution level. Fine and ultrafine particulates tend to
escape the flue gas cleaning devices into the atmosphere. They also
preferentially collect on surfaces in power systems, resulting in
increased corrosion tendency, reduced heat transfer in the
thermal unit, and a severe impact on human health. These adverse
effects manifest particularly in regions of the world where coal is the
dominant source of energy for consumption.
This study highlights the behavior of calcium transformation as
mineral grains versus organically associated inorganic components
during pulverized coal combustion. The influence of the existing type of
calcium on the coarse, fine, and ultrafine mode formation mechanisms
is also presented. The impact of two sub-bituminous coals on particle
size and calcium composition evolution during combustion is
assessed. Three mixed blends, named Blends 1, 2, and 3, are selected
according to the ratio of coal A to coal B by weight. The calcium
percentage in the original coal increases from Blend 1 to Blend 3.
A mathematical model and a new approach to describing
constituent distribution are proposed. The experimental
calcium distribution in ash is also modeled using the Poisson distribution.
A novel parameter, called elemental index λ, is introduced as a
measure of element distribution.
Results show that calcium in ash that was originally present in coal as
mineral grains has an index of 17, whereas organically associated calcium
transformed to fly ash is best described by an elemental
index λ of 7.
As an alkaline-earth element, calcium is considered the
fundamental element responsible for boiler deficiency, since it is the
major player in the ash slagging mechanism. The
mechanism of particle size distribution and the mineral species of ash
particles are presented using CCSEM and size-segregated ash
characteristics. Conclusions are drawn from the analysis of
pulverized coal ash generated from a utility-scale boiler.
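For readers unfamiliar with the Poisson distribution used in the preceding abstract, the sketch below evaluates its probability mass function for the two reported index values. It assumes that the elemental index λ plays the role of the Poisson rate parameter, which the abstract does not state explicitly; the function name `poisson_pmf` is illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


# Index values reported in the abstract; treating them as Poisson rates is an assumption.
for lam in (17, 7):
    mode = max(range(60), key=lambda k: poisson_pmf(k, lam))
    total = sum(poisson_pmf(k, lam) for k in range(200))
    print(f"lambda={lam}: most probable count={mode}, total probability={total:.6f}")
```

Under this reading, a larger λ shifts the distribution toward higher counts and widens it, since the mean and the variance of a Poisson variable both equal λ.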
Abstract: We compare three categorical data clustering
algorithms with respect to the problem of classifying cultural data
related to the aesthetic judgment of comics artists. Such a
classification is very important in Comics Art theory, since
identifying classes of similarity in this kind of data will
provide art historians with valuable information on the evolution of
Comics Art. To establish this, we use a categorical data set and
study it by employing three categorical data clustering algorithms.
The performances of these algorithms are compared with each other,
and interpretations of the clustering results are also given.
Abstract: One of the essential sectors of Myanmar's economy is
agriculture, which is sensitive to climate variation. The most
important climatic element affecting the agriculture sector is
rainfall. Thus, rainfall prediction is an important issue in an
agricultural country. Multivariable polynomial regression (MPR)
provides an effective way to describe complex nonlinear input-output
relationships so that an outcome variable can be predicted from
one or more other variables. In this paper, the modeling of monthly rainfall
prediction over Myanmar is described in detail by applying the
polynomial regression equation. The results of the proposed model are
compared to those produced by a multiple linear regression (MLR)
model. Experiments indicate that the prediction model based on
MPR has higher accuracy than the one based on MLR.
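The comparison in the preceding abstract can be illustrated with a toy sketch that fits a second-degree polynomial model and a linear model to the same two predictors and compares their root-mean-square errors. The data are synthetic and the predictors `x1` and `x2` are hypothetical; nothing here reproduces the rainfall data or the exact model of the study.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_rmse(features, inputs, targets):
    """Least-squares fit on the given feature map; returns the training RMSE."""
    design = [features(x1, x2) for x1, x2 in inputs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(p)]
    coef = solve(xtx, xty)
    residuals = [t - sum(c * f for c, f in zip(coef, row))
                 for row, t in zip(design, targets)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def linear(x1, x2):
    return [1.0, x1, x2]


def quadratic(x1, x2):
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


# Synthetic nonlinear response, for illustration only.
inputs = [(float(x1), float(x2)) for x1 in range(5) for x2 in range(5)]
targets = [2 + x1 + 0.5 * x2 * x2 - 0.3 * x1 * x2 for x1, x2 in inputs]
rmse_mlr = fit_rmse(linear, inputs, targets)
rmse_mpr = fit_rmse(quadratic, inputs, targets)
print("MLR RMSE:", rmse_mlr)
print("MPR RMSE:", rmse_mpr)
```

Because the synthetic response is itself quadratic, the polynomial model fits it almost exactly here; on real data the gain depends on how nonlinear the relationship is, and errors should be compared on held-out data rather than on the training set.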
Abstract: The problem of generation expansion planning (GEP)
has been extensively studied for many years. This paper presents
three topics in GEP: the statistical model, models for
generation expansion, and the expansion problem. Under the
statistical model topic, the main stages of statistical modeling are
briefly explained. Under the models for generation expansion topic,
some works on models for GEP are reviewed. Finally, under the
expansion problem topic, the major issues in the development of a
long-term expansion plan are summarized.
Abstract: The fault detection and diagnosis of complicated
production processes is one of the essential tasks needed to run the process
safely with good final product quality. Unexpected events occurring in
the process may have a serious impact on it. In this work, a
triangular representation of process measurement data obtained
on-line is evaluated using a simulation process. The effect of using
linear and nonlinear reduced spaces is also tested. Their diagnosis
performance is demonstrated using multivariate fault data. It is
shown that the diagnosis method based on the nonlinear technique produces
more reliable results and outperforms the linear method. The use of an
appropriate reduced space yields better diagnosis performance. The
presented diagnosis framework is different from existing ones in that it
attempts to extract the fault pattern in the reduced space, not in the
original process variable space. The use of a reduced model space helps
to mitigate the sensitivity of the fault pattern to noise.
Abstract: Rutting is one of the major load-related distresses in airport flexible pavements. Rutting in paving materials develops gradually with an increasing number of load applications, usually appearing as longitudinal depressions in the wheel paths, and it may be accompanied by small upheavals to the sides. Significant research has been conducted to determine the factors that affect rutting and how they can be controlled. Using experimental design concepts, a series of tests can be conducted while varying the levels of different parameters that could cause rutting in airport flexible pavements. If a proper experimental design is used, the results obtained from these tests can give better insight into the causes of rutting and into the interactions and synergisms among the system variables that influence rutting. Although laboratory experiments are traditionally conducted in a controlled fashion to understand the statistical interaction of variables in such situations, this study attempts to identify the critical system variables influencing airport flexible pavement rut depth from a statistical design of experiments (DoE) perspective using real field data from a full-scale test facility. The test results strongly indicate that the response (rut depth) is too noisy to allow a good model to be determined. From a statistical DoE perspective, two major changes are proposed for this experiment: (1) actual replication of the tests is required, and (2) nuisance variables need to be identified and blocked properly. Further investigation is necessary to determine possible sources of noise in the experiment.
Abstract: The article investigates how 14- to 15-year-olds build informal conceptions of inferential statistics as they engage in a modelling process and build their own computer simulations with dynamic statistical software. This study proposes four primary phases of informal inferential reasoning for the students in the statistical modeling and simulation process. Findings show shifts in the conceptual structures across the four phases and point to the potential of all of these phases for fostering the development of students' robust knowledge of the logic of inference when using computer-based simulations to model and investigate statistical questions.