PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Ambient air pollution with fine particulate matter (PM10) is a persistent, systemic problem in many countries around the world. The accumulation of a large number of measurements of both PM10 concentrations and the accompanying atmospheric factors allows statistical models to be built that detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method to build and analyze PM10 models. The empirical study uses average daily air data for the city of Pleven, Bulgaria, over a period of 5 years. The predictors in the models are seven meteorological variables, time variables, lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series. The degree of influence of each predictor in the models is determined. The selected best CART models are used to forecast PM10 concentrations for the two days following the last date used in the modeling procedure, and they show very accurate results.
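The core step of CART regression with lagged predictors can be sketched as follows. This is a minimal illustration, not the study's implementation: the PM10 values are made up, the helper names (`make_lagged_rows`, `best_split`) are hypothetical, and only a single split on lag-1 and lag-2 features is searched, whereas a full tree would recurse on each child node and also use the meteorological predictors.

```python
from statistics import fmean


def make_lagged_rows(series, lags=(1, 2)):
    """Build (features, target) rows; features are the series values lagged by each entry in `lags`."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - k] for k in lags]
        rows.append((features, series[t]))
    return rows


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows):
    """Return (feature_index, threshold, sse_after_split) minimising the children's total SSE."""
    best = None
    n_features = len(rows[0][0])
    for j in range(n_features):
        values = sorted({x[j] for x, _ in rows})
        # Candidate thresholds: midpoints between consecutive distinct feature values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in rows if x[j] <= thr]
            right = [y for x, y in rows if x[j] > thr]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, thr, total)
    return best


# Made-up daily PM10 values (illustrative only, not data from the study).
pm10 = [20, 22, 21, 60, 65, 62, 23, 21, 64, 61]
rows = make_lagged_rows(pm10)
feature, threshold, total_sse = best_split(rows)
print(f"split on lag feature {feature} at {threshold}, SSE after split = {total_sse:.1f}")
```

The split criterion here is the reduction in the sum of squared errors, which is the usual choice for regression trees; the abstract does not state which criterion or stopping rule the authors used.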

Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or by an early onset of developmental stuttering in childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in a speech corpus of bilingual spontaneous utterances, in English and Tamil, from school-going children. Two classifiers, Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children and are used to distinguish between Children Who Stutter (CWS) and Children with Language Impairment (CLI). The ability of the models to classify the disfluencies was measured in terms of F-measure, recall, and precision.
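The three evaluation measures named above can be computed per disfluency class as in the following sketch. The label sequences are invented for illustration and the function name `precision_recall_f` is hypothetical; the abstract does not say whether the authors averaged the scores across classes.

```python
def precision_recall_f(true_labels, predicted_labels, positive):
    """Precision, recall and F-measure (F1) for one class treated as the positive label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented reference and predicted labels (not from the study's corpus).
truth = ["repetition", "pause", "repetition", "prolongation", "repetition", "pause"]
guess = ["repetition", "repetition", "repetition", "prolongation", "pause", "pause"]
p, r, f = precision_recall_f(truth, guess, positive="repetition")
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```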

Optimization of Slider Crank Mechanism Using Design of Experiments and Multi-Linear Regression

Crankshaft length, connecting rod length, crank angle, engine rpm, cylinder bore, mass of piston and compression ratio are the inputs that control the performance of the slider crank mechanism and hence its efficiency. Several combinations of these seven inputs are used and compared. The throughput engine torque predicted by the simulation is analyzed through two different regression models, with and without interaction terms, developed by multi-linear regression using LU decomposition to solve the system of algebraic equations. These models are validated. A regression model in the seven inputs that includes their interaction terms lowered the polynomial degree from 3rd degree to 1st degree and gave valid predictions and stable explanations.
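Fitting a multi-linear regression with an interaction term by LU decomposition can be sketched as below. This is an assumption-laden illustration: it uses two made-up inputs instead of the seven engine inputs, forms the normal equations explicitly, and applies a Doolittle factorisation without pivoting (acceptable here because the normal-equation matrix is symmetric positive definite). The abstract does not specify which LU variant the authors used.

```python
def lu_decompose(a):
    """Doolittle LU factorisation without pivoting: returns (L, U) with unit-diagonal L."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):
            lower[j][i] = (a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


def fit_least_squares(design, targets):
    """Solve the normal equations (X^T X) beta = X^T y with LU decomposition."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(design, targets)) for i in range(p)]
    lower, upper = lu_decompose(xtx)
    return lu_solve(lower, upper, xty)


# Made-up inputs (x1, x2); the response follows 1 + 2*x1 + 3*x2 + 0.5*x1*x2 exactly.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
targets = [1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 for x1, x2 in points]
# Design matrix columns: intercept, x1, x2, and the interaction term x1*x2.
design = [[1.0, x1, x2, x1 * x2] for x1, x2 in points]
beta = fit_least_squares(design, targets)
print([round(b, 6) for b in beta])
```

Dropping the last design column gives the model without interaction terms, so the same routine can fit both variants for comparison.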

Statistical Modeling of Mandarin Tone Sandhi: Neutralization of Underlying Pitch Targets

This study statistically models the surface f0 contour and the underlying pitch target of the well-studied third tone sandhi of Mandarin Chinese. Although growth curve analysis of the surface f0 contours indicates non-neutralization of this sandhi tone (T3) and the base T2, their underlying pitch targets do show neutralization. These results are consistent with the perception of native Mandarin speakers, who cannot distinguish the sandhi T3 from the base T2 because they compensate for contextual variation. The proposed statistical procedure of testing underlying pitch targets can be used to verify tone sandhi processes in other tonal languages.

Statistical Modeling of Mobile Fading Channels Based on Triply Stochastic Filtered Marked Poisson Point Processes

Understanding the statistics of non-isotropic scattering multipath channels that fade randomly with respect to time, frequency, and space in a mobile environment is crucial for the accurate detection of received signals in wireless and cellular communication systems. In this paper, we derive stochastic models for the probability density function (PDF) of the shift in the carrier frequency caused by the Doppler effect on the received illuminating signal in the presence of a dominant line of sight. Our derivation is based on a generalized Clarke's model and a two-wave partially developed scattering model, and the statistical distribution of the frequency shift is shown to be consistent with the power spectral density of the Doppler-shifted signal.
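For orientation, the textbook isotropic special case of Clarke's model (no line of sight, angle of arrival uniform on the circle) already shows this kind of consistency between the Doppler-shift PDF and the Doppler spectrum. The symbols below are assumptions introduced here, not the paper's notation: $\theta$ is the angle of arrival, $f_m$ the maximum Doppler shift, $f_c$ the carrier frequency and $\sigma^2$ the received power. The paper's generalized, non-isotropic results with a dominant line of sight are not reproduced.

```latex
% Isotropic baseline (assumption): \theta \sim \mathrm{Uniform}[0, 2\pi), Doppler shift \nu = f_m \cos\theta
\begin{align}
  p_\nu(\nu) &= \frac{1}{\pi f_m \sqrt{1 - (\nu / f_m)^2}}, & |\nu| &< f_m, \\
  S(f) &= \frac{\sigma^2}{\pi f_m \sqrt{1 - \bigl((f - f_c)/f_m\bigr)^2}}, & |f - f_c| &< f_m .
\end{align}
```

Both expressions have the same U shape: the normalized spectrum $S(f)/\sigma^2$ equals the PDF of the shift evaluated at $\nu = f - f_c$.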

Statistical Modeling of Local Area Fading Channels Based on Triply Stochastic Filtered Marked Poisson Point Processes

Fading noise degrades the performance of cellular communication, most notably in femto- and pico-cells in 3G and 4G systems. When the wireless channel consists of a small number of scattering paths, the statistics of fading noise are not analytically tractable, which poses a serious challenge to developing closed canonical forms that can be analysed and used in the design of efficient and optimal receivers. In this context, noise is multiplicative and is referred to as stochastically local fading. In many analytical investigations of multiplicative noise, exponential or Gamma statistics are invoked. More recent advances by the author of this paper used Poisson-modulated, weighted generalized Laguerre polynomials with controlling parameters and uncorrelated noise assumptions. In this paper, we investigate the statistics of a multidiversity stochastically local area fading channel when the channel consists of randomly distributed Rayleigh and Rician scattering centers with a coherent Nakagami-distributed line-of-sight component and an underlying doubly stochastic Poisson process driven by a lognormal intensity. These combined statistics form a unifying triply stochastic filtered marked Poisson point process model.
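The doubly stochastic layer of such a model can be illustrated with a short simulation. This sketch covers only a Poisson count whose intensity is itself lognormally distributed; the marks, filtering, and the Rayleigh, Rician and Nakagami components of the paper's model are not included. The parameter values (`mu=1.0`, `sigma=0.75`) and function names are arbitrary assumptions.

```python
import math
import random
from statistics import fmean, pvariance


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for moderate rates)."""
    limit = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_cox_counts(n, mu, sigma, rng):
    """Draw n counts from a Poisson distribution whose rate is lognormal(mu, sigma)."""
    counts = []
    for _ in range(n):
        intensity = rng.lognormvariate(mu, sigma)  # random intensity for this draw
        counts.append(sample_poisson(intensity, rng))
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable
counts = sample_cox_counts(20000, mu=1.0, sigma=0.75, rng=rng)
mean_count = fmean(counts)
var_count = pvariance(counts)
print(f"mean={mean_count:.2f} variance={var_count:.2f}")
```

For a plain Poisson process the variance equals the mean; the random intensity adds its own variance on top, so the simulated counts are overdispersed.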

An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Speaker Identification (SI) is the task of establishing the identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though the performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still room for improvement. In this paper, a Closed-Set Text-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficients (MFCC) for feature extraction and a suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM), together with the Expectation Maximization (EM) algorithm, for speaker modeling. The use of a Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and statistical modeling of background noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. In addition, using the Linde-Buzo-Gray (LBG) clustering algorithm to initialize the GMM parameters estimated in the EM step improved the convergence rate and the system's performance. The system also uses a relative index as a confidence measure when the GMM and VQ classifiers disagree in the identification process. Simulation results on the voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.
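The Short Time Energy part of the voice activity detector can be sketched as follows. This is a simplified illustration in Python rather than the authors' MATLAB code: it uses a fixed, hand-picked energy threshold on a synthetic signal, whereas the paper's hybrid VAD also models the background noise statistically. The frame length, hop size, threshold and function names are assumptions.

```python
def short_time_energy(samples, frame_len, hop):
    """Mean squared amplitude of each frame of `frame_len` samples, advancing by `hop`."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def detect_voice(energies, threshold):
    """Mark a frame as voiced when its energy exceeds the threshold."""
    return [e > threshold for e in energies]


# Synthetic signal: low-amplitude "silence", a louder "speech" burst, then silence again.
silence = [0.01, -0.01] * 20
speech = [0.5, -0.5] * 20
signal = silence + speech + silence

energies = short_time_energy(signal, frame_len=20, hop=20)
flags = detect_voice(energies, threshold=0.01)
print(flags)
```

Only frames flagged as voiced would be passed on to MFCC extraction, which keeps silent segments from distorting the speaker models.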

The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

This paper analyzes the conceptual framework of three statistical methods: multiple regression, path analysis, and structural equation models. When establishing a research model for the statistical modeling of complex social phenomena, it is important to know the strengths and limitations of these three statistical models. This study explores the character, strengths, and limitations of each modeling approach and suggests some strategies for accurately explaining or predicting the causal relationships among variables. In particular, common mistakes in research modeling in studies of depression and mental health are discussed.

Statistical Modeling for Permeabilization of a Novel Yeast Isolate for β-Galactosidase Activity Using Organic Solvents

The hydrolysis of lactose using β-galactosidase is one of the most promising biotechnological applications, with a wide range of potential uses in the food processing industries. However, the intracellular location of the yeast enzyme and the expense of extraction methods hamper the industrial application of enzymatic hydrolysis processes. The permeabilization technique can help to overcome the problems associated with enzyme extraction and purification from yeast cells and to develop an economically viable process for the use of whole-cell biocatalysts in the food industries. In the present investigation, the permeabilization process for a novel yeast isolate was standardized using a statistical modeling approach known as Response Surface Methodology (RSM) to achieve maximal β-galactosidase activity. The optimum operating conditions for the permeabilization process obtained by RSM were a 1:1 ratio of toluene (25%, v/v) and ethanol (50%, v/v), a temperature of 25.0 °C and a treatment time of 12 min, which gave an enzyme activity of 1.71 IU/mg DW.

Statistical Modeling of Constituents in Ash Evolved From Pulverized Coal Combustion

Industries using conventional fossil fuels have an interest in better understanding the mechanism of particulate formation during combustion, since it is responsible for the emission of undesired inorganic elements that directly affect the atmospheric pollution level. Fine and ultrafine particulates tend to escape the flue gas cleaning devices to the atmosphere. They also preferentially collect on surfaces in power systems, resulting in an increased tendency to corrosion, reduced heat transfer in the thermal unit, and a severe impact on human health. These adverse effects are particularly evident in regions of the world where coal is the dominant source of energy for consumption. This study highlights the behavior of calcium transformation as mineral grains versus organically associated inorganic components during pulverized coal combustion. The influence of the existing type of calcium on the coarse, fine and ultrafine mode formation mechanisms is also presented. The impact of two sub-bituminous coals on particle size and calcium composition evolution during combustion is assessed. Three mixed blends, named Blends 1, 2, and 3, are selected according to the ratio of coal A to coal B by weight. The calcium percentage in the original coal increases from Blend 1 to Blend 3. A mathematical model and a new approach to describing constituent distribution are proposed. The experimental calcium distribution in ash is also modeled using the Poisson distribution. A novel parameter, called the elemental index λ, is introduced as a measure of element distribution. Results show that calcium in ash that was originally present in coal as mineral grains has an index of 17, whereas organically associated calcium transformed to fly ash is best described by an elemental index λ of 7.
As an alkaline-earth element, calcium is considered the fundamental element responsible for boiler deficiency, since it is the major player in the mechanism of the ash slagging process. The mechanism of particle size distribution and the mineral species of ash particles are presented using CCSEM and size-segregated ash characteristics. Conclusions are drawn from the analysis of pulverized coal ash generated from a utility-scale boiler.
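To show what the two reported index values imply for the shape of a Poisson distribution, the sketch below evaluates the Poisson probability mass function for λ = 7 and λ = 17. The abstract does not state which quantity the counting variable k represents in the authors' model, so k is left abstract here and the function name is hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(K = k) for rate lam, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


for lam in (7, 17):
    probs = [poisson_pmf(k, lam) for k in range(101)]
    mode = max(range(101), key=lambda k: probs[k])
    mean = sum(k * p for k, p in enumerate(probs))
    print(f"lambda={lam}: a most probable k is {mode}, mean is about {mean:.2f}")
```

A larger λ shifts the distribution to higher k and widens it, since both the mean and the variance of a Poisson distribution equal λ.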

Cluster Analysis for the Statistical Modeling of Aesthetic Judgment Data Related to Comics Artists

We compare three categorical data clustering algorithms with respect to the problem of classifying cultural data related to the aesthetic judgment of comics artists. Such a classification is very important in Comics Art theory, since identifying classes of similarity in this kind of data provides art historians with valuable information about the evolution of Comics Art. To this end, we study a categorical data set by employing three categorical data clustering algorithms. The performances of these algorithms are compared with each other, and interpretations of the clustering results are also given.

Empirical Statistical Modeling of Rainfall Prediction over Myanmar

One of the essential sectors of Myanmar's economy is agriculture, which is sensitive to climate variation. The most important climatic element affecting the agriculture sector is rainfall. Rainfall prediction is therefore an important issue in an agricultural country. Multivariable polynomial regression (MPR) provides an effective way to describe complex nonlinear input-output relationships, so that an outcome variable can be predicted from one or more other variables. In this paper, the modeling of monthly rainfall prediction over Myanmar is described in detail by applying the polynomial regression equation. The results of the proposed model are compared with those produced by a multiple linear regression (MLR) model. Experiments indicate that the prediction model based on MPR has higher accuracy than the one based on MLR.
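The difference between the two model families can be written out as below. The notation is an assumption introduced here ($x_1, \dots, x_p$ are the predictors, $\hat{y}$ the predicted monthly rainfall), and a second-order polynomial is shown only as an example, since the abstract does not state the polynomial order used.

```latex
\begin{align}
  \text{MLR:}\quad \hat{y} &= \beta_0 + \sum_{i=1}^{p} \beta_i x_i \\
  \text{MPR (order 2):}\quad \hat{y} &= \beta_0 + \sum_{i=1}^{p} \beta_i x_i
      + \sum_{i=1}^{p} \sum_{j=i}^{p} \beta_{ij}\, x_i x_j
\end{align}
```

The polynomial model is nonlinear in the predictors but still linear in the coefficients, so it can be fitted with the same least-squares machinery as MLR after the design matrix is extended with the product terms.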

Generation Expansion Planning Strategies on Power System: A Review

The problem of generation expansion planning (GEP) has been extensively studied for many years. This paper presents three topics in GEP: the statistical model, models for generation expansion, and the expansion problem. Under the first topic, the main stages of statistical modeling are briefly explained. Under the second, some works on models for GEP are reviewed. Finally, under the third, the major issues in the development of a long-term expansion plan are summarized.

Diagnosis of Multivariate Process via Nonlinear Kernel Method Combined with Qualitative Representation of Fault Patterns

The fault detection and diagnosis of complicated production processes is one of the essential tasks needed to run a process safely and with good final product quality. Unexpected events occurring in the process may have a serious impact on it. In this work, a triangular representation of process measurement data obtained on-line is evaluated using a simulated process. The effect of using linear and nonlinear reduced spaces is also tested. Their diagnosis performance is demonstrated using multivariate fault data. It is shown that the diagnosis method based on the nonlinear technique produces more reliable results and outperforms the linear method. The use of an appropriate reduced space yields better diagnosis performance. The presented diagnosis framework differs from existing ones in that it attempts to extract the fault pattern in the reduced space, not in the original process variable space. The use of a reduced model space helps to mitigate the sensitivity of the fault pattern to noise.

Statistical Modeling of Accelerated Pavement Failure Using Response Surface Methodology

Rutting is one of the major load-related distresses in airport flexible pavements. Rutting in paving materials develops gradually with an increasing number of load applications, usually appearing as longitudinal depressions in the wheel paths, and it may be accompanied by small upheavals to the sides. Significant research has been conducted to determine the factors that affect rutting and how they can be controlled. Using experimental design concepts, a series of tests can be conducted while varying the levels of different parameters that could be the cause of rutting in airport flexible pavements. If the experiment is designed properly, the results obtained from these tests can give better insight into the causes of rutting and into the interactions and synergisms among the system variables that influence it. Although laboratory experiments are traditionally conducted in a controlled fashion to understand the statistical interaction of variables in such situations, this study attempts to identify the critical system variables influencing airport flexible pavement rut depth from a statistical design of experiments (DoE) perspective using real field data from a full-scale test facility. The test results strongly indicate that the response (rut depth) is too noisy to allow a good model to be determined. From a statistical DoE perspective, two major changes are proposed for this experiment: (1) actual replication of the tests is required, and (2) nuisance variables need to be identified and blocked properly. Further investigation is necessary to determine possible sources of noise in the experiment.

Informal Inferential Reasoning Using a Modelling Approach within a Computer-Based Simulation

The article investigates how 14- to 15-year-olds build informal conceptions of inferential statistics as they engage in a modelling process and build their own computer simulations with dynamic statistical software. This study proposes four primary phases of informal inferential reasoning for the students in the statistical modeling and simulation process. Findings show shifts in the conceptual structures across the four phases and point to the potential of all of these phases for fostering the development of students' robust knowledge of the logic of inference when using computer-based simulations to model and investigate statistical questions.