A Comparison of Air Pollution in Developed and Developing Cities: A Case Study of London and Beijing

With the rapid development of industrialization, countries in different stages of development in the world have gradually begun to pay attention to the impact of air pollution on health and the environment. Air control in developed countries is an effective reference for air control in developing countries. Artificial intelligence and other technologies also play a positive role in the prediction of air pollution. By comparing the annual changes of pollution in London and Beijing, this paper concludes that the pollution in developed cities is relatively low and stable, while the pollution in Beijing is relatively heavy and unstable, but is clearly improving. In addition, by analyzing the changes of major pollutants in Beijing in the past eight years, it is concluded that all pollutants except O3 show a significant downward trend. In addition, all pollutants except O3 have certain correlation. For example, PM10 and PM2.5 have the greatest influence on air quality index (AQI). Python, which is commonly used by artificial intelligence, is used as the main software to establish two models, support vector machine (SVM) and linear regression. By comparing the two models under the same conditions, it is concluded that SVM has higher accuracy in pollution prediction. The results of this study provide valuable reference for pollution control and prediction in developing countries.

The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Machine Learning Based Approach for Measuring Promotion Effectiveness in Multiple Parallel Promotions’ Scenarios

Promotion is a key element in the retail business. Thus, analysis of promotions to quantify their effectiveness in terms of Revenue and/or Margin is an essential activity in the retail industry. However, measuring the sales/revenue uplift is based on estimations, as the actual sales/revenue without the promotion is not present. Further, the presence of Halo and Cannibalization in a multiple parallel promotions’ scenario complicates the problem. Calculating Baseline by considering inter-brand/competitor items or using Halo and Cannibalization's impact on Revenue calculations by considering Baseline as an interpretation of items’ unit sales in neighboring nonpromotional weeks individually may not capture the overall Revenue uplift in the case of multiple parallel promotions. Hence, this paper proposes a Machine Learning based method for calculating the Revenue uplift by considering the Halo and Cannibalization impact on the Baseline and the Revenue. In the first section of the proposed methodology, Baseline of an item is calculated by incorporating the impact of the promotions on its related items. In the later section, the Revenue of an item is calculated by considering both Halo and Cannibalization impacts. Hence, this methodology enables correct calculation of the overall Revenue uplift due a given promotion.

The Influence of Interest, Beliefs, and Identity with Mathematics on Achievement

This study investigated factors that influence mathematics achievement based on a sample of ninth-grade students (N  =  21,444) from the High School Longitudinal Study of 2009 (HSLS09). Key aspects studied included efficacy in mathematics, interest and enjoyment of mathematics, identity with mathematics and future utility beliefs and how these influence mathematics achievement. The predictability of mathematics achievement based on these factors was assessed using correlation coefficients and multiple linear regression. Spearman rank correlations and multiple regression analyses indicated positive and statistically significant relationships between the explanatory variables: mathematics efficacy, identity with mathematics, interest in and future utility beliefs with the response variable, achievement in mathematics.

Blood Glucose Level Measurement from Breath Analysis

The constant monitoring of blood glucose level is necessary for maintaining health of patients and to alert medical specialists to take preemptive measures before the onset of any complication as a result of diabetes. The current clinical monitoring of blood glucose uses invasive methods repeatedly which are uncomfortable and may result in infections in diabetic patients. Several attempts have been made to develop non-invasive techniques for blood glucose measurement. In this regard, the existing methods are not reliable and are less accurate. Other approaches claiming high accuracy have not been tested on extended dataset, and thus, results are not statistically significant. It is a well-known fact that acetone concentration in breath has a direct relation with blood glucose level. In this paper, we have developed the first of its kind, reliable and high accuracy breath analyzer for non-invasive blood glucose measurement. The acetone concentration in breath was measured using MQ 138 sensor in the samples collected from local hospitals in Pakistan involving one hundred patients. The blood glucose levels of these patients are determined using conventional invasive clinical method. We propose a linear regression classifier that is trained to map breath acetone level to the collected blood glucose level achieving high accuracy.

Using Simulation Modeling Approach to Predict USMLE Steps 1 and 2 Performances

The prediction models for the United States Medical Licensure Examination (USMLE) Steps 1 and 2 performances were constructed by the Monte Carlo simulation modeling approach via linear regression. The purpose of this study was to build robust simulation models to accurately identify the most important predictors and yield the valid range estimations of the Steps 1 and 2 scores. The application of simulation modeling approach was deemed an effective way in predicting student performances on licensure examinations. Also, sensitivity analysis (a/k/a what-if analysis) in the simulation models was used to predict the magnitudes of Steps 1 and 2 affected by changes in the National Board of Medical Examiners (NBME) Basic Science Subject Board scores. In addition, the study results indicated that the Medical College Admission Test (MCAT) Verbal Reasoning score and Step 1 score were significant predictors of the Step 2 performance. Hence, institutions could screen qualified student applicants for interviews and document the effectiveness of basic science education program based on the simulation results.

Electricity Load Modeling: An Application to Italian Market

Forecasting electricity load plays a crucial role regards decision making and planning for economical purposes. Besides, in the light of the recent privatization and deregulation of the power industry, the forecasting of future electricity load turned out to be a very challenging problem. Empirical data about electricity load highlights a clear seasonal behavior (higher load during the winter season), which is partly due to climatic effects. We also emphasize the presence of load periodicity at a weekly basis (electricity load is usually lower on weekends or holidays) and at daily basis (electricity load is clearly influenced by the hour). Finally, a long-term trend may depend on the general economic situation (for example, industrial production affects electricity load). All these features must be captured by the model. The purpose of this paper is then to build an hourly electricity load model. The deterministic component of the model requires non-linear regression and Fourier series while we will investigate the stochastic component through econometrical tools. The calibration of the parameters’ model will be performed by using data coming from the Italian market in a 6 year period (2007- 2012). Then, we will perform a Monte Carlo simulation in order to compare the simulated data respect to the real data (both in-sample and out-of-sample inspection). The reliability of the model will be deduced thanks to standard tests which highlight a good fitting of the simulated values.

Multi-Linear Regression Based Prediction of Mass Transfer by Multiple Plunging Jets

The paper aims to compare the performance of vertical and inclined multiple plunging jets and to model and predict their mass transfer capacity by multi-linear regression based approach. The multiple vertical plunging jets have jet impact angle of θ = 90O; whereas, multiple inclined plunging jets have jet impact angle of θ = 60O. The results of the study suggests that mass transfer is higher for multiple jets, and inclined multiple plunging jets have up to 1.6 times higher mass transfer than vertical multiple plunging jets under similar conditions. The derived relationship, based on multi-linear regression approach, has successfully predicted the volumetric mass transfer coefficient (KLa) from operational parameters of multiple plunging jets with a correlation coefficient of 0.973, root mean square error of 0.002 and coefficient of determination of 0.946. The results suggests that predicted overall mass transfer coefficient is in good agreement with actual experimental values; thereby, suggesting the utility of derived relationship based on multi-linear regression based approach and can be successfully employed in modeling mass transfer by multiple plunging jets.

A Comparison of the Sum of Squares in Linear and Partial Linear Regression Models

In this paper, estimation of the linear regression model is made by ordinary least squares method and the partially linear regression model is estimated by penalized least squares method using smoothing spline. Then, it is investigated that differences and similarity in the sum of squares related for linear regression and partial linear regression models (semi-parametric regression models). It is denoted that the sum of squares in linear regression is reduced to sum of squares in partial linear regression models. Furthermore, we indicated that various sums of squares in the linear regression are similar to different deviance statements in partial linear regression. In addition to, coefficient of the determination derived in linear regression model is easily generalized to coefficient of the determination of the partial linear regression model. For this aim, it is made two different applications. A simulated and a real data set are considered to prove the claim mentioned here. In this way, this study is supported with a simulation and a real data example.

On the outlier Detection in Nonlinear Regression

The detection of outliers is very essential because of their responsibility for producing huge interpretative problem in linear as well as in nonlinear regression analysis. Much work has been accomplished on the identification of outlier in linear regression, but not in nonlinear regression. In this article we propose several outlier detection techniques for nonlinear regression. The main idea is to use the linear approximation of a nonlinear model and consider the gradient as the design matrix. Subsequently, the detection techniques are formulated. Six detection measures are developed that combined with three estimation techniques such as the Least-Squares, M and MM-estimators. The study shows that among the six measures, only the studentized residual and Cook Distance which combined with the MM estimator, consistently capable of identifying the correct outliers.

Can Physical Activity and Dietary Fat Intake Influence Body Mass Index in a Cross-sectional Correlational Design?

The purpose of this study was to determine the influence of physical activity and dietary fat intake on Body Mass Index (BMI) of lecturers within a higher learning institutionalized setting. The study adopted a Cross-sectional Correlational Design and included 120 lecturers selected proportionately by simple random sampling techniques from a population of 600 lecturers. Data was collected using questionnaires, which had sections including physical activity checklist adopted from the international physical activity questionnaire (IPAQ), 24-hour food recall, anthropometric measurements mainly weight and height. Analysis involved the use of bivariate correlations and linear regression. A significant inverse association was registered between BMI and duration (in minutes) spent doing moderate intense physical activity per day (r=-0.322, p

Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists

Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of the exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages on the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified linear correlation between the number of exons in a gene and the length of a protein coded by the gene, while the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and parameters of linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.