Improvement of Central Composite Design in Modeling and Optimization of Simulation Experiments

Simulation modeling can be used to solve real-world problems and provides an understanding of complex systems. To develop a simplified model of a process simulation, a suitable experimental design is required to capture the characteristics of the response surface. This paper presents the experimental design and algorithm used to model a process simulation for an optimization problem. CO2 liquefaction based on external refrigeration with two refrigeration circuits was used as the simulation case study. Latin Hypercube Sampling (LHS) was proposed for combination with existing Central Composite Design (CCD) samples to improve the performance of CCD in generating the second-order model of the system. The second-order model was then used as the objective function of the optimization problem. The results showed that adding LHS samples to CCD samples can help capture surface curvature characteristics. A suitable number of LHS sample points should be chosen in order to obtain an accurate nonlinear model with a minimum number of simulation experiments.
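The idea of augmenting a CCD with LHS points can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the coded range [-1, 1] for the LHS points, and the rotatable choice of the axial distance are assumptions made here for illustration.

```python
import itertools
import random


def ccd_points(k, alpha=None):
    """Coded CCD points: 2^k factorial corners, 2k axial points, one center."""
    if alpha is None:
        # Rotatable axial distance for a full factorial core (assumed choice).
        alpha = (2 ** k) ** 0.25
    corners = [list(p) for p in itertools.product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for value in (-alpha, alpha):
            point = [0.0] * k
            point[i] = value
            axial.append(point)
    return corners + axial + [[0.0] * k]


def lhs_points(n, k, low=-1.0, high=1.0, seed=0):
    """Latin hypercube: each factor range is cut into n strata, one sample per stratum."""
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        strata = list(range(n))
        rng.shuffle(strata)  # independent random pairing of strata across factors
        columns.append([low + (high - low) * (s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(k)] for i in range(n)]


def augmented_design(k, n_lhs, seed=0):
    """CCD points followed by n_lhs space-filling LHS points (hypothetical helper)."""
    return ccd_points(k) + lhs_points(n_lhs, k, seed=seed)
```

Each row of `augmented_design(k, n_lhs)` would be mapped to real operating conditions, run through the simulator, and the responses used to fit the second-order polynomial by least squares. The number of added points, `n_lhs`, is the quantity the abstract suggests tuning.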

Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

In recent years, real-time spatial applications, such as location-aware services and traffic monitoring, have become increasingly important. Such applications result in dynamic environments where data as well as queries are continuously moving. As a result, a tremendous amount of real-time spatial data is generated every day. The growth of the data volume seems to outpace the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period, regardless of the load on the system. But with a huge amount of real-time spatial data generated, system performance degrades rapidly, especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management, but they remain insufficient for real-time spatial Big Data because they cannot deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big Data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big Data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. First, we find the optimal attribute sequence using the Matching algorithm. Then, we propose a new cost model for database partitioning that keeps the amount of data in each partition balanced and provides parallel execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This improves QoS (Quality of Service) in real-time spatial Big Data, especially with a huge volume of stream data.
The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.

Coverage Probability of Confidence Intervals for the Normal Mean and Variance with Restricted Parameter Space

Recent articles have addressed the problem of constructing confidence intervals for the mean of a normal distribution where the parameter space is restricted; see, for example, Wang [Confidence intervals for the mean of a normal distribution with restricted parameter space. Journal of Statistical Computation and Simulation, Vol. 78, No. 9, 2008, 829–841.]. In this paper, we derive analytic expressions for the coverage probability and the expected length of the confidence interval for the normal mean when the whole parameter space is bounded. We also construct, for the first time, a confidence interval for the normal variance with a restricted parameter space, and derive its coverage probability and expected length mathematically. As a result, one can use these criteria to assess confidence intervals for the normal mean and variance when the parameter space is restricted, without the backing of simulation experiments.
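As a rough illustration of evaluating such criteria without simulation, the sketch below treats the simplest case only: a standard z-interval with known standard deviation, truncated to a bounded parameter space [a, b]. This is an assumed construction chosen for illustration, not necessarily the interval studied in the paper or by Wang, and all function names are hypothetical. For a true mean inside [a, b], truncation does not change coverage, so the coverage probability has a closed form. The expected length is obtained here by numerical integration rather than by the paper's analytic expressions.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def truncated_interval(xbar, half_width, a, b):
    """Standard interval intersected with the parameter space [a, b]."""
    return max(a, xbar - half_width), min(b, xbar + half_width)


def coverage_probability(z):
    """For mu in [a, b], mu is covered exactly when |xbar - mu| <= z * sigma / sqrt(n)."""
    return 2.0 * normal_cdf(z) - 1.0


def expected_length(mu, sigma, n, z, a, b, steps=20000):
    """Trapezoidal integration of the interval length over the law of xbar."""
    se = sigma / math.sqrt(n)
    half_width = z * se
    lo, hi = -8.0, 8.0  # standardized range; the tail mass beyond it is negligible
    dt = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * dt
        lower, upper = truncated_interval(mu + t * se, half_width, a, b)
        length = max(0.0, upper - lower)  # the intersection can be empty
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * length * normal_pdf(t) * dt
    return total
```

Under these assumptions, the truncated interval is never longer than either the unrestricted interval or the parameter space itself, which is the kind of gain the expected-length criterion quantifies.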

Performance Evaluation of Data Transfer Protocol GridFTP for Grid Computing

In Grid computing, a data transfer protocol called GridFTP has been widely used for efficiently transferring large volumes of data. Currently, two versions of the GridFTP protocol, GridFTP version 1 (GridFTP v1) and GridFTP version 2 (GridFTP v2), have been proposed in the GGF. GridFTP v2 supports several advanced features, such as data streaming, dynamic resource allocation, and checksum transfer, by defining a transfer mode called X-block mode. However, the effectiveness of GridFTP v2 has not been fully investigated in the literature. In this paper, we therefore quantitatively evaluate the performance of GridFTP v1 and GridFTP v2 using mathematical analysis and simulation experiments. We reveal the performance limitation of GridFTP v1 and quantitatively show the effectiveness of GridFTP v2. Through several numerical examples, we show that, by utilizing the data streaming feature, the average file transfer time of GridFTP v2 is significantly shorter than that of GridFTP v1.

On the Performance of Information Criteria in Latent Segment Models

Despite the widespread application of finite mixture models in segmentation, finite mixture model selection is still an important issue. In fact, the selection of an adequate number of segments is a key issue in deriving latent segment structures, and it is desirable that the selection criteria used for this purpose are effective. We conduct a simulation study in order to select among several information criteria that may support the selection of the correct number of segments. In particular, this study is intended to determine which information criteria are more appropriate for mixture model selection when considering data sets with only categorical segmentation base variables. The generation of mixtures of multinomial data supports the proposed analysis. As a result, we establish a relationship between the level of measurement of the segmentation variables and the performance of eleven information criteria. The criterion AIC3 shows better performance (it indicates the correct number of segments in the simulated structure more often) when referring to mixtures of multinomial segmentation base variables.
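How information criteria are used to choose the number of segments can be sketched as follows. Only three of the criteria are shown, using their standard definitions (AIC, AIC3, and BIC penalize each free parameter by 2, 3, and ln n respectively). The function names and the fitted log-likelihood values in the usage note are hypothetical and are not results from the study.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Standard penalized-likelihood criteria; lower values are better."""
    return {
        "AIC": -2.0 * log_likelihood + 2.0 * n_params,
        "AIC3": -2.0 * log_likelihood + 3.0 * n_params,
        "BIC": -2.0 * log_likelihood + n_params * math.log(n_obs),
    }


def select_segments(fits, n_obs):
    """fits maps a number of segments to (log_likelihood, n_params).

    Returns, for each criterion, the number of segments that minimizes it.
    """
    best = {}
    for name in ("AIC", "AIC3", "BIC"):
        best[name] = min(
            fits,
            key=lambda s: information_criteria(fits[s][0], fits[s][1], n_obs)[name],
        )
    return best


# Hypothetical fits for a mixture with five binary variables, where a model
# with S segments has (S - 1) + 5 * S free parameters.
example_fits = {
    1: (-1250.0, 5),
    2: (-1200.0, 11),
    3: (-1193.0, 17),
    4: (-1191.0, 23),
}
```

With these made-up values and 500 observations, `select_segments(example_fits, 500)` picks three segments under AIC and two under AIC3 and BIC, which shows how a heavier penalty favors a more parsimonious segment structure.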