The Robust Clustering with Reduction Dimension

A clustering is process to identify a homogeneous groups of object called as cluster. Clustering is one interesting topic on data mining. A group or class behaves similarly characteristics. This paper discusses a robust clustering process for data images with two reduction dimension approaches; i.e. the two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms a high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is the principal components analysis (PCA). The 2DPCA is often called a variant of principal component (PCA), the image matrices were directly treated as 2D matrices; they do not need to be transformed into a vector so that the covariance matrix of image can be constructed directly using the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and the PCA for clustering on an arbitrary data image when outliers are hiden in the data set. The simulation aspects of robustness and the illustration of clustering images are discussed in the end of paper

Trimmed Mean as an Adaptive Robust Estimator of a Location Parameter for Weibull Distribution

One of the purposes of the robust method of estimation is to reduce the influence of outliers in the data, on the estimates. The outliers arise from gross errors or contamination from distributions with long tails. The trimmed mean is a robust estimate. This means that it is not sensitive to violation of distributional assumptions of the data. It is called an adaptive estimate when the trimming proportion is determined from the data rather than being fixed a “priori-. The main objective of this study is to find out the robustness properties of the adaptive trimmed means in terms of efficiency, high breakdown point and influence function. Specifically, it seeks to find out the magnitude of the trimming proportion of the adaptive trimmed mean which will yield efficient and robust estimates of the parameter for data which follow a modified Weibull distribution with parameter λ = 1/2 , where the trimming proportion is determined by a ratio of two trimmed means defined as the tail length. Secondly, the asymptotic properties of the tail length and the trimmed means are also investigated. Finally, a comparison is made on the efficiency of the adaptive trimmed means in terms of the standard deviation for the trimming proportions and when these were fixed a “priori". The asymptotic tail lengths defined as the ratio of two trimmed means and the asymptotic variances were computed by using the formulas derived. While the values of the standard deviations for the derived tail lengths for data of size 40 simulated from a Weibull distribution were computed for 100 iterations using a computer program written in Pascal language. The findings of the study revealed that the tail lengths of the Weibull distribution increase in magnitudes as the trimming proportions increase, the measure of the tail length and the adaptive trimmed mean are asymptotically independent as the number of observations n becomes very large or approaching infinity, the tail length is asymptotically distributed as the ratio of two independent normal random variables, and the asymptotic variances decrease as the trimming proportions increase. The simulation study revealed empirically that the standard error of the adaptive trimmed mean using the ratio of tail lengths is relatively smaller for different values of trimming proportions than its counterpart when the trimming proportions were fixed a 'priori'.