Abstract: Clustering is the process of identifying homogeneous
groups of objects, called clusters. Clustering is an interesting topic
in data mining. Objects in the same group or class have similar characteristics.
This paper discusses a robust clustering process for image data with
two dimension-reduction approaches, i.e. two-dimensional
principal component analysis (2DPCA) and principal component
analysis (PCA). A standard approach to handling high-dimensional data is
dimension reduction, which transforms high-dimensional data into
a lower-dimensional space with limited loss of information. One of
the most common forms of dimensionality reduction is principal
component analysis (PCA). 2DPCA is often described as a variant of
PCA in which the image matrices are treated directly
as 2D matrices; they do not need to be transformed into vectors, so
the image covariance matrix can be constructed directly from
the original image matrices. The classical covariance matrix that is
decomposed is very sensitive to outlying observations. The objective of
this paper is to compare the performance of the robust minimum vector
variance (MVV) estimator in two-dimensional PCA (2DPCA)
and in PCA for clustering arbitrary image data when outliers
are hidden in the data set. The simulation aspects of robustness and
an illustration of image clustering are discussed at the end of
the paper.
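As a rough illustration of how 2DPCA builds its covariance matrix directly from image matrices, the sketch below computes the classical image covariance G = (1/M) Σ (A_i − Ā)ᵀ(A_i − Ā) for a few toy images. The function name `image_covariance` and the toy data are hypothetical, and this is the classical (non-robust) estimate only; the robust MVV version compared in the paper is not reproduced here.

```python
def image_covariance(images):
    """Classical 2DPCA image covariance for a list of m x n images.

    Each image is a list of rows. The result is an n x n matrix,
    G = (1/M) * sum_i (A_i - mean)^T (A_i - mean).
    """
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    # Element-wise mean image.
    mean = [[sum(img[r][c] for img in images) / count for c in range(cols)]
            for r in range(rows)]
    cov = [[0.0] * cols for _ in range(cols)]
    for img in images:
        centered = [[img[r][c] - mean[r][c] for c in range(cols)]
                    for r in range(rows)]
        for a in range(cols):
            for b in range(cols):
                cov[a][b] += sum(centered[r][a] * centered[r][b]
                                 for r in range(rows)) / count
    return cov


# Three toy 2 x 3 "images" (hypothetical data, for illustration only).
images = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[2.0, 2.0, 2.0], [4.0, 6.0, 8.0]],
    [[3.0, 2.0, 1.0], [4.0, 7.0, 10.0]],
]
G = image_covariance(images)

# PCA on vectorised images would need a (m*n) x (m*n) covariance instead.
pca_side = len(images[0]) * len(images[0][0])
print(len(G), "x", len(G[0]), "for 2DPCA versus", pca_side, "x", pca_side, "for PCA")
```

For m × n images, the 2DPCA matrix is only n × n, whereas PCA on vectorised images works with an (mn) × (mn) matrix; this size difference is the practical reason for treating images as 2D matrices.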
Abstract: One of the purposes of robust methods of
estimation is to reduce the influence of outliers in the data on the
estimates. Outliers arise from gross errors or from contamination by
distributions with long tails. The trimmed mean is a robust estimate,
meaning that it is not sensitive to violations of the distributional
assumptions about the data. It is called an adaptive estimate when the
trimming proportion is determined from the data rather than being
fixed a priori.
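A minimal sketch of a trimmed mean with a fixed trimming proportion may help make the idea concrete. The function name `trimmed_mean`, the convention of dropping floor(proportion × n) observations from each tail, and the sample values are assumptions made for illustration; the paper's exact definition may differ.

```python
import math


def trimmed_mean(data, proportion):
    """Mean after dropping floor(proportion * n) values from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = math.floor(proportion * len(values))  # observations removed per tail
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


# Hypothetical sample; the last value plays the role of a gross error.
sample = [2.1, 2.4, 2.2, 2.3, 2.5, 2.0, 2.6, 2.2, 2.4, 95.0]
plain = trimmed_mean(sample, 0.0)   # ordinary mean, pulled up by the outlier
robust = trimmed_mean(sample, 0.1)  # one value dropped from each tail
print(plain, robust)
```

With no trimming the single outlier dominates the estimate, while trimming 10% from each tail removes it. An adaptive version would choose `proportion` from the data instead of fixing it in advance.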
The main objective of this study is to determine the robustness
properties of adaptive trimmed means in terms of efficiency, high
breakdown point and influence function. Specifically, it seeks to find
the trimming proportion of the adaptive
trimmed mean that yields efficient and robust estimates of the
parameter for data that follow a modified Weibull distribution with
parameter λ = 1/2, where the trimming proportion is determined by a
ratio of two trimmed means defined as the tail length. Secondly, the
asymptotic properties of the tail length and the trimmed means are
investigated. Finally, the efficiency of the adaptive trimmed means,
measured by the standard deviation, is compared with that obtained
when the trimming proportions are fixed a priori.
The asymptotic tail lengths, defined as the ratio of two trimmed
means, and the asymptotic variances were computed using the
derived formulas. The standard deviations of the
tail lengths for samples of size 40 simulated from a Weibull
distribution were computed over 100 iterations using a computer
program written in Pascal.
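The simulation design described above (samples of size 40, 100 iterations) can be sketched roughly as follows. This is not the authors' Pascal program: it is written in Python, it uses fixed rather than adaptive trimming proportions, and it assumes a standard Weibull distribution with shape 1/2 and scale 1 because the form of the "modified" Weibull is not given here. The names `simulate_sd` and `trimmed_mean`, the seed, and the proportions tried are all hypothetical.

```python
import math
import random
import statistics


def trimmed_mean(data, proportion):
    """Mean after dropping floor(proportion * n) values from each tail."""
    values = sorted(data)
    k = math.floor(proportion * len(values))
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


def simulate_sd(proportion, n=40, iterations=100, shape=0.5, seed=0):
    """Standard deviation of the trimmed mean over repeated Weibull samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        # Shape 0.5 and scale 1.0 are assumptions, not the paper's model.
        sample = [rng.weibullvariate(1.0, shape) for _ in range(n)]
        estimates.append(trimmed_mean(sample, proportion))
    return statistics.stdev(estimates)


# Fixed trimming proportions chosen only for illustration.
results = {p: simulate_sd(p) for p in (0.0, 0.1, 0.2, 0.3)}
for p, sd in results.items():
    print(f"trimming proportion {p:.1f}: sd of trimmed mean = {sd:.4f}")
```

Using the same seed for every proportion means each estimator sees the same simulated samples, so differences in the reported standard deviations reflect the trimming rather than sampling noise.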
The findings of the study revealed that: the tail lengths of the
Weibull distribution increase in magnitude as the trimming
proportions increase; the measure of tail length and the adaptive
trimmed mean are asymptotically independent as the number of
observations n approaches infinity; the tail
length is asymptotically distributed as the ratio of two independent
normal random variables; and the asymptotic variances decrease as
the trimming proportions increase. The simulation study showed
empirically that the standard error of the adaptive trimmed mean
using the ratio of tail lengths is smaller, for different values
of the trimming proportion, than that of its counterpart with the
trimming proportion fixed a priori.