Abstract: Outlier detection in streaming data is challenging because the data cannot be scanned multiple times and new concepts may keep evolving. Irrelevant (noisy) attributes further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, an unsupervised data mining task that does not require labeled data; density-based and partitioning clustering are combined for outlier detection. Partitioning clustering is also used to assign adaptive weights to attributes according to their relevance, which helps reduce or remove the effect of noisy attributes. In view of the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real-world data sets show that the proposed approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, and under increasing percentages of outliers.
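The abstract above does not state how the attribute weights are computed. As a minimal sketch of the general idea, the following assumes a hypothetical inverse-dispersion rule: an attribute whose values stay close to their cluster centroids receives a large weight, and a noisy attribute receives a small one. The function names, the toy data, and the weighting rule itself are illustrative assumptions, not the paper's method.

```python
import math

def attribute_weights(points, labels, centroids, eps=1e-9):
    """Assumed rule: weight each attribute by the inverse of its
    within-cluster dispersion, normalised so the weights sum to 1."""
    dims = len(centroids[0])
    dispersion = [0.0] * dims
    for point, k in zip(points, labels):
        for j in range(dims):
            dispersion[j] += (point[j] - centroids[k][j]) ** 2
    inverse = [1.0 / (d + eps) for d in dispersion]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_distance(x, centroid, weights):
    """Euclidean distance in which each attribute is scaled by its weight."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, centroid))
    )

# Toy data: attribute 0 separates the two clusters, attribute 1 is noise.
points = [(0.0, 5.0), (0.1, -4.0), (10.0, 3.0), (10.1, -6.0)]
labels = [0, 0, 1, 1]
centroids = [(0.05, 0.5), (10.05, -1.5)]
weights = attribute_weights(points, labels, centroids)

# A large deviation in the noisy attribute barely moves the point away from
# its centroid, while a deviation in the relevant attribute does.
noisy_deviation = weighted_distance((0.05, 50.0), centroids[0], weights)
relevant_deviation = weighted_distance((5.0, 0.5), centroids[0], weights)
```

Under this assumed rule, a distance-based outlier test would flag the second point well before the first, which is the effect the abstract attributes to weighted attributes. In a streaming setting the dispersion sums could be updated incrementally so that the weights adapt as new points arrive.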
Abstract: Clustering unstructured text documents is an
important problem in the data mining community and has a number of
applications, such as document archive filtering, document
organization, topic detection, and subject tracing. In the real
world, some already clustered documents may lose their
importance, while new documents of greater significance may emerge.
Most of the work done so far in clustering unstructured text
documents overlooks this aspect of clustering. This paper addresses
this issue by using a Fading Function. The unstructured text
documents are clustered, and for each cluster a statistical structure
called a Cluster Profile (CP) is implemented. The cluster profile
incorporates the Fading Function, which keeps track of the
time-dependent importance of the cluster. The work
proposes a novel algorithm, the Clustering n-ary Merge Algorithm
(CnMA), for unstructured text documents that uses the Cluster Profile
and the Fading Function. Experimental results illustrating the
effectiveness of the proposed technique are also included.
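
The abstract does not give the form of the Fading Function or the contents of the Cluster Profile. The sketch below assumes an exponential decay, 2^(-lambda * elapsed), a form commonly used in stream clustering, and a profile that stores only a decayed document count and the time of its last update. The names `fading`, `ClusterProfile`, and `decay_rate` are hypothetical, and the CnMA merge step itself is not shown.

```python
from dataclasses import dataclass

def fading(elapsed, decay_rate=0.1):
    """Assumed exponential fading function: halves every 1/decay_rate time units."""
    return 2.0 ** (-decay_rate * elapsed)

@dataclass
class ClusterProfile:
    """Hypothetical profile: a decayed document count plus the last update time."""
    weight: float = 0.0
    last_update: float = 0.0

    def add_document(self, now, decay_rate=0.1):
        # Fade the existing weight to the current time, then count the new document.
        self.weight = self.weight * fading(now - self.last_update, decay_rate) + 1.0
        self.last_update = now

    def importance(self, now, decay_rate=0.1):
        # Time-dependent importance: the stored weight faded up to `now`.
        return self.weight * fading(now - self.last_update, decay_rate)

# A cluster that received three documents long ago ...
old = ClusterProfile()
for t in (0, 1, 2):
    old.add_document(t)

# ... and a cluster that received a single document just now.
recent = ClusterProfile()
recent.add_document(50)
```

At time 50 the older cluster, despite holding more documents, has a lower importance than the recent one. A clustering algorithm could use such a score to discard stale clusters or to prefer recent ones when merging.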