Multidimensional Data Mining by Means of Randomly Travelling Hyper-Ellipsoids

This study presents a new approach to automatic data clustering and classification in large, complex databases that also derives explicit rules describing each cluster. The method works well in both sparse and dense multidimensional data spaces, and the members of the data space may be of the same nature or represent different classes. A number of N-dimensional ellipsoids (hyper-ellipsoids) are used to enclose the data clouds. Because of the geometry of an ellipsoid and its free rotation in space, cluster detection becomes very efficient. Genetic algorithms optimize the location, orientation, and geometric characteristics of the hyper-ellipsoids. The proposed approach can serve as a basis for developing general knowledge systems that discover hidden knowledge, unexpected patterns, and rules in large databases.
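
To make the geometric idea concrete, the following is a minimal sketch of how membership in a rotated N-dimensional ellipsoid could be tested, and how the points enclosed by one candidate ellipsoid could be counted. It is an illustration under stated assumptions, not the authors' implementation: the function names `inside_ellipsoid` and `coverage`, the representation of orientation as a list of orthonormal axis vectors, and the use of a plain point count as a quality measure are all hypothetical choices made for this example.

```python
import math


def inside_ellipsoid(point, center, axes, radii):
    """Return True if `point` lies inside or on the ellipsoid.

    Assumptions (not taken from the paper):
      axes  -- list of orthonormal direction vectors giving the orientation
      radii -- semi-axis length along each of those directions
    """
    # Offset of the point from the ellipsoid centre.
    diff = [p - c for p, c in zip(point, center)]
    total = 0.0
    for axis, radius in zip(axes, radii):
        # Coordinate of the offset along this principal axis.
        proj = sum(d * a for d, a in zip(diff, axis))
        total += (proj / radius) ** 2
    # Standard ellipsoid inequality: sum of squared scaled coordinates <= 1.
    return total <= 1.0


def coverage(points, center, axes, radii):
    """Count the points enclosed by one candidate ellipsoid."""
    return sum(inside_ellipsoid(p, center, axes, radii) for p in points)


# 2-D example: an ellipse rotated by 45 degrees, long axis along (1, 1).
s = math.sqrt(0.5)
axes = [[s, s], [-s, s]]
radii = [2.0, 0.5]
center = [0.0, 0.0]
points = [[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]]

print(coverage(points, center, axes, radii))
```

In a genetic-algorithm setting, the centre, axes, and radii would be the quantities encoded in each individual, and a measure such as `coverage` (possibly penalized by ellipsoid volume or by enclosed points of other classes) could contribute to the fitness; the specific fitness function is an assumption here. The inequality itself also reads as an explicit rule of the form "if the point satisfies the ellipsoid inequality, assign it to this cluster".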



