Growing Self Organising Map Based Exploratory Analysis of Text Data

Textual data plays an important role in the modern
world. The potential for applying data mining techniques to
uncover hidden information in large text
collections is immense. The Growing Self Organising Map (GSOM)
is a highly successful member of the Self Organising Map family
and has been used as a clustering and visualisation tool across a wide
range of disciplines to discover hidden patterns in data.
A comprehensive analysis of the GSOM’s capabilities as a text
clustering and visualisation tool has so far not been published. This
paper presents these capabilities, namely map visualisation, automatic
cluster identification and hierarchical clustering, and
demonstrates them with experiments
on a benchmark text corpus.




