TOSOM: A Topic-Oriented Self-Organizing Map for Text Organization

The self-organizing map (SOM) is a well-known neural network model with a wide range of applications. Its two main characteristics are dimension reduction and topology preservation: a SOM maps a high-dimensional data space to a low-dimensional space while preserving the topological relations among the data. Because of these characteristics, the SOM has commonly been applied to data clustering and visualization tasks. Its main disadvantage is that the number and structure of the neurons must be specified before training, and both are difficult to determine. Several schemes have been proposed to address this deficiency, including the growing (expandable) SOM, the hierarchical SOM, and the growing hierarchical SOM. These schemes can dynamically expand the map, and even generate hierarchical maps, during training, and encouraging results have been reported. In essence, they adapt the size and structure of the map to the distribution of the training data; that is, they are data-driven, or data-oriented, SOM schemes. In this work, we develop a topic-oriented SOM scheme suited to document clustering and organization. The proposed SOM automatically adapts the size and structure of the map according to the identified topics. Unlike data-oriented SOMs, our approach expands the map and generates the hierarchies according to both the topics and the characteristics of the neurons. Preliminary experiments give promising results and demonstrate the plausibility of the method.
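For readers unfamiliar with the baseline, the following is a minimal sketch of a conventional fixed-size SOM, the kind whose map size must be chosen before training. It is not the topic-oriented scheme proposed here. The function names (train_som, best_matching_unit), the linear decay schedules, the Gaussian neighborhood, and all default parameter values are illustrative assumptions rather than details taken from this work.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a fixed-size rectangular SOM (illustrative sketch only).

    The map size (rows x cols) is fixed in advance, which is exactly the
    limitation that growing and hierarchical variants try to remove.
    Returns a dict mapping grid position (row, col) to a weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed initial neighborhood radius

    # One weight vector per neuron, initialised uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }

    order = list(range(len(data)))
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # radius shrinks, floored at 0.5
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood measured on the 2-D grid, not in input
                # space; this is what encourages topology preservation.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, each input vector is represented by the grid position of its best-matching neuron, so a high-dimensional vector (for example, a document's term vector) is reduced to a two-dimensional coordinate, and similar inputs tend to land on nearby neurons.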




