Ontology-based Concept Weighting for Text Documents

Document clustering has become an essential technology with the popularity of the Internet, which makes fast, high-quality document clustering techniques a core research topic. Text clustering, or simply clustering, is about discovering semantically related groups in an unstructured collection of documents. Clustering has long been popular because it provides a way of digesting and generalizing large amounts of information. One issue in clustering is extracting the proper features (concepts) of a problem domain. Existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features, including concept weights, are important. Feature selection matters for the clustering process because irrelevant or redundant features may mislead the clustering results. To counteract this issue, the proposed system introduces concept weights into a text clustering system built on the k-means algorithm and guided by the principles of ontology, so that the importance of the words in a cluster can be identified from their weight values. To a certain extent, this resolves the semantic problem in specific domains.
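The idea of clustering on concept weights rather than raw term weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the proposed system: the toy `ONTOLOGY` dictionary, the count-based and L2-normalized weighting in `concept_vector`, the plain `kmeans` helper with deterministic initialization, and the sample documents.

```python
from collections import Counter
import math

# Hypothetical toy ontology: maps surface terms to a domain concept.
ONTOLOGY = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "apple": "fruit", "banana": "fruit", "mango": "fruit",
}

def concept_vector(text, concepts):
    """Count ontology concepts in a document and L2-normalize the counts.

    The count-based weighting is an assumption; the source does not
    specify its concept weight formula.
    """
    counts = Counter(ONTOLOGY[t] for t in text.lower().split() if t in ONTOLOGY)
    vec = [float(counts[c]) for c in concepts]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each document to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])),
            )
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

docs = [
    "the car and the truck stopped",
    "banana and mango smoothie",
    "a bus followed the car",
    "apple banana salad",
]
concepts = sorted(set(ONTOLOGY.values()))  # fixed concept order for the vectors
vectors = [concept_vector(d, concepts) for d in docs]
labels, centroids = kmeans(vectors, k=2)
```

In this sketch each centroid entry acts as a per-cluster concept weight, so the concepts with the largest entries indicate which words matter most in that cluster. Documents that share no surface terms, such as the two fruit documents mentioning different fruits, can still land in the same cluster because they map to the same concept.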



