Ontology-based Concept Weighting for Text Documents
Document clustering has become an essential technology with the
popularity of the Internet, which makes fast, high-quality document
clustering techniques a core topic. Text clustering, or simply
clustering, is about discovering semantically related groups in an
unstructured collection of documents. Clustering has long been
popular because it provides unique ways of digesting and
generalizing large amounts of information. One of the issues in
clustering is extracting the proper features (concepts) of a problem
domain. Existing clustering technology mainly focuses on term weight
calculation. To achieve more accurate document clustering, more
informative features, including concept weights, are important.
Feature selection is important for the clustering process because
irrelevant or redundant features may misguide the clustering
results. To counteract this issue, the proposed system presents a
concept weight for a text clustering system developed on the basis
of a k-means algorithm in accordance with the principles of
ontology, so that the importance of the words of a cluster can be
identified by their weight values. To a certain extent, this
resolves the semantic problem in specific areas.
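
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, under stated assumptions: terms are mapped to concepts through a toy, hypothetical ontology (`ONTOLOGY`), each concept is weighted with a tf-idf style score (an assumption of this sketch, not necessarily the authors' scheme), and the resulting concept vectors are grouped with a plain k-means that uses a deterministic farthest-first initialization. All function names here are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy ontology (term -> concept); not taken from the paper.
ONTOLOGY = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "apple": "fruit", "banana": "fruit", "mango": "fruit",
}

def concept_counts(doc):
    """Count concept occurrences by mapping each known term to its concept."""
    return Counter(ONTOLOGY[t] for t in doc.lower().split() if t in ONTOLOGY)

def concept_weights(docs):
    """Return (concepts, vectors): one tf-idf style concept vector per document."""
    counts = [concept_counts(d) for d in docs]
    concepts = sorted({c for cnt in counts for c in cnt})
    n = len(docs)
    # Document frequency of each concept (always >= 1 for listed concepts).
    df = {c: sum(1 for cnt in counts if c in cnt) for c in concepts}
    vectors = []
    for cnt in counts:
        total = sum(cnt.values()) or 1  # avoid division by zero
        vectors.append(
            [(cnt[c] / total) * (math.log(n / df[c]) + 1.0) for c in concepts]
        )
    return concepts, vectors

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means with farthest-first initialization (a choice of this sketch)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = []
        for v in vectors:
            labels.append(min(range(k), key=lambda j: dist2(v, centroids[j])))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_concept(centroid, concepts):
    """Name of the concept with the largest weight in a cluster centroid."""
    return concepts[max(range(len(centroid)), key=centroid.__getitem__)]

docs = [
    "car truck bus road",
    "bus car engine",
    "apple banana juice",
    "mango apple banana",
]
concepts, vectors = concept_weights(docs)
labels, centroids = kmeans(vectors, k=2)
for j, centroid in enumerate(centroids):
    print(j, top_concept(centroid, concepts))
```

Reading the largest weight in each centroid is one simple way to see which concept characterizes a cluster, which is the role the abstract assigns to the weight values.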
@article{"International Journal of Information, Control and Computer Sciences:49768",
  author = "Hmway Hmway Tar and Thi Thi Soe Nyaunt",
  title = "Ontology-based Concept Weighting for Text Documents",
  keywords = "Clustering, Concept Weight, Document clustering, Feature Selection, Ontology",
  volume = "5",
  number = "9",
  pages = "980-5",
}