Analyzing Methods of the Relation between Concepts based on a Concept Hierarchy

Data objects are usually organized hierarchically, and the relations between them are analyzed based on a corresponding concept hierarchy. The relation between data objects, for example how similar they are, is usually analyzed based on their conceptual distance in the hierarchy. If one node is an ancestor of another, the vertical distance between them is enough to measure how close they are. However, if there is no such relation between two nodes, the vertical distance cannot express their relation explicitly. This paper tries to fill this gap by improving the hierarchy-based analysis method for data objects. The contributions of this paper are: (1) proposing an improved method to evaluate the vertical distance between concepts; (2) defining the horizontal distance between concepts and a method to calculate it; and (3) discussing methods to confine a range by the horizontal and vertical distances, and evaluating the relation between concepts.
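The gap described above can be illustrated with a minimal sketch. It is not the method proposed in the paper: the toy hierarchy, the function names, and the use of a path through the lowest common ancestor for unrelated nodes are all assumptions made for illustration. The sketch only shows that a plain vertical distance is defined for ancestor-descendant pairs and undefined otherwise, which is the case a horizontal distance is meant to cover.

```python
# Hypothetical toy hierarchy (an assumption, not from the paper):
# each concept maps to its parent, and the root maps to None.
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}


def path_to_root(node):
    """Return the list [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def vertical_distance(a, b):
    """Number of edges between a and b if one is an ancestor of the other.

    Returns None when neither node is an ancestor of the other, which is
    the case where a purely vertical measure says nothing.
    """
    path_a, path_b = path_to_root(a), path_to_root(b)
    if b in path_a:
        return path_a.index(b)
    if a in path_b:
        return path_b.index(a)
    return None


def lowest_common_ancestor(a, b):
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    return None


def path_through_lca(a, b):
    """Stand-in measure for unrelated nodes (an assumption, not the
    paper's horizontal distance): edges from a up to the lowest common
    ancestor plus edges from b up to it."""
    lca = lowest_common_ancestor(a, b)
    return vertical_distance(a, lca) + vertical_distance(b, lca)


print(vertical_distance("dog", "animal"))  # ancestor pair: defined
print(vertical_distance("dog", "cat"))     # siblings: None
print(path_through_lca("dog", "cat"))      # stand-in for the sibling case
```

In this toy tree, "dog" and "cat" share a parent while "dog" and "sparrow" only share the root, yet the vertical distance is undefined for both pairs. Any measure for such pairs has to take the position of the nodes across branches into account, which is the role the abstract assigns to the horizontal distance.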



