Signed Approach for Mining Web Content Outliers

The emergence of the Internet has revolutionized information storage and retrieval. Because most of the data on the web is unstructured and mixes text, video, audio, and other media, information must be mined to meet the specific needs of users without losing important hidden information. Developing user-friendly, automated tools that deliver relevant information quickly is therefore a major challenge in web mining research. Most existing web mining algorithms concentrate on finding frequent patterns and neglect the less frequent ones, which are likely to contain outlying data such as noise and irrelevant or redundant data. This paper focuses on a signed approach with full-word matching against an organized domain dictionary for mining web content outliers. The signed approach identifies both the relevant web documents and the outlying web documents. Because the dictionary is organized by the number of characters in each word, searching and retrieving documents takes less time and less space.
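The idea of a length-organized dictionary with full-word matching can be illustrated with a minimal sketch. The abstract does not specify how the signs are assigned, so the scoring below is an assumption made purely for illustration: each token found in the domain dictionary contributes +1, each token not found contributes -1, and a document whose total is not positive is treated as an outlier. All function names (`build_length_index`, `in_dictionary`, `signed_score`, `classify`) are hypothetical and not taken from the paper.

```python
import re
from collections import defaultdict


def build_length_index(domain_words):
    """Group dictionary words by character count so a lookup touches one bucket only."""
    index = defaultdict(set)
    for word in domain_words:
        index[len(word)].add(word.lower())
    return index


def in_dictionary(index, word):
    """Full-word match: the token must equal an entry in the bucket for its length."""
    return word in index.get(len(word), ())


def signed_score(index, text):
    """Assumed scoring (not from the paper): +1 per dictionary hit, -1 per miss."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 if in_dictionary(index, t) else -1 for t in tokens)


def classify(index, documents):
    """Split documents into (relevant, outliers) by the sign of their total score."""
    relevant, outliers = [], []
    for name, text in documents.items():
        if signed_score(index, text) > 0:
            relevant.append(name)
        else:
            outliers.append(name)
    return relevant, outliers
```

Bucketing by word length means a lookup for a seven-letter token never inspects entries of any other length, which is one plausible reading of the time and space saving the abstract claims. A real system would likely also remove stop words and apply stemming before scoring; those steps are omitted here.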



