The emergence of the Internet has sparked a
revolution in information storage and retrieval. Because most of the
data on the web is unstructured and contains a mix of text,
video, audio, etc., there is a need to mine information that caters to
the specific needs of users without losing important
hidden information. Developing user-friendly,
automated tools that provide relevant information quickly
has therefore become a major challenge in web mining research. Most
existing web mining algorithms concentrate on
finding frequent patterns while neglecting the less frequent
ones that are likely to contain outlying data such as noise and
irrelevant or redundant data. This paper focuses on a
Signed approach with full-word matching against an organized
domain dictionary for mining web content outliers. The
Signed approach identifies both the relevant web documents and the
outlying web documents. Because the dictionary is organized
by the number of characters in a word, searching and retrieving
documents takes less time and less space.
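
The dictionary organization mentioned above can be illustrated with a minimal sketch. It assumes the domain dictionary is stored as buckets keyed by word length, so that a token is compared only against words with the same number of characters. The function names, the sample dictionary, and the sample documents are hypothetical, and the Signed scoring itself is not described in the abstract, so it is not reproduced here.

```python
from collections import defaultdict


def build_length_index(words):
    """Group dictionary words into buckets keyed by character count."""
    index = defaultdict(set)
    for word in words:
        index[len(word)].add(word.lower())
    return index


def full_word_matches(text, index):
    """Return the tokens of `text` that match a dictionary word exactly."""
    matches = []
    for token in text.lower().split():
        token = token.strip(".,;:!?\"'()")
        # Only the bucket for this token's length is consulted.
        if token and token in index.get(len(token), ()):
            matches.append(token)
    return matches


# Hypothetical domain dictionary and documents, for illustration only.
domain_index = build_length_index(["outlier", "mining", "web", "content"])
relevant_doc = "Web content mining finds each outlier."
unrelated_doc = "Fresh bread recipes for the weekend."

print(full_word_matches(relevant_doc, domain_index))
print(full_word_matches(unrelated_doc, domain_index))
```

Under these assumptions, a document with no full-word matches against the domain dictionary would be a candidate outlier, while partial matches such as "webs" are ignored because only whole words of equal length are compared.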
@article{"International Journal of Information, Control and Computer Sciences:62652",
  author = "G. Poonkuzhali and K. Thiagarajan and K. Sarukesi and G. V. Uma",
  title = "Signed Approach for Mining Web Content Outliers",
  abstract = "The emergence of the Internet has sparked a
revolution in information storage and retrieval. Because most of the
data on the web is unstructured and contains a mix of text,
video, audio, etc., there is a need to mine information that caters to
the specific needs of users without losing important
hidden information. Developing user-friendly,
automated tools that provide relevant information quickly
has therefore become a major challenge in web mining research. Most
existing web mining algorithms concentrate on
finding frequent patterns while neglecting the less frequent
ones that are likely to contain outlying data such as noise and
irrelevant or redundant data. This paper focuses on a
Signed approach with full-word matching against an organized
domain dictionary for mining web content outliers. The
Signed approach identifies both the relevant web documents and the
outlying web documents. Because the dictionary is organized
by the number of characters in a word, searching and retrieving
documents takes less time and less space.",
  keywords = "Outliers, Relevant document, Signed Approach, Web content mining, Web documents",
  volume = "3",
  number = "8",
  pages = "2117-5",
}