Deep Web Content Mining

The rapid expansion of the web causes a constant growth of information, which makes it increasingly difficult to extract potentially useful knowledge. Web content mining addresses this problem by gathering explicit information from different web sites for access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting information with a parsing approach, we use a new data mining algorithm to match a large number of database schemas at once, which increases the speed of information matching. In addition, instead of simple 1:1 matching, the algorithm performs complex (m:n) matching between query interfaces. In this paper we present a novel correlation mining algorithm that matches correlated attributes at lower cost. The algorithm uses the Jaccard measure to distinguish positively and negatively correlated attributes. The system then matches the user query against the query interfaces of a specific domain and finally chooses the query interface nearest to the user query to answer it.
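
As an illustration of how a Jaccard measure can separate attribute pairs, the following minimal sketch treats each query interface as a set of attribute names and scores a pair by the number of schemas containing both attributes divided by the number containing at least one. The function names, the sample schemas, and both thresholds are assumptions made for this example and are not taken from the paper; reading a low score as negative correlation is likewise a simplification.

```python
from itertools import combinations


def jaccard(attr_a, attr_b, schemas):
    """Jaccard measure of two attributes over a collection of schemas:
    (# schemas containing both) / (# schemas containing at least one)."""
    both = sum(1 for s in schemas if attr_a in s and attr_b in s)
    either = sum(1 for s in schemas if attr_a in s or attr_b in s)
    return both / either if either else 0.0


def classify_pairs(schemas, pos_threshold=0.5, neg_threshold=0.1):
    """Split attribute pairs into frequently co-occurring (positive) and
    rarely co-occurring (negative) candidates.
    Both thresholds are hypothetical values chosen for illustration."""
    attributes = sorted(set().union(*schemas))
    positive, negative = [], []
    for a, b in combinations(attributes, 2):
        score = jaccard(a, b, schemas)
        if score >= pos_threshold:
            positive.append((a, b, score))
        elif score <= neg_threshold:
            negative.append((a, b, score))
    return positive, negative


# Hypothetical query interfaces from a book-search domain.
schemas = [
    {"title", "first name", "last name"},
    {"title", "author"},
    {"title", "first name", "last name", "isbn"},
    {"author", "isbn"},
]

positive, negative = classify_pairs(schemas)
```

In this toy data, "first name" and "last name" always appear together and land in the positive list, while "author" never co-occurs with either of them and lands in the negative list, which is the kind of signal a matcher could use to propose an m:n correspondence between "author" and the name pair.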



