The rapid expansion of the web causes a constant growth of
information, which leads to several problems, such as the
increased difficulty of extracting potentially useful knowledge. Web
content mining addresses this problem by gathering explicit information
from different web sites for access and knowledge discovery.
Query interfaces of web databases share common building blocks.
After extracting information with a parsing approach, we use a new
data mining algorithm to match a large number of schemas in
databases at a time. Using this algorithm increases the speed of
information matching. In addition, instead of simple 1:1 matching,
it performs complex (m:n) matching between query interfaces. In this
paper we present a novel correlation mining algorithm that matches
correlated attributes at a smaller cost. This algorithm uses the Jaccard
measure to distinguish positively and negatively correlated attributes.
After that, the system matches the user query with different query
interfaces in a specific domain and finally chooses the query
interface nearest to the user query to answer it.
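
The abstract does not spell out how the Jaccard measure is applied, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes each query interface is represented as a set of attribute names, computes the Jaccard measure for every attribute pair over those sets, and labels a pair as positively or negatively correlated using two hypothetical thresholds (`pos_threshold`, `neg_threshold`). The sample schemas and all function names are illustrative.

```python
from itertools import combinations


def jaccard(a, b, schemas):
    """Jaccard measure of attributes a and b over a collection of schemas.

    Number of schemas containing both attributes, divided by the number
    of schemas containing at least one of them.
    """
    both = sum(1 for s in schemas if a in s and b in s)
    either = sum(1 for s in schemas if a in s or b in s)
    return both / either if either else 0.0


def correlated_pairs(schemas, pos_threshold=0.5, neg_threshold=0.1):
    """Split attribute pairs into positively and negatively correlated ones.

    The thresholds are assumptions chosen for illustration only.
    """
    attributes = sorted(set().union(*schemas))
    positive, negative = [], []
    for a, b in combinations(attributes, 2):
        score = jaccard(a, b, schemas)
        if score >= pos_threshold:
            positive.append((a, b, score))   # attributes that tend to co-occur
        elif score <= neg_threshold:
            negative.append((a, b, score))   # attributes that rarely co-occur
    return positive, negative


# Hypothetical query interfaces from a book-search domain.
schemas = [
    {"author", "title", "isbn"},
    {"first name", "last name", "title"},
    {"author", "title"},
    {"first name", "last name", "isbn"},
]

positive, negative = correlated_pairs(schemas)
for a, b, score in positive:
    print(f"positive: {a} / {b} ({score:.2f})")
for a, b, score in negative:
    print(f"negative: {a} / {b} ({score:.2f})")
```

In this toy data, "first name" and "last name" always appear together and come out as positively correlated, while "author" never co-occurs with either of them and comes out as negatively correlated. Pairs of this kind are the usual candidates for an m:n match such as author = {first name, last name}.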
@article{"International Journal of Information, Control and Computer Sciences:59750",
  author = "Shohreh Ajoudanian and Mohammad Davarpanah Jazi",
  title = "Deep Web Content Mining",
  abstract = "The rapid expansion of the web causes a constant growth of
information, which leads to several problems, such as the
increased difficulty of extracting potentially useful knowledge. Web
content mining addresses this problem by gathering explicit information
from different web sites for access and knowledge discovery.
Query interfaces of web databases share common building blocks.
After extracting information with a parsing approach, we use a new
data mining algorithm to match a large number of schemas in
databases at a time. Using this algorithm increases the speed of
information matching. In addition, instead of simple 1:1 matching,
it performs complex (m:n) matching between query interfaces. In this
paper we present a novel correlation mining algorithm that matches
correlated attributes at a smaller cost. This algorithm uses the Jaccard
measure to distinguish positively and negatively correlated attributes.
After that, the system matches the user query with different query
interfaces in a specific domain and finally chooses the query
interface nearest to the user query to answer it.",
  keywords = "Content mining, complex matching, correlation
mining, information extraction.",
  volume = "3",
  number = "1",
  pages = "124-5",
}