Data Extraction of XML Files using Searching and Indexing Techniques

XML files store data in a well-defined format. Studying that format, or the semantics of its grammar, helps retrieve the data quickly. Many algorithms have been proposed for searching data in XML files. A number of approaches rely on the structure of the document, while others rely on its content. Structure-based approaches require the user to know the structure of the document, whereas information retrieval techniques based on natural language processing (NLP) consider only the content of the document. As a result, the results may be irrelevant or only partly successful, and the search may take longer. This paper presents fast XML retrieval techniques based on a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns a value to each tag in the file. To query the system, the user is not constrained to a fixed query format.
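To make the idea of indexing content and structure together more concrete, the following minimal Python sketch builds an inverted index that records, for every term, the tag path it occurs in and a weight for that tag. It is an illustration only, not the indexing technique or the RXML concept of this paper: the names `TAG_WEIGHTS`, `build_index`, and `search`, the weight values, and the sample document are all assumptions introduced for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical per-tag weights; the abstract does not say how values are assigned.
TAG_WEIGHTS = {"title": 3.0, "author": 2.0}
DEFAULT_WEIGHT = 1.0


def build_index(xml_text):
    """Map each lowercased term to a list of (tag path, tag weight) postings."""
    index = defaultdict(list)
    root = ET.fromstring(xml_text)

    def visit(elem, path):
        here = f"{path}/{elem.tag}"
        weight = TAG_WEIGHTS.get(elem.tag, DEFAULT_WEIGHT)
        # Only the element's leading text is indexed; tail text is ignored here.
        for term in re.findall(r"\w+", (elem.text or "").lower()):
            index[term].append((here, weight))
        for child in elem:
            visit(child, here)

    visit(root, "")
    return index


def search(index, query):
    """Free-form keyword query: sum tag weights per path, best score first."""
    scores = defaultdict(float)
    for term in re.findall(r"\w+", query.lower()):
        for path, weight in index.get(term, []):
            scores[path] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample document, used only to exercise the sketch.
SAMPLE = """<library>
  <book><title>XML Retrieval</title><author>Smith</author><summary>Indexing XML data</summary></book>
</library>"""

index = build_index(SAMPLE)
results = search(index, "xml indexing")
```

In this sketch the query is a plain list of keywords, so the user does not need to know the document structure, while the stored tag paths and weights still let structure influence the ranking.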




