Abstract: XML is a markup language which is becoming the
standard format for information representation and data exchange. A
major purpose of XML is the explicit representation of the logical
structure of a document. Much research has been performed to
exploit logical structure of documents in information retrieval in
order to precisely extract user information need from large
collections of XML documents. In this paper, we describe an XML
information retrieval weighting scheme that tries to find the most
relevant elements in XML documents in response to a user query.
We present this weighting model for information retrieval systems
that utilize plausible inferences to infer the relevance of elements in
XML documents. We also add to this model the Dempster-Shafer
theory of evidence to express the uncertainty in plausible inferences
and Dempster-Shafer rule of combination to combine evidences
derived from different inferences.
Abstract: Mining frequent tree patterns have many useful
applications in XML mining, bioinformatics, network routing, etc.
Most of the frequent subtree mining algorithms (i.e. FREQT,
TreeMiner and CMTreeMiner) use anti-monotone property in the
phase of candidate subtree generation. However, none of these
algorithms have verified the correctness of this property in tree
structured data. In this research it is shown that anti-monotonicity
does not generally hold, when using weighed support in tree pattern
discovery. As a result, tree mining algorithms that are based on this
property would probably miss some of the valid frequent subtree
patterns in a collection of trees. In this paper, we investigate the
correctness of anti-monotone property for the problem of weighted
frequent subtree mining. In addition we propose W3-Miner, a new
algorithm for full extraction of frequent subtrees. The experimental
results confirm that W3-Miner finds some frequent subtrees that the
previously proposed algorithms are not able to discover.
Abstract: The inherent flexibilities of XML in both structure
and semantics makes mining from XML data a complex task with
more challenges compared to traditional association rule mining in
relational databases. In this paper, we propose a new model for the
effective extraction of generalized association rules form a XML
document collection. We directly use frequent subtree mining
techniques in the discovery process and do not ignore the tree
structure of data in the final rules. The frequent subtrees based on the
user provided support are split to complement subtrees to form the
rules. We explain our model within multi-steps from data preparation
to rule generation.
Abstract: One important problem in today organizations is the
existence of non-integrated information systems, inconsistency and
lack of suitable correlations between legacy and modern systems.
One main solution is to transfer the local databases into a global one.
In this regards we need to extract the data structures from the legacy
systems and integrate them with the new technology systems. In
legacy systems, huge amounts of a data are stored in legacy
databases. They require particular attention since they need more
efforts to be normalized, reformatted and moved to the modern
database environments. Designing the new integrated (global)
database architecture and applying the reverse engineering requires
data normalization. This paper proposes the use of database reverse
engineering in order to integrate legacy and modern databases in
organizations. The suggested approach consists of methods and
techniques for generating data transformation rules needed for the
data structure normalization.
Abstract: Knowledge discovery from text and ontology learning
are relatively new fields. However their usage is extended in many
fields like Information Retrieval (IR) and its related domains. Human
Plausible Reasoning based (HPR) IR systems for example need a
knowledge base as their underlying system which is currently made
by hand. In this paper we propose an architecture based on ontology
learning methods to automatically generate the needed HPR
knowledge base.
Abstract: XML has become a popular standard for information exchange via web. Each XML document can be presented as a rooted, ordered, labeled tree. The Node label shows the exact position of a node in the original document. Region and Dewey encoding are two famous methods of labeling trees. In this paper, we propose a new insert friendly labeling method named IFDewey based on recently proposed scheme, called Extended Dewey. In Extended Dewey many labels must be modified when a new node is inserted into the XML tree. Our method eliminates this problem by reserving even numbers for future insertion. Numbers generated by Extended Dewey may be even or odd. IFDewey modifies Extended Dewey so that only odd numbers are generated and even numbers can then be used for a much easier insertion of nodes.