Abstract: The volume of XML data exchange is explosively
increasing, and the need for efficient mechanisms of XML data
management is vital. Many XML storage models have been proposed
for storing XML DTD-independent documents in relational database
systems. Benchmarking is the best way to highlight pros and cons of
different approaches. In this study, we use a common benchmarking
scheme, known as XMark to compare the most cited and newly
proposed DTD-independent methods in terms of logical reads,
physical I/O, CPU time and duration. We show the effect of Label
Path, extracting values and storing in another table and type of join
needed for each method-s query answering.
Abstract: Mining frequent tree patterns have many useful
applications in XML mining, bioinformatics, network routing, etc.
Most of the frequent subtree mining algorithms (i.e. FREQT,
TreeMiner and CMTreeMiner) use anti-monotone property in the
phase of candidate subtree generation. However, none of these
algorithms have verified the correctness of this property in tree
structured data. In this research it is shown that anti-monotonicity
does not generally hold, when using weighed support in tree pattern
discovery. As a result, tree mining algorithms that are based on this
property would probably miss some of the valid frequent subtree
patterns in a collection of trees. In this paper, we investigate the
correctness of anti-monotone property for the problem of weighted
frequent subtree mining. In addition we propose W3-Miner, a new
algorithm for full extraction of frequent subtrees. The experimental
results confirm that W3-Miner finds some frequent subtrees that the
previously proposed algorithms are not able to discover.
Abstract: This paper proposes a new technique to detect code
clones from the lexical and syntactic point of view, which is based
on PALEX source code representation. The PALEX code contains
the recorded parsing actions and also lexical formatting information
including white spaces and comments. We can record a list of parsing
actions (shift, reduce, and reading a token) during a compiling process
after a compiler finishes analyzing the source code. The proposed
technique has advantages for syntax sensitive approach and language
independency.
Abstract: The recent trend has been using hybrid approach rather than using a single intelligent technique to solve the problems. In this paper, we describe and discuss a framework to develop enterprise solutions that are backed by intelligent techniques. The framework not only uses intelligent techniques themselves but it is a complete environment that includes various interfaces and components to develop the intelligent solutions. The framework is completely Web-based and uses XML extensively. It can work like shared plat-form to be accessed by multiple developers, users and decision makers.
Abstract: XML data consists of a very flexible tree-structure
which makes it difficult to support the storing and retrieving of XML
data. The node numbering scheme is one of the most popular
approaches to store XML in relational databases. Together with the
node numbering storage scheme, structural joins can be used to
efficiently process the hierarchical relationships in XML. However, in
order to process a tree-structured XPath query containing several
hierarchical relationships and conditional sentences on XML data,
many structural joins need to be carried out, which results in a high
query execution cost. This paper introduces mechanisms to reduce the
XPath queries including branch nodes into a much more efficient form
with less numbers of structural joins. A two step approach is proposed.
The first step merges duplicate nodes in the tree-structured query and
the second step divides the query into sub-queries, shortens the paths
and then merges the sub-queries back together. The proposed
approach can highly contribute to the efficient execution of XML
queries. Experimental results show that the proposed scheme can
reduce the query execution cost by up to an order of magnitude of the
original execution cost.
Abstract: XML is an important standard of data exchange and
representation. As a mature database system, using relational database
to support XML data may bring some advantages. But storing XML in
relational database has obvious redundancy that wastes disk space,
bandwidth and disk I/O when querying XML data. For the efficiency
of storage and query XML, it is necessary to use compressed XML
data in relational database. In this paper, a compressed relational
database technology supporting XML data is presented. Original
relational storage structure is adaptive to XPath query process. The
compression method keeps this feature. Besides traditional relational
database techniques, additional query process technologies on
compressed relations and for special structure for XML are presented.
In this paper, technologies for XQuery process in compressed
relational database are presented..
Abstract: ebXML (Electronic Business using eXtensible
Markup Language) is an e-business standard, sponsored by
UN/CEFACT and OASIS, which enables enterprises to exchange
business messages, conduct trading relationships, communicate
data in common terms and define and register business
processes. While there is tremendous e-business value in the
ebXML, security remains an unsolved problem and one of the
largest barriers to adoption. XML security technologies emerging
recently have extensibility and flexibility suitable for security
implementation such as encryption, digital signature, access
control and authentication.
In this paper, we propose ebXML business transaction models
that allow trading partners to securely exchange XML based
business transactions by employing XML security technologies.
We show how each XML security technology meets the ebXML
standard by constructing the test software and validating messages
between the trading partners.
Abstract: With the tremendous growth of World Wide Web
(WWW) data, there is an emerging need for effective information
retrieval at the document level. Several query languages such as
XML-QL, XPath, XQL, Quilt and XQuery are proposed in recent
years to provide faster way of querying XML data, but they still lack of
generality and efficiency. Our approach towards evolving a framework
for querying semistructured documents is based on formal query
algebra. Two elements are introduced in the proposed framework:
first, a generic and flexible data model for logical representation of
semistructured data and second, a set of operators for the manipulation
of objects defined in the data model. In additional to accommodating
several peculiarities of semistructured data, our model offers novel
features such as bidirectional paths for navigational querying and
partitions for data transformation that are not available in other
proposals.
Abstract: XML is becoming a de facto standard for online data exchange. Existing XML filtering techniques based on a publish/subscribe model are focused on the highly structured data marked up with XML tags. These techniques are efficient in filtering the documents of data-centric XML but are not effective in filtering the element contents of the document-centric XML. In this paper, we propose an extended XPath specification which includes a special matching character '%' used in the LIKE operation of SQL in order to solve the difficulty of writing some queries to adequately filter element contents using the previous XPath specification. We also present a novel technique for filtering a collection of document-centric XMLs, called Pfilter, which is able to exploit the extended XPath specification. We show several performance studies, efficiency and scalability using the multi-query processing time (MQPT).
Abstract: The paper focuses on the area of context modeling with respect to the specification of context-aware systems supporting ubiquitous applications. The proposed approach, followed within the SIMPLICITY IST project, uses a high-level system ontology to derive context models for system components which consequently are mapped to the system's physical entities. For the definition of user and device-related context models in particular, the paper suggests a standard-based process consisting of an analysis phase using the Common Information Model (CIM) methodology followed by an implementation phase that defines 3GPP based components. The benefits of this approach are further depicted by preliminary examples of XML grammars defining profiles and components, component instances, coupled with descriptions of respective ubiquitous applications.
Abstract: This paper presents a mark-up approach to service creation in Next Generation Networks. The approach allows deriving added value from network functions exposed by Parlay/OSA (Open Service Access) interfaces. With OSA interfaces service logic scripts might be executed both on callrelated and call-unrelated events. To illustrate the approach XMLbased language constructions for data and method definitions, flow control, time measuring and supervision and database access are given and an example of OSA application is considered.
Abstract: The evolution of current modeling specifications gives rise to the problem of generating automated test cases from a variety of application tools. Past endeavours on behavioural testing of UML statecharts have not systematically leveraged the potential of existing graph theory for testing of objects. Therefore there exists a need for a simple, tool-independent, and effective method for automatic test generation. An architecture, codenamed ACUTE-J (Automated stateChart Unit Testing Engine for Java), for automating the unit test generation process is presented. A sequential approach for converting UML statechart diagrams to JUnit test classes is described, with the application of existing graph theory. Research byproducts such as a universal XML Schema and API for statechart-driven testing are also proposed. The result from a Java implementation of ACUTE-J is discussed in brief. The Chinese Postman algorithm is utilised as an illustration for a run-through of the ACUTE-J architecture.
Abstract: There are many problems associated with the World Wide
Web: getting lost in the hyperspace; the web content is still accessible only
to humans and difficulties of web administration. The solution to these
problems is the Semantic Web which is considered to be the extension
for the current web presents information in both human readable and
machine processable form. The aim of this study is to reach new
generic foundation architecture for the Semantic Web because there
is no clear architecture for it, there are four versions, but still up to
now there is no agreement for one of these versions nor is there a
clear picture for the relation between different layers and
technologies inside this architecture. This can be done depending on
the idea of previous versions as well as Gerber-s evaluation method
as a step toward an agreement for one Semantic Web architecture.
Abstract: Our work is part of the heterogeneous data
integration, with the definition of a structural and semantic mediation
model. Our aim is to propose architecture for the heterogeneous
sources metadata mediation, represented by XML, RDF and RuleML
models, providing to the user the metadata transparency. This, by
including data structures, of natures fundamentally different, and
allowing the decomposition of a query involving multiple sources, to
queries specific to these sources, then recompose the result.
Abstract: Software reuse can be considered as the most realistic
and promising way to improve software engineering productivity and
quality. Automated assistance for software reuse involves the
representation, classification, retrieval and adaptation of components.
The representation and retrieval of components are important to
software reuse in Component-Based on Software Development
(CBSD). However, current industrial component models mainly focus
on the implement techniques and ignore the semantic information
about component, so it is difficult to retrieve the components that
satisfy user-s requirements. This paper presents a method of business
component retrieval based on specification matching to solve the
software reuse of enterprise information system. First, a business
component model oriented reuse is proposed. In our model, the
business data type is represented as sign data type based on XML,
which can express the variable business data type that can describe the
variety of business operations. Based on this model, we propose
specification match relationships in two levels: business operation
level and business component level. In business operation level, we
use input business data types, output business data types and the
taxonomy of business operations evaluate the similarity between
business operations. In the business component level, we propose five
specification matches between business components. To retrieval
reusable business components, we propose the measure of similarity
degrees to calculate the similarities between business components.
Finally, a business component retrieval command like SQL is
proposed to help user to retrieve approximate business components
from component repository.
Abstract: The inherent flexibilities of XML in both structure
and semantics makes mining from XML data a complex task with
more challenges compared to traditional association rule mining in
relational databases. In this paper, we propose a new model for the
effective extraction of generalized association rules form a XML
document collection. We directly use frequent subtree mining
techniques in the discovery process and do not ignore the tree
structure of data in the final rules. The frequent subtrees based on the
user provided support are split to complement subtrees to form the
rules. We explain our model within multi-steps from data preparation
to rule generation.
Abstract: Over the past few years, a number of efforts have
been exerted to build parallel processing systems that utilize the idle
power of LAN-s and PC-s available in many homes and corporations.
The main advantage of these approaches is that they provide cheap
parallel processing environments for those who cannot afford the
expenses of supercomputers and parallel processing hardware.
However, most of the solutions provided are not very flexible in the
use of available resources and very difficult to install and setup.
In this paper, a multi-level web-based parallel processing system
(MWPS) is designed (appendix). MWPS is based on the idea of
volunteer computing, very flexible, easy to setup and easy to use.
MWPS allows three types of subscribers: simple volunteers (single
computers), super volunteers (full networks) and end users. All of
these entities are coordinated transparently through a secure web site.
Volunteer nodes provide the required processing power needed by
the system end users. There is no limit on the number of volunteer
nodes, and accordingly the system can grow indefinitely. Both
volunteer and system users must register and subscribe. Once, they
subscribe, each entity is provided with the appropriate MWPS
components. These components are very easy to install.
Super volunteer nodes are provided with special components that
make it possible to delegate some of the load to their inner nodes.
These inner nodes may also delegate some of the load to some other
lower level inner nodes .... and so on. It is the responsibility of the
parent super nodes to coordinate the delegation process and deliver
the results back to the user.
MWPS uses a simple behavior-based scheduler that takes into
consideration the current load and previous behavior of processing
nodes. Nodes that fulfill their contracts within the expected time get a
high degree of trust. Nodes that fail to satisfy their contract get a
lower degree of trust.
MWPS is based on the .NET framework and provides the minimal
level of security expected in distributed processing environments.
Users and processing nodes are fully authenticated. Communications
and messages between nodes are very secure. The system has been
implemented using C#.
MWPS may be used by any group of people or companies to
establish a parallel processing or grid environment.
Abstract: Typical Intelligent Decision Support System is 4-based, its design composes of Data Warehouse, Online Analytical Processing, Data Mining and Decision Supporting based on models, which is called Decision Support System Based on Data Warehouse (DSSBDW). This way takes ETL,OLAP and DM as its implementing means, and integrates traditional model-driving DSS and data-driving DSS into a whole. For this kind of problem, this paper analyzes the DSSBDW architecture and DW model, and discusses the following key issues: ETL designing and Realization; metadata managing technology using XML; SQL implementing, optimizing performance, data mapping in OLAP; lastly, it illustrates the designing principle and method of DW in DSSBDW.
Abstract: Graph has become increasingly important in modeling
complicated structures and schemaless data such as proteins, chemical
compounds, and XML documents. Given a graph query, it is desirable
to retrieve graphs quickly from a large database via graph-based
indices. Different from the existing methods, our approach, called
VFM (Vertex to Frequent Feature Mapping), makes use of vertices
and decision features as the basic indexing feature. VFM constructs
two mappings between vertices and frequent features to answer graph
queries. The VFM approach not only provides an elegant solution to
the graph indexing problem, but also demonstrates how database
indexing and query processing can benefit from data mining,
especially frequent pattern mining. The results show that the proposed
method not only avoids the enumeration method of getting subgraphs
of query graph, but also effectively reduces the subgraph isomorphism
tests between the query graph and graphs in candidate answer set in
verification stage.
Abstract: Modeling product configurations needs large amounts of knowledge about technical and marketing restrictions on the product. Previous attempts to automate product configurations concentrate on representations and management of the knowledge for specific domains in fixed and isolated computing environments. Since the knowledge about product configurations is subject to continuous change and hard to express, these attempts often failed to efficiently manage and exchange the knowledge in collaborative product development. In this paper, XML Topic Map (XTM) is introduced to represent and exchange the knowledge about product configurations in collaborative product development. A product configuration model based on XTM along with its merger and inference facilities enables configuration engineers in collaborative product development to manage and exchange their knowledge efficiently. A prototype implementation is also presented to demonstrate the proposed model can be applied to engineering information systems to exchange the product configuration knowledge.