Abstract: A distributed database is a collection of logically related databases that cooperate in a transparent manner. Query processing uses a communication network to transmit data between sites, and it remains one of the challenges in the database world. The development of sophisticated query optimization technology is the reason for the commercial success of database systems, but the complexity and cost of optimization increase with the number of relations in the query. Mariposa, query trading, and query trading with processing task-trading strategies were developed for autonomous distributed database systems, but they incur high optimization cost because all nodes are involved in generating an optimal plan. In this paper, we propose a modification of the autonomous strategy K-QTPT that gives the seller nodes with the lowest cost gradually higher priorities in order to reduce the optimization time. We implement the proposed strategy and present an analysis of the results.
Abstract: Most existing skyline query algorithms focus on querying static points in static databases. However, with the growing number of sensors, wireless communications and mobile applications, the demand for continuous skyline queries has increased. Unlike traditional skyline queries, which consider only static attributes, continuous skyline queries include dynamic attributes as well as static ones. Because skyline computation is based on checking the domination of skyline points over all dimensions, the static and dynamic attributes must be considered together, without separation. In this paper, we present an efficient algorithm for computing continuous skyline queries without discriminating between static and dynamic attributes. In brief, our algorithm proceeds as follows. First, it excludes the points that will not be in the initial skyline result; this pruning phase reduces the required number of comparisons. Second, the association between the spatial positions of data points is examined; this phase indicates where changes in the result might occur and consequently enables us to update the skyline result efficiently (continuous update) rather than computing the skyline from scratch. Finally, an experimental evaluation demonstrates the accuracy, performance and efficiency of our algorithm compared with existing approaches.
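To make the domination test concrete, the following is a minimal sketch of a plain skyline computation over points that mix static and dynamic attributes. It is a from-scratch baseline only, not the pruning and continuous-update algorithm described in the abstract. The names `dominates` and `skyline`, and the convention that smaller values are better in every dimension, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point."""
    result: List[Point] = []
    for p in points:
        # Skip p if a point already kept dominates it.
        if any(dominates(s, p) for s in result):
            continue
        # Drop previously kept points that p dominates, then keep p.
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


# Hypothetical data: (price, current distance); price is static,
# distance changes as the query point moves.
hotels = [(50.0, 2.0), (60.0, 1.0), (70.0, 3.0), (40.0, 4.0)]
current_skyline = skyline(hotels)
```

In this example `(70.0, 3.0)` is dominated by `(50.0, 2.0)` and is excluded. Rerunning `skyline` after every change to a dynamic attribute is the cost that a continuous-update approach aims to avoid.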
Abstract: Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach that is unaffected by the features' dimensions and ranges, internal feature normalization, and the distance measure. The approach can easily be adopted for any feature combination to improve retrieval quality. It is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and the Oliva and Torralba dataset) and compared with existing approaches. The results show significantly improved performance compared with the independently evaluated baselines of previously proposed feature fusion approaches.
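One way to picture fusion that does not depend on feature ranges is sketched below. It is an assumption-laden illustration, not the method of the paper: each feature's query-to-database distances are standardized by their own mean and standard deviation (one possible reading of "distance distribution") and then combined with relative weights. The function names `fuse_distances` and `rank` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def fuse_distances(per_feature: Dict[str, List[float]],
                   weights: Dict[str, float]) -> List[float]:
    """Combine per-feature distance lists (one distance per database image)
    into a single fused score per image."""
    n_images = len(next(iter(per_feature.values())))
    total_weight = sum(weights.values())
    fused = [0.0] * n_images
    for name, dists in per_feature.items():
        mu = mean(dists)
        sigma = pstdev(dists) or 1.0  # avoid division by zero
        w = weights[name] / total_weight
        for i, d in enumerate(dists):
            # Standardizing removes the effect of the feature's raw range.
            fused[i] += w * (d - mu) / sigma
    return fused


def rank(fused: List[float]) -> List[int]:
    """Image indices ordered from most to least similar."""
    return sorted(range(len(fused)), key=lambda i: fused[i])


# Hypothetical distances on very different scales.
distances = {"color": [0.1, 0.5, 0.9], "texture": [300.0, 100.0, 200.0]}
order = rank(fuse_distances(distances, {"color": 1.0, "texture": 1.0}))
```

Because each distance list is standardized before weighting, the texture distances in the hundreds do not swamp the color distances below one.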
Abstract: Wireless sensor networks can be applied in both harsh
and military environments. A primary goal in the design of
wireless sensor networks is lifetime maximization, constrained by
the energy capacity of batteries. One well-known method to reduce
energy consumption in such networks is data aggregation. Providing
efficient data aggregation while preserving data privacy is a challenging
problem in wireless sensor networks research. In this paper,
we present a privacy-preserving data aggregation scheme for additive
aggregation functions. The Cluster-based Private Data Aggregation
(CPDA) scheme leverages a clustering protocol and algebraic properties of
polynomials. It has the advantage of incurring less communication
overhead. The goal of our work is to bridge the gap between
collaborative data collection by wireless sensor networks and data
privacy. We present simulation results of our schemes and compare
their performance to a typical data aggregation scheme, TAG, in which
no data privacy protection is provided. Results show the efficacy and
efficiency of our schemes.
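The role that polynomials can play in additive aggregation is sketched below for a single cluster. This is a simplified illustration under stated assumptions, not the CPDA protocol itself: each node hides its reading as the constant term of a random polynomial, nodes exchange evaluations at public, distinct, non-zero seeds, and the sum is recovered by interpolating the summed polynomial at zero. Encryption of the exchanged values and cluster formation are omitted, and the names `make_shares` and `cluster_sum` are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Sequence


def make_shares(value: int, seeds: Sequence[int], rng: random.Random) -> List[int]:
    """Mask `value` with a random polynomial of degree len(seeds) - 1 and
    evaluate that polynomial at every public seed."""
    coeffs = [rng.randint(1, 1000) for _ in range(len(seeds) - 1)]
    return [value + sum(c * x ** (k + 1) for k, c in enumerate(coeffs))
            for x in seeds]


def cluster_sum(values: Sequence[int], seeds: Sequence[int],
                rng: random.Random) -> Fraction:
    """Recover the sum of all values without any node revealing its own."""
    n = len(values)
    # shares[i][j] is what node i would send (encrypted) to node j.
    shares = [make_shares(v, seeds, rng) for v in values]
    # Node j adds up everything it received: one point of the summed polynomial.
    assembled = [sum(shares[i][j] for i in range(n)) for j in range(n)]
    # Lagrange interpolation at x = 0 yields the constant term, i.e. the sum.
    total = Fraction(0)
    for j, xj in enumerate(seeds):
        weight = Fraction(1)
        for m, xm in enumerate(seeds):
            if m != j:
                weight *= Fraction(-xm, xj - xm)
        total += assembled[j] * weight
    return total


readings = [21, 19, 25]   # private sensor readings (hypothetical)
public_seeds = [1, 2, 3]  # distinct, non-zero, known to the whole cluster
aggregate = cluster_sum(readings, public_seeds, random.Random(0))
```

Here `aggregate` equals 65, the sum of the three readings, while each individual share is offset by random polynomial terms.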
Abstract: XML is an important standard for data exchange and
representation. Because relational databases are mature systems, using
them to support XML data may bring some advantages. However, storing XML in
a relational database introduces obvious redundancy that wastes disk space,
bandwidth and disk I/O when querying XML data. For efficient
storage and querying of XML, it is necessary to use compressed XML
data in the relational database. In this paper, a compressed relational
database technology supporting XML data is presented. The original
relational storage structure is suited to XPath query processing, and the
compression method preserves this feature. Besides traditional relational
database techniques, additional query processing techniques for
compressed relations and for the special structure of XML are presented.
Techniques for XQuery processing in the compressed
relational database are also presented.
Abstract: XML is becoming a de facto standard for online data exchange. Existing XML filtering techniques based on a publish/subscribe model focus on highly structured data marked up with XML tags. These techniques are efficient in filtering data-centric XML documents but are not effective in filtering the element contents of document-centric XML. In this paper, we propose an extended XPath specification that includes the special matching character '%' used in the LIKE operation of SQL, in order to overcome the difficulty of writing queries that adequately filter element contents with the previous XPath specification. We also present a novel technique for filtering a collection of document-centric XML documents, called Pfilter, which is able to exploit the extended XPath specification. We report several performance studies that evaluate efficiency and scalability using the multi-query processing time (MQPT).
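The effect of a '%' wildcard on element contents can be illustrated as follows. This is only a sketch of the matching idea, not Pfilter and not the extended XPath syntax proposed in the paper: the path and the LIKE-style pattern are passed separately, and the helper names `like_to_regex` and `filter_documents` are hypothetical. Only '%' is handled; the SQL '_' wildcard is not.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE-style pattern where '%' matches any run of characters."""
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile(".*".join(pieces), re.DOTALL)


def filter_documents(documents: Dict[str, str], path: str, pattern: str) -> List[str]:
    """Return the ids of documents in which some element selected by `path`
    has text content matching `pattern`."""
    regex = like_to_regex(pattern)
    matched = []
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        if any(regex.fullmatch(el.text or "") for el in root.findall(path)):
            matched.append(doc_id)
    return matched


docs = {
    "d1": "<article><title>XML filtering with XPath</title></article>",
    "d2": "<article><title>Relational joins</title></article>",
}
hits = filter_documents(docs, ".//title", "%XPath%")
```

With these two documents, only `d1` has a title whose content contains "XPath", so `hits` is `["d1"]`.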
Abstract: A data warehouse (DW) is a system that supports decision-making through querying. Queries to a DW are critical because of their complexity and length: they often access millions of tuples and involve joins between relations and aggregations. Materialized views can provide better performance for DW queries. However, these views have a maintenance cost, so materializing all views is not possible. An important challenge in a DW environment is materialized view selection, because we have to balance the trade-off between query performance and view maintenance. Therefore, in this paper, we introduce a new approach to this challenge based on Two-Phase Optimization (2PO), which is a combination of Simulated Annealing (SA) and Iterative Improvement (II), with the use of a Multiple View Processing Plan (MVPP). Our experiments show that 2PO outperforms the original algorithms in terms of query processing cost and view maintenance cost.
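The combination of II and SA can be sketched on a toy view-selection problem. This is a generic illustration, not the authors' implementation and without an MVPP: a state is a tuple of booleans saying which candidate views are materialized, II descends to a local minimum, and SA then continues from that minimum. The cost model, parameter values, and all function names below are assumptions.

```python
import math
import random
from typing import Callable, Tuple

State = Tuple[bool, ...]


def flip(state: State, i: int) -> State:
    """Toggle whether view i is materialized."""
    return state[:i] + (not state[i],) + state[i + 1:]


def iterative_improvement(cost: Callable[[State], float], state: State) -> State:
    """Accept only downhill moves until no single flip improves the cost."""
    improved = True
    while improved:
        improved = False
        for i in range(len(state)):
            candidate = flip(state, i)
            if cost(candidate) < cost(state):
                state, improved = candidate, True
    return state


def simulated_annealing(cost, state, rng, temperature=1.0, cooling=0.9,
                        steps_per_level=20, min_temperature=0.01) -> State:
    """Occasionally accept uphill moves; remember the best state seen."""
    best = state
    while temperature > min_temperature:
        for _ in range(steps_per_level):
            candidate = flip(state, rng.randrange(len(state)))
            delta = cost(candidate) - cost(state)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                state = candidate
                if cost(state) < cost(best):
                    best = state
        temperature *= cooling
    return best


def two_phase_optimization(cost, n_views: int, rng: random.Random) -> State:
    start = tuple(rng.random() < 0.5 for _ in range(n_views))
    local_minimum = iterative_improvement(cost, start)  # phase 1: II
    return simulated_annealing(cost, local_minimum, rng)  # phase 2: SA


# Hypothetical cost model: query cost saved vs. maintenance cost per view.
SAVING = (40, 5, 30)
MAINTENANCE = (10, 20, 10)


def toy_cost(state: State) -> float:
    return 100 - sum(s - m for s, m, on in zip(SAVING, MAINTENANCE, state) if on)


chosen = two_phase_optimization(toy_cost, 3, random.Random(1))
```

On this toy model the best choice is to materialize the first and third views, for a total cost of 50. Starting SA from the II result at a low temperature is the design choice that distinguishes 2PO from running either algorithm alone.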
Abstract: Graphs have become increasingly important in modeling
complicated structures and schemaless data such as proteins, chemical
compounds, and XML documents. Given a graph query, it is desirable
to retrieve graphs quickly from a large database via graph-based
indices. Unlike existing methods, our approach, called
VFM (Vertex to Frequent Feature Mapping), makes use of vertices
and decision features as the basic indexing features. VFM constructs
two mappings between vertices and frequent features to answer graph
queries. The VFM approach not only provides an elegant solution to
the graph indexing problem, but also demonstrates how database
indexing and query processing can benefit from data mining,
especially frequent pattern mining. The results show that the proposed
method not only avoids enumerating the subgraphs
of the query graph, but also effectively reduces the number of subgraph isomorphism
tests between the query graph and the graphs in the candidate answer set in
the verification stage.
Abstract: In many applications, data has a graph structure, which
can be naturally represented as graph-structured XML. Existing
queries defined on tree-structured and graph-structured XML data
mainly focus on subgraph matching, which cannot cover all the
requirements of querying graphs. In this paper, a new kind of
query, the topological query on graph-structured XML, is presented.
This kind of query considers not only the structure of subgraphs but
also the topological relationships between subgraphs. Building on existing
subgraph query processing algorithms, efficient algorithms for topological
query processing are designed. Experimental results show the
efficiency of the implemented algorithms.
Abstract: Large-scale systems such as Grids offer
infrastructures for both data distribution and parallel processing. The
use of Grid infrastructures is a more recent issue that is already
impacting the Distributed Database Management System industry. In
DBMS, distributed query processing has emerged as a fundamental
technique for ensuring high performance in distributed databases.
Database placement is particularly important in large-scale systems
because it reduces communication costs and improves resource
usage. In this paper, we propose a dynamic database placement
policy that depends on query patterns and Grid site capabilities. We
evaluate the performance of the proposed database placement policy
using simulations. The obtained results show that dynamic database
placement can significantly improve the performance of distributed
query processing.
Abstract: Computing and maintaining network structures for efficient
data aggregation incurs high overhead for dynamic events
where the set of nodes sensing an event changes with time. Moreover,
structured approaches are sensitive to the waiting time that is used
by nodes to wait for packets from their children before forwarding
the packet to the sink. An optimal routing and data aggregation
scheme for wireless sensor networks is proposed in this paper. We
propose Tree on DAG (ToD), a semistructured approach that uses
Dynamic Forwarding on an implicitly constructed structure composed
of multiple shortest path trees to support network scalability. The key
principle behind ToD is that adjacent nodes in a graph will have
low stretch in one of these trees in ToD, thus resulting in early
aggregation of packets. Based on simulations on a 2,000-node Mica2-
based network, we conclude that efficient aggregation in large-scale
networks can be achieved by our semistructured approach.
Abstract: Schema matching plays a key role in many different
applications, such as schema integration, data integration, data
warehousing, data transformation, E-commerce, peer-to-peer data
management, ontology matching and integration, semantic Web,
semantic query processing, etc. Manual matching is expensive and
error-prone, so it is important to develop techniques to
automate the schema matching process. In this paper, we present a
solution to the automated XML schema matching problem that
produces semantic mappings between corresponding schema
elements of given source and target schemas. This solution
contributes to solving the automated XML schema matching problem
more comprehensively and efficiently. Our solution is based on
combining linguistic similarity, data type compatibility and structural
similarity of XML schema elements. After describing our solution,
we present experimental results that demonstrate the effectiveness of
this approach.
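How the three kinds of evidence might be combined is sketched below. The measures, weights, threshold, and type families are illustrative assumptions rather than the ones used in the paper: name similarity uses `difflib.SequenceMatcher`, type compatibility uses a small hand-made family table, and structural similarity is the Jaccard overlap of child element names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Element:
    name: str
    data_type: str
    children: Tuple[str, ...] = ()  # names of child elements


# Hypothetical grouping of XML Schema types into compatible families.
TYPE_FAMILIES = {"int": "numeric", "integer": "numeric", "decimal": "numeric",
                 "float": "numeric", "string": "text", "token": "text",
                 "date": "temporal", "dateTime": "temporal"}


def linguistic(a: Element, b: Element) -> float:
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def type_compatibility(a: Element, b: Element) -> float:
    if a.data_type == b.data_type:
        return 1.0
    fam_a, fam_b = TYPE_FAMILIES.get(a.data_type), TYPE_FAMILIES.get(b.data_type)
    return 0.5 if fam_a is not None and fam_a == fam_b else 0.0


def structural(a: Element, b: Element) -> float:
    ca = {c.lower() for c in a.children}
    cb = {c.lower() for c in b.children}
    if not ca and not cb:
        return 1.0  # two leaf elements
    return len(ca & cb) / len(ca | cb)


def similarity(a: Element, b: Element, weights=(0.5, 0.2, 0.3)) -> float:
    w_ling, w_type, w_struct = weights
    return (w_ling * linguistic(a, b) + w_type * type_compatibility(a, b)
            + w_struct * structural(a, b))


def match(source: List[Element], target: List[Element],
          threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Map each source element to its best-scoring target element."""
    mappings = []
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            mappings.append((s.name, best.name, round(score, 2)))
    return mappings


source = [Element("custName", "string"), Element("orderDate", "date")]
target = [Element("customerName", "string"), Element("date", "date"),
          Element("total", "decimal")]
mappings = match(source, target)
```

With these sample schemas, `custName` is mapped to `customerName` and `orderDate` to `date`. A weighted sum is only one possible combination rule; the weights would need tuning on real schemas.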
Abstract: Over the past few years, XML (eXtensible Mark-up
Language) has emerged as the standard for information
representation and data exchange over the Internet. This paper
provides a starting point for new researchers venturing into the XML database
field. We survey the storage representations for XML documents and
review XML query processing and optimization techniques with
respect to the particular storage instance. Various optimization
technologies have been developed to solve the query retrieval and
updating problems. In later years, most researchers have
proposed hybrid optimization techniques. A hybrid system opens the
possibility of covering each technology's weaknesses with the strengths of the others.
This paper reviews the advantages and limitations of optimization
techniques.
Abstract: A data warehouse is a repository of information
integrated from source data. Information stored in a data warehouse is
kept in the form of materialized views in order to provide better performance
for answering queries. Deciding which views should be
materialized is an important problem. To address this
problem, constructing a search space that is close to optimal is a
necessary task, because it leads to effective results when selecting views to be
materialized. In this paper, we propose an approach to reoptimize
the Multiple View Processing Plan (MVPP) by using global
common subexpressions. The merged queries whose query
processing cost is not close to optimal are rewritten. The
experiment shows that our approach helps to improve the total
query processing cost of the MVPP, and that the sum of query processing cost
and materialized view maintenance cost is also reduced after views
are selected to be materialized.
Abstract: Intermittent connectivity violates the "always
on" network assumption made by all distributed query processing
systems. In modern-day systems, the absence of network
connectivity is considered a fault. When a connection becomes
available, it might not be feasible to transmit right away all the data
accumulated since the last upload. Vital information may be
delayed excessively when less important information is transmitted
in its place. Owing to the restricted and uneven
bandwidth, it is vital that the mobile nodes make the most
advantageous use of the connectivity when it arrives. Hence, in order
to select the data that needs to be transmitted first, some form of data
prioritization is essential. This paper proposes a continuous query processing system for
intermittently connected mobile networks that comprises a delay-tolerant
continuous query processor distributed across the mobile
hosts. In addition, a mechanism for
prioritizing query results has been designed that guarantees enhanced
accuracy and reduced delay. Extensive simulation results show that our architecture
reduces client power consumption and increases query efficiency.
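The need for prioritization under a limited connection window can be illustrated with a small priority queue. This sketch is not the mechanism designed in the paper. It assumes each buffered query result carries a numeric priority (lower means more important) and a size, and that the connection allows a fixed byte budget; the function name `select_for_upload` is hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def select_for_upload(results: Iterable[Tuple[int, int, str]],
                      budget_bytes: int) -> Tuple[List[str], List[str]]:
    """Split buffered results into those sent now and those deferred.

    Each result is (priority, size_bytes, payload); lower priority numbers
    are sent first. The sequence number keeps arrival order among ties.
    """
    heap = [(priority, seq, size, payload)
            for seq, (priority, size, payload) in enumerate(results)]
    heapq.heapify(heap)
    sent, deferred = [], []
    remaining = budget_bytes
    while heap:
        _priority, _seq, size, payload = heapq.heappop(heap)
        if size <= remaining:
            sent.append(payload)
            remaining -= size
        else:
            deferred.append(payload)
    return sent, deferred


buffered = [(2, 400, "temperature log"), (0, 100, "fire alarm"),
            (1, 300, "battery warning"), (2, 500, "humidity log")]
sent, deferred = select_for_upload(buffered, budget_bytes=600)
```

With a 600-byte budget, the alarm and the battery warning are sent first and both logs wait for the next connection, so vital results are not delayed behind bulk data.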
Abstract: A data warehouse (DW) is a system that supports decision-making through querying. Queries to a DW are critical because of their complexity and length: they often access millions of tuples and involve joins between relations and aggregations. Materialized views can provide better performance for DW queries. However, these views have a maintenance cost, so materializing all views is not possible. An important challenge in a DW environment is materialized view selection, because we have to balance the trade-off between query performance and view maintenance cost. Therefore, in this paper, we introduce a new approach aimed at solving this challenge based on Two-Phase Optimization (2PO), which is a combination of Simulated Annealing (SA) and Iterative Improvement (II), with the use of a Multiple View Processing Plan (MVPP). Our experiments show that our method provides a further improvement in terms of query processing cost and view maintenance cost.
Abstract: XML has become a popular standard for information exchange via the web. Each XML document can be represented as a rooted, ordered, labeled tree. The node label shows the exact position of a node in the original document. Region and Dewey encoding are two well-known methods of labeling trees. In this paper, we propose a new insert-friendly labeling method named IFDewey, based on a recently proposed scheme called Extended Dewey. In Extended Dewey, many labels must be modified when a new node is inserted into the XML tree. Our method eliminates this problem by reserving even numbers for future insertions. Numbers generated by Extended Dewey may be even or odd. IFDewey modifies Extended Dewey so that only odd numbers are generated, and even numbers can then be used for much easier insertion of nodes.
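The idea of reserving even numbers can be shown on plain Dewey-style labels. This is a simplified sketch, not IFDewey itself, which builds on Extended Dewey's encoding of tag names: children initially receive odd components only, and a node inserted between two adjacent siblings takes the free even number. The function names are hypothetical, and the behavior when a gap is already used up is an assumption.

```python
from typing import List, Tuple

Label = Tuple[int, ...]


def initial_labels(parent: Label, child_count: int) -> List[Label]:
    """Label the children of `parent` with odd components 1, 3, 5, ..."""
    return [parent + (2 * i + 1,) for i in range(child_count)]


def label_between(left: Label, right: Label) -> Label:
    """Label for a node inserted between two adjacent siblings."""
    if left[:-1] != right[:-1]:
        raise ValueError("labels do not belong to siblings")
    gap = right[-1] - left[-1]
    if gap < 2:
        # Assumption: a second insertion in the same gap needs relabeling.
        raise ValueError("no free number between these siblings")
    return left[:-1] + (left[-1] + gap // 2,)


def to_text(label: Label) -> str:
    return ".".join(str(part) for part in label)


children = initial_labels((1,), 3)                 # 1.1, 1.3, 1.5
new_node = label_between(children[0], children[1])  # 1.2
ordered = sorted(children + [new_node])
```

The new node receives the label 1.2, and no existing label changes. Sorting the tuples keeps document order, because tuples compare component by component.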
Abstract: The main idea behind in-network aggregation is that,
rather than sending individual data items from sensors to sinks,
multiple data items are aggregated as they are forwarded by the
sensor network. Existing sensor network data aggregation techniques
assume that the nodes are preprogrammed and send data to a central
sink for offline querying and analysis. This approach has two major
drawbacks. First, the system behavior is preprogrammed and cannot
be modified on the fly. Second, the increased energy wastage due to
the communication overhead decreases the overall
system lifetime. Thus, energy conservation is of prime consideration
in sensor network protocols in order to maximize the network's
operational lifetime. In this paper, we give an energy-efficient
approach to query processing by implementing new optimization
techniques applied to in-network aggregation. We first discuss earlier
approaches to sensor data management and highlight their
disadvantages. We then present our approach, "Energy Efficient
Indexed Aggregation" (EEIA), and evaluate it through several
simulations to demonstrate its efficiency, competence and effectiveness.
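The saving that in-network aggregation offers can be seen in a small routing-tree sketch. It illustrates the general idea only, not EEIA: each node merges its children's partial aggregates (sum, count) with its own reading and forwards a single message, and the transmission count is compared with relaying every raw reading hop by hop. The tree, the readings, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # node -> children in the routing tree


def aggregate(tree: Tree, readings: Dict[str, float], node: str) -> Tuple[float, int]:
    """Return the partial aggregate (sum, count) for the subtree at `node`."""
    total = readings.get(node, 0.0)
    count = 1 if node in readings else 0
    for child in tree.get(node, ()):
        child_total, child_count = aggregate(tree, readings, child)
        total += child_total
        count += child_count
    return total, count


def transmissions(tree: Tree, root: str) -> Tuple[int, int]:
    """Count radio transmissions without and with in-network aggregation."""
    raw = 0     # every reading is relayed over each hop to the sink
    merged = 0  # every non-sink node sends one partial aggregate
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > 0:
            raw += depth
            merged += 1
        stack.extend((child, depth + 1) for child in tree.get(node, ()))
    return raw, merged


tree = {"sink": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
readings = {"a": 20.0, "b": 22.0, "c": 21.0, "d": 23.0, "e": 24.0}
total, count = aggregate(tree, readings, "sink")
average = total / count
raw_msgs, merged_msgs = transmissions(tree, "sink")
```

For this five-sensor tree the average is 22.0, obtained with 5 transmissions instead of the 8 needed to relay every raw reading. The gap widens as the tree gets deeper.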
Abstract: The data exchanged on the Web are of a different nature
from those handled by classical database management systems;
these data are called semi-structured data since they do not have a
regular and static structure like data found in a relational database;
their schema is dynamic and may contain missing data or types.
Therefore, the need has arisen to develop further techniques and
algorithms to exploit and integrate such data, and to extract relevant
information for the user. In this paper we present
the system OSIX (Osiris based System for Integration of XML
Sources). This system has a Data Warehouse model designed for the
integration of semi-structured data and more precisely for the
integration of XML documents. The architecture of OSIX relies on
the Osiris system, a DL-based model designed for the representation
and management of databases and knowledge bases. Osiris is a view-based
data model whose indexing system supports semantic query
optimization. We show that query processing on an
XML source is optimized by the indexing approach proposed by
Osiris.