Abstract: Data Warehousing tools have become very popular and currently many of them have moved to Web-based user interfaces to make it easier to access and use the tools. The next step is to enable these tools to be used within a portal framework. The portal framework consists of pages having several small windows that contain individual data warehouse query results. There are several issues that need to be considered when designing the architecture for a portal enabled data warehouse query tool. Some issues need special techniques that can overcome the limitations that are imposed by the nature of data warehouse queries. Issues such as single sign-on, query result caching and sharing, customization, scheduling and authorization need to be considered. This paper discusses such issues and suggests an architecture to support data warehouse queries within Web portal frameworks.
Abstract: The brain MR imaging-based clinical research and analysis system were specifically built and the development for a large-scale data was targeted. We used the general clinical data available for building large-scale data. Registration period for the selection of the lesion ROI and the region growing algorithm was used and the Mesh-warp algorithm for matching was implemented. The accuracy of the matching errors was modified individually. Also, the large ROI research data can accumulate by our developed compression method. In this way, the correctly decision criteria to the research result was suggested. The experimental groups were age, sex, MR type, patient ID and smoking which can easily be queries. The result data was visualized of the overlapped images by a color table. Its data was calculated by the statistical package. The evaluation for the utilization of this system in the chronic ischemic damage in the area has done from patients with the acute cerebral infarction. This is the cause of neurologic disability index location in the center portion of the lateral ventricle facing. The corona radiate was found in the position. Finally, the system reliability was measured both inter-user and intra-user registering correlation.
Abstract: The paper describes design of an ontology in the
financial domain for mutual funds. The design of this ontology
consists of four steps, namely, specification, knowledge acquisition,
implementation and semantic query. Specification includes a
description of the taxonomy and different types mutual funds and
their scope. Knowledge acquisition involves the information
extraction from heterogeneous resources. Implementation describes
the conceptualization and encoding of this data. Finally, semantic
query permits complex queries to integrated data, mapping of these
database entities to ontological concepts.
Abstract: As the web continues to grow exponentially, the idea
of crawling the entire web on a regular basis becomes less and less
feasible, so the need to include information on specific domain,
domain-specific search engines was proposed. As more information
becomes available on the World Wide Web, it becomes more difficult
to provide effective search tools for information access. Today,
people access web information through two main kinds of search
interfaces: Browsers (clicking and following hyperlinks) and Query
Engines (queries in the form of a set of keywords showing the topic
of interest) [2]. Better support is needed for expressing one's
information need and returning high quality search results by web
search tools. There appears to be a need for systems that do reasoning
under uncertainty and are flexible enough to recover from the
contradictions, inconsistencies, and irregularities that such reasoning
involves. In a multi-view problem, the features of the domain can be
partitioned into disjoint subsets (views) that are sufficient to learn the
target concept. Semi-supervised, multi-view algorithms, which
reduce the amount of labeled data required for learning, rely on the
assumptions that the views are compatible and uncorrelated. This
paper describes the use of semi-structured machine learning approach
with Active learning for the “Domain Specific Search Engines". A
domain-specific search engine is “An information access system that
allows access to all the information on the web that is relevant to a
particular domain. The proposed work shows that with the help of
this approach relevant data can be extracted with the minimum
queries fired by the user. It requires small number of labeled data and
pool of unlabelled data on which the learning algorithm is applied to
extract the required data.
Abstract: Task of object localization is one of the major
challenges in creating intelligent transportation. Unfortunately, in
densely built-up urban areas, localization based on GPS only
produces a large error, or simply becomes impossible. New
opportunities arise for the localization due to the rapidly emerging
concept of a wireless ad-hoc network. Such network, allows
estimating potential distance between these objects measuring
received signal level and construct a graph of distances in which
nodes are the localization objects, and edges - estimates of the
distances between pairs of nodes. Due to the known coordinates of
individual nodes (anchors), it is possible to determine the location of
all (or part) of the remaining nodes of the graph. Moreover, road
map, available in digital format can provide localization routines
with valuable additional information to narrow node location search.
However, despite abundance of well-known algorithms for solving
the problem of localization and significant research efforts, there are
still many issues that currently are addressed only partially. In this
paper, we propose localization approach based on the graph mapped
distances on the digital road map data basis. In fact, problem is
reduced to distance graph embedding into the graph representing area
geo location data. It makes possible to localize objects, in some cases
even if only one reference point is available. We propose simple
embedding algorithm and sample implementation as spatial queries
over sensor network data stored in spatial database, allowing
employing effectively spatial indexing, optimized spatial search
routines and geometry functions.
Abstract: Graph has become increasingly important in modeling
complicated structures and schemaless data such as proteins, chemical
compounds, and XML documents. Given a graph query, it is desirable
to retrieve graphs quickly from a large database via graph-based
indices. Different from the existing methods, our approach, called
VFM (Vertex to Frequent Feature Mapping), makes use of vertices
and decision features as the basic indexing feature. VFM constructs
two mappings between vertices and frequent features to answer graph
queries. The VFM approach not only provides an elegant solution to
the graph indexing problem, but also demonstrates how database
indexing and query processing can benefit from data mining,
especially frequent pattern mining. The results show that the proposed
method not only avoids the enumeration method of getting subgraphs
of query graph, but also effectively reduces the subgraph isomorphism
tests between the query graph and graphs in candidate answer set in
verification stage.
Abstract: In this paper, we propose a practical digital music matching system that is robust to variation in sound qualities. The proposed system is subdivided into two parts: client and server. The client part consists of the input, preprocessing and feature extraction modules. The preprocessing module, including the music onset module, revises the value gap occurring on the time axis between identical songs of different formats. The proposed method uses delta-grouped Mel frequency cepstral coefficients (MFCCs) to extract music features that are robust to changes in sound quality. According to the number of sound quality formats (SQFs) used, a music server is constructed with a feature database (FD) that contains different sub feature databases (SFDs). When the proposed system receives a music file, the selection module selects an appropriate SFD from a feature database; the selected SFD is subsequently used by the matching module. In this study, we used 3,000 queries for matching experiments in three cases with different FDs. In each case, we used 1,000 queries constructed by mixing 8 SQFs and 125 songs. The success rate of music matching improved from 88.6% when using single a single SFD to 93.2% when using quadruple SFDs. By this experiment, we proved that the proposed method is robust to various sound qualities.
Abstract: This paper compares the search engine marketing
strategies adopted in China and the Western countries through two illustrative cases, namely, Google and Baidu. Marketers in the West
use search engine optimization (SEO) to rank their sites higher for
queries in Google. Baidu, however, offers paid search placement, or the selling of engine results for particular keywords to the higher
bidders. Whereas Google has been providing innovative services ranging from Google Map to Google Blog, Baidu remains focused on
search services – the one that it does best. The challenges and
opportunities of the Chinese Internet market offered to global entrepreneurs are also discussed in the paper
Abstract: Selecting the data modeling technique for an
information system is determined by the objective of the resultant
data model. Dimensional modeling is the preferred modeling
technique for data destined for data warehouses and data mining,
presenting data models that ease analysis and queries which are in
contrast with entity relationship modeling. The establishment of data
warehouses as components of information system landscapes in
many organizations has subsequently led to the development of
dimensional modeling. This has been significantly more developed
and reported for the commercial database management systems as
compared to the open sources thereby making it less affordable for
those in resource constrained settings. This paper presents
dimensional modeling of HIV patient information using open source
modeling tools. It aims to take advantage of the fact that the most
affected regions by the HIV virus are also heavily resource
constrained (sub-Saharan Africa) whereas having large quantities of
HIV data. Two HIV data source systems were studied to identify
appropriate dimensions and facts these were then modeled using two
open source dimensional modeling tools. Use of open source would
reduce the software costs for dimensional modeling and in turn make
data warehousing and data mining more feasible even for those in
resource constrained settings but with data available.
Abstract: In this paper, a model for an information retrieval
system is proposed which takes into account that knowledge about
documents and information need of users are dynamic. Two
methods are combined, one qualitative or symbolic and the other
quantitative or numeric, which are deemed suitable for many
clustering contexts, data analysis, concept exploring and
knowledge discovery. These two methods may be classified as
inductive learning techniques. In this model, they are introduced to
build “long term" knowledge about past queries and concepts in a
collection of documents. The “long term" knowledge can guide
and assist the user to formulate an initial query and can be
exploited in the process of retrieving relevant information. The
different kinds of knowledge are organized in different points of
view. This may be considered an enrichment of the exploration
level which is coherent with the concept of document/query
structure.
Abstract: This paper presents an information retrieval model on
XML documents based on tree matching. Queries and documents are
represented by extended trees. An extended tree is built starting from
the original tree, with additional weighted virtual links between each
node and its indirect descendants allowing to directly reach each
descendant. Therefore only one level separates between each node
and its indirect descendants. This allows to compare the user query
and the document with flexibility and with respect to the structural
constraints of the query. The content of each node is very important to
decide weither a document element is relevant or not, thus the content
should be taken into account in the retrieval process. We separate
between the structure-based and the content-based retrieval processes.
The content-based score of each node is commonly based on the
well-known Tf × Idf criteria. In this paper, we compare between
this criteria and another one we call Tf × Ief. The comparison
is based on some experiments into a dataset provided by INEX1 to
show the effectiveness of our approach on one hand and those of
both weighting functions on the other.
Abstract: In the last few years, the Semantic Web gained scientific acceptance as a means of relationships identification in knowledge base, widely known by semantic association. Query about complex relationships between entities is a strong requirement for many applications in analytical domains. In bioinformatics for example, it is critical to extract exchanges between proteins. Currently, the widely known result of such queries is to provide paths between connected entities from data graph. However, they do not always give good results while facing the user need by the best association or a set of limited best association, because they only consider all existing paths but ignore the path evaluation. In this paper, we present an approach for supporting association discovery queries. Our proposal includes (i) a query language PmSPRQL which provides a multiparadigm query expressions for association extraction and (ii) some quantification measures making easy the process of association ranking. The originality of our proposal is demonstrated by a performance evaluation of our approach on real world datasets.
Abstract: Multimedia, as it stands now is perhaps the most
diverse and rich culture around the globe. One of the major needs of
Multimedia is to have a single system that enables people to
efficiently search through their multimedia catalogues. Many
Domain Specific Systems and architectures have been proposed but
up till now no generic and complete architecture is proposed. In this
paper, we have suggested a generic architecture for Multimedia
Database. The main strengths of our architecture besides being
generic are Semantic Libraries to reduce semantic gap, levels of
feature extraction for more specific and detailed feature extraction
according to classes defined by prior level, and merging of two types
of queries i.e. text and QBE (Query by Example) for more accurate
yet detailed results.
Abstract: This article describes Uruk, the virtual museum of
Iraq that we developed for visual exploration and retrieval of image
collections. The system largely exploits the loosely-structured
hierarchy of XML documents that provides a useful representation
method to store semi-structured or unstructured data, which does not
easily fit into existing database. The system offers users the
capability to mine and manage the XML-based image collections
through a web-based Graphical User Interface (GUI). Typically, at an
interactive session with the system, the user can browse a visual
structural summary of the XML database in order to select interesting
elements. Using this intermediate result, queries combining structure
and textual references can be composed and presented to the system.
After query evaluation, the full set of answers is presented in a visual
and structured way.
Abstract: The concurrent era is characterised by strengthened interactions among financial markets and increased capital mobility globally. In this frames we examine the effects the international financial integration process has on the European bond markets. We perform a comparative study of the interactions of the European and international bond markets and exploit Cointegration analysis results on the elimination of stochastic trends and the decomposition of the underlying long run equilibria and short run causal relations. Our investigation provides evidence on the relation between the European integration process and that of globalisation, viewed through the bond markets- sector. Additionally the structural formulation applied, offers significant implications of the findings. All in all our analysis offers a number of answers on crucial queries towards the European bond markets integration process.
Abstract: In many applications, data is in graph structure, which
can be naturally represented as graph-structured XML. Existing
queries defined on tree-structured and graph-structured XML data
mainly focus on subgraph matching, which can not cover all the
requirements of querying on graph. In this paper, a new kind of
queries, topological query on graph-structured XML is presented.
This kind of queries consider not only the structure of subgraph but
also the topological relationship between subgraphs. With existing
subgraph query processing algorithms, efficient algorithms for topological
query processing are designed. Experimental results show the
efficiency of implementation algorithms.
Abstract: This paper describes a novel and effective approach to content-based image retrieval (CBIR) that represents each image in the database by a vector of feature values called “Standard deviation of mean vectors of color distribution of rows and columns of images for CBIR". In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. This paper describes the approach that uses contents as feature vector for retrieval of similar images. There are several classes of features that are used to specify queries: colour, texture, shape, spatial layout. Colour features are often easily obtained directly from the pixel intensities. In this paper feature extraction is done for the texture descriptor that is 'variance' and 'Variance of Variances'. First standard deviation of each row and column mean is calculated for R, G, and B planes. These six values are obtained for one image which acts as a feature vector. Secondly we calculate variance of the row and column of R, G and B planes of an image. Then six standard deviations of these variance sequences are calculated to form a feature vector of dimension six. We applied our approach to a database of 300 BMP images. We have determined the capability of automatic indexing by analyzing image content: color and texture as features and by applying a similarity measure Euclidean distance.
Abstract: Fast retrieval of data has been a need of user in any
database application. This paper introduces a buffer based query
optimization technique in which queries are assigned weights
according to their number of execution in a query bank. These
queries and their optimized executed plans are loaded into the buffer
at the start of the database application. For every query the system
searches for a match in the buffer and executes the plan without
creating new plans.
Abstract: The Sensor Network consists of densely deployed
sensor nodes. Energy optimization is one of the most important
aspects of sensor application design. Data acquisition and aggregation
techniques for processing data in-network should be energy efficient.
Due to the cross-layer design, resource-limited and noisy nature
of Wireless Sensor Networks(WSNs), it is challenging to study
the performance of these systems in a realistic setting. In this
paper, we propose optimizing queries by aggregation of data and
data redundancy to reduce energy consumption without requiring
all sensed data and directed diffusion communication paradigm to
achieve power savings, robust communication and processing data
in-network. To estimate the per-node power consumption POWERTossim
mica2 energy model is used, which provides scalable and
accurate results. The performance analysis shows that the proposed
methods overcomes the existing methods in the aspects of energy
consumption in wireless sensor networks.
Abstract: This paper focuses on a novel method for semantic
searching and retrieval of information about learning materials.
Metametadata encapsulate metadata instances by using the properties
and attributes provided by ontologies rather than describing learning
objects. A novel metametadata taxonomy has been developed which
provides the basis for a semantic search engine to extract, match and
map queries to retrieve relevant results. The use of ontological views
is a foundation for viewing the pedagogical content of metadata
extracted from learning objects by using the pedagogical attributes
from the metametadata taxonomy. Using the ontological approach
and metametadata (based on the metametadata taxonomy) we present
a novel semantic searching mechanism.These three strands – the
taxonomy, the ontological views, and the search algorithm – are
incorporated into a novel architecture (OMESCOD) which has been
implemented.