Abstract: The goal of data mining algorithms is to discover
useful information embedded in large databases. One of the most
important data mining problems is discovery of frequently occurring
patterns in sequential data. In a multidimensional sequence each
event depends on more than one dimension. The search space is quite
large and the serial algorithms are not scalable for very large
datasets. To address this, it is necessary to study scalable parallel
implementations of sequence mining algorithms.
In this paper, we present a model for multidimensional sequences
and describe a parallel algorithm based on data parallelism.
Simulation experiments show good load balancing and scalable,
acceptable speedup across different processors and problem sizes, and
demonstrate that our approach can work efficiently in a real parallel
computing environment.
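As a rough illustration of the data-parallel idea, the sketch below splits a sequence database into partitions, counts pattern support in each partition independently, and merges the partial counts. It is not the algorithm from the paper: the representation of an event as a tuple of dimension values, the function names `local_counts` and `parallel_support`, and the use of threads as stand-ins for processors are all assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_counts(partition, length):
    """Support counts for one partition: how many of its sequences
    contain each ordered pattern of `length` events."""
    counts = Counter()
    for seq in partition:
        # A set makes sure a pattern is counted once per sequence.
        patterns = {
            tuple(seq[i] for i in idx)
            for idx in combinations(range(len(seq)), length)
        }
        counts.update(patterns)
    return counts


def parallel_support(sequences, length, workers=2):
    """Partition the data, count locally, then merge the partial counts."""
    # Round-robin partitioning (an assumption; any split would do).
    chunks = [sequences[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_counts, chunks, [length] * workers))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical events: (item, second dimension value).
sequences = [
    [("a", 1), ("b", 2), ("c", 1)],
    [("a", 1), ("c", 1)],
    [("b", 2), ("c", 1)],
]
support = parallel_support(sequences, length=2, workers=2)
```

Because each partition is counted without reference to the others, the only communication is the final merge, which is what makes this style of decomposition attractive for load balancing.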
Abstract: Economic factors are leading to the rise of
infrastructures that provide software and computing facilities as a
service, known as cloud services or cloud computing. Cloud services
can provide efficiencies for application providers, both by limiting
up-front capital expenses, and by reducing the cost of ownership over
time. Such services are made available in a data center, using shared
commodity hardware for computation and storage. There is a varied
set of cloud services available today, including application services
(salesforce.com), storage services (Amazon S3), compute services
(Google App Engine, Amazon EC2) and data services (Amazon
SimpleDB, Microsoft SQL Server Data Services, Google's
Datastore). These services represent a variety of reformulations of data
management architectures, and more are on the horizon.
Abstract: Design is the essential first step in building
a computer system. Several tools have been used to help designers
describe their software. These tools have been very successful in the
relational database domain, since they can generate an SQL script
that models the database from an Entity/Association model. However,
as computing has evolved, relational databases have shown their
limits and the object-relational model has come into increasing use.
Current design tools do not support all the new concepts introduced
by this model or the syntax of the SQL3 language. In this paper we
propose a tool, called "NAVIGTOOLS", to assist in the design and
implementation of object-relational databases; it allows the user to
generate a script that models the database in SQL3. The tool is based
on the Entity/Association model and the navigational model for
modeling object-relational databases.
Abstract: Object Relational Databases (ORDB) are more complex in
nature than traditional relational databases because they combine the
characteristics of both object oriented concepts and relational
features of conventional databases. Design of an ORDB demands an
efficient, high-quality schema that considers the structural,
functional and componential traits. This internal quality of the schema is
assured by metrics that measure the relevant attributes. This is
extended to substantiate the understandability, usability and
reliability of the schema, thus assuring external quality of the
schema. This work institutes a formalization of ORDB metrics:
metric definition, evaluation methodology and the calibration of the
metric. Three ORDB schemas were used to conduct the evaluation
and the formalization of the metrics. The metrics are calibrated using
content and criterion-related validity based on the measurability,
consistency and reliability of the metrics. Nominal and summative
scales are derived based on the evaluated metric values and are
standardized. Future work pertaining to ORDB metrics forms the
concluding note.
Abstract: In this paper, a two-factor scheme is proposed to
generate cryptographic keys directly from biometric data, which,
unlike passwords, are strongly bound to the user. The hash value of the
reference iris code is used as a cryptographic key and its length
depends only on the hash function, being independent of any other
parameter. The entropy of such keys is 94 bits, which is much higher
than that of any other comparable system. The most important and distinct
feature of this scheme is that it regenerates the reference iris code by
providing a genuine iris sample and the correct user password. Since
iris codes obtained from two images of the same eye are not exactly
the same, error correcting codes (Hadamard code and Reed-Solomon
code) are used to deal with the variability. The scheme proposed here
can be used to provide keys for a cryptographic system and/or for
user authentication. The performance of this system is evaluated on
two publicly available databases for iris biometrics, namely the CBS and
ICE databases. The operating point of the system (values of False
Acceptance Rate (FAR) and False Rejection Rate (FRR)) can be set
by properly selecting the error correction capacity (ts) of the Reed-
Solomon codes, e.g., on the ICE database, at ts = 15, FAR is 0.096%
and FRR is 0.76%.
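The statement that the key length depends only on the hash function can be seen in a minimal sketch. The abstract does not name the hash function, so SHA-256 is an assumption here, as are the function name `key_from_iris_code` and the placeholder byte strings standing in for iris codes. The error-correction and password steps of the scheme are not modeled.

```python
import hashlib


def key_from_iris_code(iris_code: bytes) -> bytes:
    """Derive a key as the hash of a reference iris code.
    SHA-256 is an assumed choice, not taken from the paper."""
    return hashlib.sha256(iris_code).digest()


# Placeholder iris codes of different lengths (made-up bytes).
short_code = bytes(range(16))
long_code = bytes(range(256)) * 4

key_a = key_from_iris_code(short_code)
key_b = key_from_iris_code(long_code)

# Flipping a single bit of the input yields a different key, which is
# why the reference code must be regenerated exactly before hashing.
flipped = bytes([short_code[0] ^ 0x01]) + short_code[1:]
key_c = key_from_iris_code(flipped)
```

Both keys are 32 bytes long regardless of the input length, and the one-bit change produces an unrelated key, which is the reason the scheme needs error-correcting codes to absorb sample-to-sample variability before the hash is applied.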
Abstract: The vast amount of information hidden in huge
databases has created tremendous interest in the field of data
mining. This paper examines the possibility of using data clustering
techniques in oral medicine to identify functional relationships
between different attributes and to classify similar patient
examinations. Commonly used data clustering algorithms have been
reviewed and, as a result, several interesting findings have been
gathered.
Abstract: Approximate tandem repeats in a genomic sequence are
two or more contiguous, similar copies of a pattern of nucleotides.
They are used in DNA mapping, studying molecular evolution
mechanisms, forensic analysis and research in diagnosis of inherited
diseases. Their functions are still under investigation and not well
defined, but growing biological databases, together with tools for
identifying these repeats, may lead to the discovery of their specific
role or correlation with particular features. This paper presents a new
approach for finding approximate tandem repeats in a given sequence,
where the similarity between consecutive repeats is measured using
the Hamming distance. It is an enhancement of a method for finding
exact tandem repeats in DNA sequences based on the Burrows-
Wheeler transform.
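To make the similarity criterion concrete, the sketch below compares consecutive fixed-length copies by Hamming distance using a naive left-to-right scan. It is not the Burrows-Wheeler-based method of the paper; the function names, the fixed `period` parameter and the greedy scanning strategy are assumptions chosen only for illustration, and the scan does not report every overlapping repeat.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_approx_tandem_repeats(seq, period, max_mismatches, min_copies=2):
    """Greedy scan returning (start, copies) for runs of consecutive
    length-`period` blocks in which each block differs from the
    previous one by at most `max_mismatches` characters."""
    results = []
    n = len(seq)
    start = 0
    while start + 2 * period <= n:
        copies = 1
        pos = start
        # Extend the run while the next block stays close to the current one.
        while (pos + 2 * period <= n and
               hamming(seq[pos:pos + period],
                       seq[pos + period:pos + 2 * period]) <= max_mismatches):
            copies += 1
            pos += period
        if copies >= min_copies:
            results.append((start, copies))
            start = pos + period  # continue after the reported run
        else:
            start += 1
    return results


dna = "ACGACGACTTTT"
approx = find_approx_tandem_repeats(dna, period=3, max_mismatches=1)
exact = find_approx_tandem_repeats(dna, period=3, max_mismatches=0)
```

With one mismatch allowed, the copies ACG, ACG, ACT form a single run of three starting at position 0; with no mismatches allowed, only the two exact copies are reported.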
Abstract: The majority of today's IR systems base the IR task on two main processes: indexing and searching. There exists a special group of dynamic IR systems where both processes happen simultaneously; such a system discards obsolete information and handles the insertion of new information while still answering user queries. In these dynamic, time-critical text document databases, it is often important to modify index structures quickly as documents arrive. This paper presents a dynamization method that may be used for this task. Experimental results show that the dynamization process is feasible and that it guarantees the response time for both query operations and index updates.
Abstract: In the current age, retrieval of relevant information
from massive amounts of data is a challenging job. Over the years,
precise and relevant retrieval of information has attained high
significance. There is a growing need in the market to build systems
which can retrieve multimedia information that precisely meets the
user's current needs. In this paper, we introduce a framework
for refining query results before showing them to the user, using ambient
intelligence, user profile, group profile, user location, time, day, user
device type and extracted features. A prototype tool was also
developed to demonstrate the efficiency of the proposed approach.
Abstract: The leisure boatbuilding industry has tight profit margins, which demand that boats be built to a high quality at low cost. This requirement means that reduced design times, combined with increased use of design for production, can lead to large benefits. The evolutionary nature of the boatbuilding industry can lead to extensive reuse of previous vessels in new designs. With the increase in automated tools for concurrent engineering within structural design, it is important that these tools can reuse this information and subsequently feed it to designers. The ability to accurately gather the materials and parts data is also a key component of these tools. This paper therefore aims to develop an architecture made up of neural networks and databases to feed information effectively to designers based on previous design experience.
Abstract: Databases have become ubiquitous. Almost all IT applications store information in and retrieve information from databases. Retrieving information from a database requires knowledge of technical languages such as Structured Query Language (SQL). However, the majority of users who interact with databases do not have a technical background and are intimidated by the idea of using languages such as SQL. This has led to the development of a few Natural Language Database Interfaces (NLDBIs). An NLDBI allows the user to query the database in a natural language. This paper presents the architecture of a new NLDBI system and its implementation, and discusses the results obtained. In most typical NLDBI systems, the natural language statement is converted into an internal representation based on the syntactic and semantic knowledge of the natural language. This representation is then converted into queries using a representation converter. A natural language query is translated to an equivalent SQL query after processing through various stages. The system has been tested on primitive database queries with certain constraints.
Abstract: Distinguishing missense nucleotide
substitutions that contribute to harmful effects from those that do not
is a difficult problem, usually addressed through functional in
vivo analyses. In this study, instead of current biochemical methods, the
effects of missense mutations upon protein structure and function
were assayed by means of computational methods and information
from databases. To this end, the effects of new missense
mutations in exon 5 of the PTEN gene upon protein structure and
function were examined. The gene coding for PTEN has been identified
as a tumor suppressor gene and localized to chromosome region
10q23.3. These methods showed that the
c.319G>A and c.341T>G missense mutations, which were identified in
patients with breast cancer and Cowden disease, could be pathogenic.
This method could be used for the analysis of missense mutations in other
genes.
Abstract: Images of the human iris contain specular highlights due
to the reflective properties of the cornea. This corneal reflection
causes many errors, not only in iris and pupil center estimation but
also in locating iris and pupil boundaries, especially for methods that
use active contours. Each iris recognition system has four steps:
segmentation, normalization, encoding and matching. In order to
address the corneal reflection, a novel reflection removal method is
proposed in this paper. It is compared experimentally with two existing
reflection removal methods on the CASIA iris
image database V3. The experimental results reveal that the
proposed algorithm provides higher performance in reflection
removal.
Abstract: The data exchanged on the Web differ in nature
from those handled by classical database management systems.
These data are called semi-structured data since they do not have a
regular and static structure like data found in a relational database;
their schema is dynamic and may contain missing data or types.
Therefore, there is a need to develop further techniques and
algorithms to exploit and integrate such data and to extract relevant
information for the user. In this paper we present
the system OSIX (Osiris based System for Integration of XML
Sources). This system has a Data Warehouse model designed for the
integration of semi-structured data and more precisely for the
integration of XML documents. The architecture of OSIX relies on
the Osiris system, a DL-based model designed for the representation
and management of databases and knowledge bases. Osiris is a
view-based data model whose indexing system supports semantic query
optimization. We show that query processing on an
XML source is optimized by the indexing approach proposed by
Osiris.
Abstract: Cameron Highlands is a mountainous area subjected
to torrential tropical showers. It extracts 5.8 million liters of water
per day for drinking supply from its rivers at several intake points.
The water quality of rivers in Cameron Highlands, however, has
deteriorated significantly due to land clearing for agriculture,
excessive usage of pesticides and fertilizers as well as construction
activities in rapidly developing urban areas. These
pollution sources, known as non-point pollution sources, are diverse
and hard to identify, and are therefore difficult to estimate.
Hence, a Geographical Information System (GIS) was used to provide
an extensive approach for evaluating land use and other mapping
characteristics to explain the spatial distribution of non-point sources
of contamination in Cameron Highlands. The method to assess
pollution sources has been developed by using Cameron Highlands
Master Plan (2006-2010) for integrating GIS, databases, as well as
pollution loads in the area of study. The results show that the highest
annual runoff is generated by forest (3.56 × 10^8 m^3/yr), followed by
urban development (1.46 × 10^8 m^3/yr). Furthermore, urban development
causes the highest BOD load (1.31 × 10^6 kg BOD/yr), while agricultural
activities and forest contribute the highest annual loads of
phosphorus (6.91 × 10^4 kg P/yr) and nitrogen (2.50 × 10^5 kg N/yr),
respectively. Therefore, best management practices (BMPs) are
suggested to reduce the pollution level in the area.
Abstract: Spatial trends are one of the valuable patterns in geo
databases. They play an important role in data analysis and
knowledge discovery from spatial data. A spatial trend is a regular
change of one or more non spatial attributes when spatially moving
away from a start object. Spatial trend detection is a graph search
problem; therefore, heuristic methods can be a good solution. The
artificial immune system (AIS) is a method for searching and
optimizing. AIS is a novel evolutionary paradigm inspired by the
biological immune system. The models based on immune system
principles, such as the clonal selection theory, the immune network
model or the negative selection algorithm, have been finding
increasing applications in fields of science and engineering.
In this paper, we develop a novel immunological algorithm based
on the clonal selection algorithm (CSA) for spatial trend detection. We
create a neighborhood graph and neighborhood paths, then select
spatial trends whose affinity for the antibody is high. In an
evolutionary process using the artificial immune algorithm, the affinity
of low-affinity trends is increased by mutation until the stop condition
is satisfied.
Abstract: The size, complexity and number of databases used
for protein information have caused bioinformatics to lag behind in
adapting to the need to handle this distributed information.
Integrating all the information from different databases into one
database is a challenging problem. Our main research goal is to develop a
tool which can be used to access and manipulate protein information
from different databases. In our approach, we have integrated
different databases such as Swiss-Prot, PDB, InterPro, and EMBL
and transformed these databases from flat file format into relational
form using XML and BioPerl. Our results show that this tool can
search protein information of different sizes stored in a relational
database, and that results can be retrieved faster than from a flat file
database. A web-based user interface is provided to allow users to
access or search for protein information in the local database.
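The flat-file-to-relational step can be pictured with a small sketch. The tool described above uses XML and BioPerl; the example below swaps in Python's standard `sqlite3` module purely for illustration. The two records are made up, the record layout (two-letter line codes, `//` as terminator) is a simplified assumption modeled on Swiss-Prot-style flat files, and the table and function names are hypothetical.

```python
import sqlite3

# Made-up records in a simplified flat-file layout (assumption).
FLAT = """\
ID   EXAMPLE1
AC   X00001
DE   Hypothetical protein one
//
ID   EXAMPLE2
AC   X00002
DE   Hypothetical protein two
//
"""


def parse_flat(text):
    """Yield one dict per record, keyed by the two-letter line code."""
    record = {}
    for line in text.splitlines():
        if line.startswith("//"):      # record terminator
            if record:
                yield record
            record = {}
        elif line.strip():
            record[line[:2]] = line[5:].strip()


def load(conn, text):
    """Create a protein table and insert the parsed records."""
    conn.execute(
        "CREATE TABLE protein ("
        "accession TEXT PRIMARY KEY, entry_name TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [(r.get("AC"), r.get("ID"), r.get("DE")) for r in parse_flat(text)],
    )


conn = sqlite3.connect(":memory:")
load(conn, FLAT)
row = conn.execute(
    "SELECT description FROM protein WHERE accession = ?", ("X00001",)
).fetchone()
count = conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
```

Declaring the accession as a primary key lets the database look records up through an index instead of scanning the whole file, which is one plausible reason relational retrieval would outperform a flat-file search.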
Abstract: The join dependency provides the basis for obtaining
lossless join decomposition in a classical relational schema. The
existence of a join dependency shows that the tables always
represent the correct data after being joined. Since the classical
relational databases cannot handle imprecise data, they were
extended to fuzzy relational databases so that uncertain, ambiguous,
imprecise and partially known information can also be stored in
databases in a formal way. However, like classical databases,
fuzzy relational databases also undergo decomposition during
normalization, so the issue of joining the decomposed fuzzy relations
remains. Our effort in the present paper is to address this
issue. In this paper we define fuzzy join dependency in the
framework of type-1 fuzzy relational databases and type-2 fuzzy
relational databases using the concept of fuzzy equality which is
defined using fuzzy functions. We use the fuzzy equi-join operator
for computing the fuzzy equality of two attribute values. We also
discuss the dependency preservation property on execution of this
fuzzy equi-join and derive the necessary condition for the fuzzy
functional dependencies to be preserved on joining the decomposed
fuzzy relations. We also derive the conditions for fuzzy join
dependency to exist in context of both type-1 and type-2 fuzzy
relational databases. We find that unlike the classical relational
databases even the existence of a trivial join dependency does not
ensure lossless join decomposition in type-2 fuzzy relational
databases. Finally we derive the conditions for the fuzzy equality to
be non zero and the qualification of an attribute for fuzzy key.
Abstract: This article outlines the conceptualization and
implementation of an intelligent system capable of extracting
knowledge from databases. Using hybridized features of both
Rough and Fuzzy Set theory gives the developed system flexibility
in dealing with discrete as well as continuous datasets. A raw data set
provided to the system is first transformed into a machine-readable
format, followed by pruning of the data set. The refined data set is
then processed through various Rough Set operators which enable
discovery of parameter relationships and interdependencies. The
discovered knowledge is automatically transformed into a rule base
expressed in Fuzzy terms. Two exemplary cancer repository datasets
(for Breast and Lung Cancer) have been used to test and implement
the proposed framework.
Abstract: In this study, workplace environmental monitoring
systems were established using USN (Ubiquitous Sensor Networks)
and LabVIEW. Although existing direct sampling methods yield
accurate values at the time of measurement, they are
disadvantageous in that continuous management and
supervision are difficult and costs are high.
Therefore, the efficiency and reliability of workplace
management by supervisors are relatively low when those methods are
used. In this study, systems were established so that information on
workplace environmental factors such as temperature, humidity and
noise is measured and transmitted to a PC in real time, enabling
supervisors to monitor workplaces through LabVIEW on the PC.
When an accident occurs in a workplace, supervisors can
respond immediately through the monitoring system, which
enables integrated workplace management and the prevention of
safety accidents. By introducing these monitoring systems, safety
accidents due to harmful environmental factors in workplaces can be
prevented. These monitoring systems will also help in finding
correlations between safety accidents and occupational diseases
by comparing and linking the databases they build with existing
statistical data.