Abstract: This paper focuses on testing the database of an existing
information system. We first describe the basic problems
of implemented databases, such as data redundancy, poor design of
database logical structure or inappropriate data types in columns of
database tables. These problems are often the result of incorrect
understanding of the primary requirements for a database of an
information system. Then we propose an algorithm to compare the
conceptual model created from vague requirements for a database
with a conceptual model reconstructed from the implemented database.
The algorithm also suggests steps leading to optimization of the
implemented database. The proposed algorithm is verified by an
implemented prototype. The paper also describes a fuzzy system
which works with the vague requirements for a database of an
information system, a procedure for creating a conceptual model from
vague requirements, and an algorithm for reconstructing a conceptual
model from the implemented database.
Abstract: Photovoltaic power generation forecasting is an
important task in renewable energy power system planning and
operation. This paper explores the application of neural networks
(NN) to the design of photovoltaic power generation forecasting
systems for one week ahead, using weather databases that include
the global irradiance and temperature of Ghardaia city (south of
Algeria), obtained using a data acquisition system. Simulations were
run and the results are discussed, showing that the neural network
technique is capable of decreasing the photovoltaic power generation
forecasting error.
Abstract: Gene Ontology is now widely used by researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing it. One way to keep it accessible is to cluster it into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not straightforward and is a hard algorithmic problem. Therefore, an approach for automatically clustering the Gene Ontology is proposed, incorporating a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of a modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
Abstract: Rule discovery is an important technique for mining knowledge from large databases. The use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify the novelty of the discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.
Abstract: This paper presents data annotation models at five levels of granularity (database, relation, column, tuple, and cell) of relational data to address the problem of unsuitability of most relational databases to express annotations. These models do not require any structural and schematic changes to the underlying database. These models are also flexible, extensible, customizable, database-neutral, and platform-independent. This paper also presents an SQL-like query language, named Annotation Query Language (AnQL), to query annotation documents. AnQL is simple to understand and exploits the already-existent wide knowledge and skill set of SQL.
Abstract: Organization of video databases is becoming a difficult
task as the amount of video content increases. Video classification
based on the content of videos can significantly increase the speed of
tasks such as browsing and searching for a particular video in a
database. In this paper, a content-based video classification system
for the classes indoor and outdoor is presented. The system is
intended to be used on a mobile platform with modest resources. The
algorithm makes use of the temporal redundancy in videos, which
allows using an uncomplicated classification model while still
achieving reasonable accuracy. The training and evaluation were done
on a video database of 443 videos downloaded from a video sharing
service. A total accuracy of 87.36% was achieved.
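The abstract does not say how the temporal redundancy is exploited. One common way to let a simple per-frame model benefit from many similar frames is to vote across frames; the sketch below assumes that approach, and the function name `classify_video` and the string labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def classify_video(frame_labels):
    """Return the most frequent per-frame label for one video.

    Assumption: each frame was already labeled "indoor" or "outdoor"
    by some lightweight per-frame classifier (not shown here).
    """
    if not frame_labels:
        raise ValueError("need at least one frame label")
    counts = Counter(frame_labels)
    # most_common(1) returns a list with one (label, count) pair
    label, _count = counts.most_common(1)[0]
    return label


if __name__ == "__main__":
    labels = ["indoor", "outdoor", "indoor", "indoor", "outdoor"]
    print(classify_video(labels))
```

Because neighbouring frames tend to show the same scene, occasional per-frame mistakes are outvoted, which is one reason a modest model can still reach reasonable accuracy.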
Abstract: Because of increasing demands for security in today's
society, and also because of growing attention to machine
vision, biometric research, pattern recognition and data retrieval in
color images, face detection has found wider application. In this article
we present a scientific approach for modeling human skin color, and
also offer an algorithm that tries to detect faces within color images
by combining skin features with a threshold determined in the
model. The proposed model is based on statistical data in different color
spaces. The proposed algorithm, using a specified color threshold, first
divides image pixels into two groups, a skin pixel group and a non-skin
pixel group, and then, based on some geometric features of the face,
decides which area belongs to a face.
The two main results of this research are as follows:
first, the proposed model can be applied easily to different databases and
color spaces to establish a proper threshold. Second, our algorithm can
adapt itself to runtime conditions, and its results demonstrate
desirable progress in comparison with similar cases.
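The first stage described above, splitting pixels into skin and non-skin groups by a color threshold, can be sketched as follows. This is a minimal illustration only: the choice of the YCbCr color space and the numeric ranges `CB_RANGE` and `CR_RANGE` are assumptions made for the example, not the thresholds derived in the paper, and the geometric face test is omitted.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to the Cb and Cr chroma components (JFIF full-range form)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


# Hypothetical threshold ranges, chosen for illustration only.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def skin_mask(pixels):
    """Label each (r, g, b) pixel True (skin group) or False (non-skin group)."""
    mask = []
    for r, g, b in pixels:
        cb, cr = rgb_to_cbcr(r, g, b)
        in_cb = CB_RANGE[0] <= cb <= CB_RANGE[1]
        in_cr = CR_RANGE[0] <= cr <= CR_RANGE[1]
        mask.append(in_cb and in_cr)
    return mask


if __name__ == "__main__":
    print(skin_mask([(200, 150, 120), (0, 0, 255), (255, 255, 255)]))
```

In a full pipeline the resulting mask would be grouped into connected regions before any geometric check is applied.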
Abstract: The identification and elimination of bad
measurements is one of the basic functions of a robust state estimator
as bad data have the effect of corrupting the results of state
estimation according to the popular weighted least squares method.
However, this is a difficult problem to handle, especially when dealing
with multiple errors of the interacting conforming type. In this
paper, a self-adaptive genetic-based algorithm is proposed. The
algorithm uses the results of the classical linearized normal
residuals approach to tune the genetic operators. Thus, instead of
making a randomized search throughout the whole search space, it is
more likely to perform a directed search, so the optimum solution is
obtained at very early stages (a maximum of 5 generations). The
algorithm uses the accumulating databases of already computed
cases to reduce the computational burden to a minimum. Tests are
conducted with reference to the standard IEEE test systems. Test
results are very promising.
Abstract: In this paper, an efficient local appearance feature
extraction method based on the multi-resolution Curvelet transform is
proposed in order to further enhance the performance of the well-known
Linear Discriminant Analysis (LDA) method when applied
to face recognition. Each face is described by a subset of band
filtered images containing block-based Curvelet coefficients. These
coefficients characterize the face texture and a set of simple statistical
measures allows us to form compact and meaningful feature vectors.
The proposed method is compared with some related feature extraction
methods such as Principal Component Analysis (PCA), as well
as Linear Discriminant Analysis (LDA) and Independent Component
Analysis (ICA). Two different multi-resolution transforms, Wavelet
(DWT) and Contourlet, were also compared against the Block Based
Curvelet-LDA algorithm. Experimental results on ORL, YALE and
FERET face databases convince us that the proposed method provides
a better representation of the class information and obtains much
higher recognition accuracies.
Abstract: Most existing text mining approaches are proposed with
the transaction database model in mind. Thus, the
mined dataset is structured using just one concept, the "transaction",
whereas the whole dataset is modeled using the "set" abstract type. In
such cases, the structure of the whole dataset and the relationships
among the transactions themselves are not modeled and,
consequently, not considered in the mining process.
We believe that taking into account the structural properties of
hierarchically structured information (e.g. textual documents)
in the mining process can lead to better results. For this purpose, a
hierarchical association rule mining approach for textual documents
is proposed in this paper, and the classical set-oriented mining
approach is reconsidered in favor of a Directed Acyclic Graph (DAG)
oriented approach. Natural language processing techniques are used
in order to obtain the DAG structure. Based on this graph model, a
hierarchical bottom-up algorithm is proposed. The main idea is that
each node is mined with its parent node.
Abstract: In this paper, we propose a robust face relighting
technique that uses spherical space properties. The proposed method
aims to reduce the effects of illumination on face recognition.
Given a single 2D face image, we relight the face object by
extracting the nine spherical harmonic bases and the face spherical
illumination coefficients. First, an internal training illumination
database is generated by computing face albedo and face normal
from 2D images under different lighting conditions. Based on the
generated database, we analyze the target face pixels and compare
them with the training bootstrap by using pre-generated tiles. In this
work, practical real time processing speed and small image size were
considered when designing the framework. In contrast to other works,
our technique requires no 3D face models for the training process
and takes a single 2D image as an input. Experimental results on
publicly available databases show that the proposed technique works
well under severe lighting conditions with significant improvements
in the face recognition rates.
Abstract: On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from a spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighbor algorithms and distance-based searching methods. Searching for nearest neighbors in the spectral space is an NP-hard problem; the computational complexity increases with the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time, some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirements of estimation algorithms by providing a two-dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of the spectral database not only supports the application of distance and similarity based techniques but also enables the use of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means that the prediction does not have to use the high-dimensional space but can be based on the mapped space as well. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance-based learning algorithms.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density-based clustering algorithm (DCBRD) is presented, relying on knowledge acquired from the data by dividing the data space into overlapped regions. The proposed algorithm discovers arbitrarily shaped clusters, requires no input parameters, and uses the same definitions as the DBSCAN algorithm. We performed an experimental evaluation of its effectiveness and efficiency, and compared the results with those of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is significantly efficient in discovering clusters of arbitrary shape and size.
Abstract: Data mining is an extraordinarily demanding field referring to the extraction of implicit knowledge and relationships which are not explicitly stored in databases. A wide variety of data mining methods have been introduced (classification, characterization, generalization...). Each of these methods includes more than one algorithm. A data mining system involves different user categories, which means that the user's behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how they can collaborate and communicate. The agent paradigm presents a new way of conceiving and realizing data mining systems. The purpose is to combine different data mining algorithms to prepare elements for decision-makers, benefiting from the possibilities offered by multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.
Abstract: Large-scale systems such as Grids offer
infrastructures for both data distribution and parallel processing. The
use of Grid infrastructures is a more recent issue that is already
impacting the Distributed Database Management System industry. In
DBMS, distributed query processing has emerged as a fundamental
technique for ensuring high performance in distributed databases.
Database placement is particularly important in large-scale systems
because it reduces communication costs and improves resource
usage. In this paper, we propose a dynamic database placement
policy that depends on query patterns and Grid site capabilities. We
evaluate the performance of the proposed database placement policy
using simulations. The obtained results show that dynamic database
placement can significantly improve the performance of distributed
query processing.
Abstract: One of the most used assumptions in logic programming
and deductive databases is the so-called Closed World Assumption
(CWA), according to which the atoms that cannot be inferred
from the programs are considered to be false (i.e. a pessimistic
assumption). One of the most successful semantics of conventional
logic programs based on the CWA is the well-founded semantics.
However, the CWA is not applicable in all circumstances when
information is handled. That is, the well-founded semantics, if
conventionally defined, would behave inadequately in different cases.
The solution we adopt in this paper is to extend the well-founded
semantics in order for it to be based also on other assumptions. The
basis of (default) negative information in the well-founded semantics
is given by the so-called unfounded sets. We extend this concept
by considering optimistic, pessimistic, skeptical and paraconsistent
assumptions, used to complete missing information from a program.
Our semantics, called extended well-founded semantics, also expresses
imperfect information considered to be missing/incomplete,
uncertain and/or inconsistent, by using bilattices as multivalued
logics. We provide a method of computing the extended well-founded
semantics and show that Kripke-Kleene semantics is captured by
considering a skeptical assumption. We also show that the complexity
of the computation of our semantics is polynomial time.
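As a small illustration of the Closed World Assumption only, the sketch below handles negation-free (definite) propositional rules: it derives everything that follows from the program and treats every other atom as false. It does not implement the well-founded semantics, unfounded sets, or the extension proposed in the paper; the function names and the toy program are hypothetical.

```python
def least_model(facts, rules):
    """Forward-chain definite rules of the form (head, [body atoms]) to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(atom in derived for atom in body):
                derived.add(head)
                changed = True
    return derived


def holds_under_cwa(atom, facts, rules):
    """Under the CWA, an atom that cannot be inferred is considered false."""
    return atom in least_model(facts, rules)


if __name__ == "__main__":
    facts = {"bird(tweety)"}
    rules = [
        ("flies(tweety)", ["bird(tweety)"]),
        ("swims(tweety)", ["fish(tweety)"]),
    ]
    print(holds_under_cwa("flies(tweety)", facts, rules))
    print(holds_under_cwa("swims(tweety)", facts, rules))
```

Here `swims(tweety)` is reported false simply because nothing in the program supports it, which is the pessimistic default that the abstract contrasts with optimistic, skeptical and paraconsistent assumptions.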
Abstract: This paper proposes a new method for image searches and image indexing in databases with a color temperature histogram. The color temperature histogram can be used to improve the performance of content-based image retrieval by combining color temperature and histogram. The color temperature histogram can be represented by a range of 46 colors, which is more than the color histogram and the dominant color temperature. Moreover, with our method, colors that have the same color temperature can be separated, whereas the dominant color temperature cannot separate them. The results showed that the color temperature histogram retrieved an accurate image more often than the dominant color temperature method or the color histogram method. It also took less time, so the color temperature can be used for indexing and searching for images.
Abstract: In this work a new platform for mobile-health systems is
presented. The system's target application is providing decision support to
rescue corps or military medical personnel in combat areas. The software
architecture relies on a distributed client-server system that manages a
hierarchy of wireless ad-hoc networks in which several different types of
client operate. Each client is characterized by different hardware and
software requirements. Lower hierarchy levels rely on a network of
completely custom devices that store clinical information and patient
status and are designed to form an ad-hoc network operating in the
2.4 GHz ISM band and complying with the IEEE 802.15.4 standard
(ZigBee). Medical personnel may interact with such devices, which are
called MICs (Medical Information Carriers), by means of a PDA
(Personal Digital Assistant) or an MDA (Medical Digital Assistant),
and transmit the information stored in their local databases as well as
issue a service request to the upper hierarchy levels by using the IEEE
802.11 a/b/g standard (WiFi). The server acts as a repository that
stores both medical evacuation forms and associated events (e.g., a
teleconsulting request). All the actors participating in the diagnostic
or evacuation process may asynchronously access this repository
and update its content or generate new events. The designed system
aims to optimise and improve information spreading and flow
among all the system components, with the aim of improving both
diagnostic quality and the evacuation process.
Abstract: Many databases in various fields of the life sciences are available online. To find well-used databases, a survey measuring the citation frequency of life science databases in the scientific literature was carried out. The survey measures how many scientific papers available in the PubMed Central archive cite a specific life science database. This paper presents and discusses the results of the survey.
Abstract: In this paper we address the problem of musical style
classification, which has a number of applications like indexing in
musical databases or automatic composition systems. Starting from
MIDI files of real-world improvisations, we extract the melody track
and cut it into overlapping segments of equal length. From these
fragments, some numerical features are extracted as descriptors of
style samples. We show that a standard Bayesian classifier can be
conveniently employed to build an effective musical style classifier,
once this set of features has been extracted from musical data.
Preliminary experimental results show the effectiveness of the
developed classifier that represents the first component of a musical
audio retrieval system.
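The segmentation step, cutting a melody track into overlapping segments of equal length, can be sketched as a sliding window. The abstract does not give the segment length or the overlap, so the parameters `length` and `hop` below, and the representation of the melody as a list of MIDI pitch numbers, are assumptions made for illustration.

```python
def overlapping_segments(notes, length, hop):
    """Cut a note sequence into equal-length windows that start every `hop` notes.

    Windows overlap whenever hop < length. A trailing remainder shorter
    than `length` is dropped so that all segments have equal length.
    """
    if length <= 0 or hop <= 0:
        raise ValueError("length and hop must be positive")
    last_start = len(notes) - length
    return [notes[i:i + length] for i in range(0, last_start + 1, hop)]


if __name__ == "__main__":
    melody = [60, 62, 64, 65, 67, 69]  # hypothetical MIDI pitches
    for segment in overlapping_segments(melody, length=4, hop=2):
        print(segment)
```

Each resulting segment would then be turned into a feature vector before being passed to the classifier.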