Testing Database of Information System using Conceptual Modeling

This paper focuses on testing database of existing information system. At the beginning we describe the basic problems of implemented databases, such as data redundancy, poor design of database logical structure or inappropriate data types in columns of database tables. These problems are often the result of incorrect understanding of the primary requirements for a database of an information system. Then we propose an algorithm to compare the conceptual model created from vague requirements for a database with a conceptual model reconstructed from implemented database. An algorithm also suggests steps leading to optimization of implemented database. The proposed algorithm is verified by an implemented prototype. The paper also describes a fuzzy system which works with the vague requirements for a database of an information system, procedure for creating conceptual from vague requirements and an algorithm for reconstructing a conceptual model from implemented database.

Power Forecasting of Photovoltaic Generation

Photovoltaic power generation forecasting is an important task in renewable energy power system planning and operating. This paper explores the application of neural networks (NN) to study the design of photovoltaic power generation forecasting systems for one week ahead using weather databases include the global irradiance, and temperature of Ghardaia city (south of Algeria) using a data acquisition system. Simulations were run and the results are discussed showing that neural networks Technique is capable to decrease the photovoltaic power generation forecasting error.

Automatic Clustering of Gene Ontology by Genetic Algorithm

Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.

A Hybrid Approach for Quantification of Novelty in Rule Discovery

Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules lead to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.

AnQL: A Query Language for Annotation Documents

This paper presents data annotation models at five levels of granularity (database, relation, column, tuple, and cell) of relational data to address the problem of unsuitability of most relational databases to express annotations. These models do not require any structural and schematic changes to the underlying database. These models are also flexible, extensible, customizable, database-neutral, and platform-independent. This paper also presents an SQL-like query language, named Annotation Query Language (AnQL), to query annotation documents. AnQL is simple to understand and exploits the already-existent wide knowledge and skill set of SQL.

Content-based Indoor/Outdoor Video Classification System for a Mobile Platform

Organization of video databases is becoming difficult task as the amount of video content increases. Video classification based on the content of videos can significantly increase the speed of tasks such as browsing and searching for a particular video in a database. In this paper, a content-based videos classification system for the classes indoor and outdoor is presented. The system is intended to be used on a mobile platform with modest resources. The algorithm makes use of the temporal redundancy in videos, which allows using an uncomplicated classification model while still achieving reasonable accuracy. The training and evaluation was done on a video database of 443 videos downloaded from a video sharing service. A total accuracy of 87.36% was achieved.

Face Detection in Color Images using Color Features of Skin

Because of increasing demands for security in today-s society and also due to paying much more attention to machine vision, biometric researches, pattern recognition and data retrieval in color images, face detection has got more application. In this article we present a scientific approach for modeling human skin color, and also offer an algorithm that tries to detect faces within color images by combination of skin features and determined threshold in the model. Proposed model is based on statistical data in different color spaces. Offered algorithm, using some specified color threshold, first, divides image pixels into two groups: skin pixel group and non-skin pixel group and then based on some geometric features of face decides which area belongs to face. Two main results that we received from this research are as follow: first, proposed model can be applied easily on different databases and color spaces to establish proper threshold. Second, our algorithm can adapt itself with runtime condition and its results demonstrate desirable progress in comparison with similar cases.

A Self Adaptive Genetic Based Algorithm for the Identification and Elimination of Bad Data

The identification and elimination of bad measurements is one of the basic functions of a robust state estimator as bad data have the effect of corrupting the results of state estimation according to the popular weighted least squares method. However this is a difficult problem to handle especially when dealing with multiple errors from the interactive conforming type. In this paper, a self adaptive genetic based algorithm is proposed. The algorithm utilizes the results of the classical linearized normal residuals approach to tune the genetic operators thus instead of making a randomized search throughout the whole search space it is more likely to be a directed search thus the optimum solution is obtained at very early stages(maximum of 5 generations). The algorithm utilizes the accumulating databases of already computed cases to reduce the computational burden to minimum. Tests are conducted with reference to the standard IEEE test systems. Test results are very promising.

Local Curvelet Based Classification Using Linear Discriminant Analysis for Face Recognition

In this paper, an efficient local appearance feature extraction method based the multi-resolution Curvelet transform is proposed in order to further enhance the performance of the well known Linear Discriminant Analysis(LDA) method when applied to face recognition. Each face is described by a subset of band filtered images containing block-based Curvelet coefficients. These coefficients characterize the face texture and a set of simple statistical measures allows us to form compact and meaningful feature vectors. The proposed method is compared with some related feature extraction methods such as Principal component analysis (PCA), as well as Linear Discriminant Analysis LDA, and independent component Analysis (ICA). Two different muti-resolution transforms, Wavelet (DWT) and Contourlet, were also compared against the Block Based Curvelet-LDA algorithm. Experimental results on ORL, YALE and FERET face databases convince us that the proposed method provides a better representation of the class information and obtains much higher recognition accuracies.

Bottom Up Text Mining through Hierarchical Document Representation

Most of the existing text mining approaches are proposed, keeping in mind, transaction databases model. Thus, the mined dataset is structured using just one concept: the “transaction", whereas the whole dataset is modeled using the “set" abstract type. In such cases, the structure of the whole dataset and the relationships among the transactions themselves are not modeled and consequently, not considered in the mining process. We believe that taking into account structure properties of hierarchically structured information (e.g. textual document, etc ...) in the mining process, can leads to best results. For this purpose, an hierarchical associations rule mining approach for textual documents is proposed in this paper and the classical set-oriented mining approach is reconsidered profits to a Direct Acyclic Graph (DAG) oriented approach. Natural languages processing techniques are used in order to obtain the DAG structure. Based on this graph model, an hierarchical bottom up algorithm is proposed. The main idea is that each node is mined with its parent node.

2D Spherical Spaces for Face Relighting under Harsh Illumination

In this paper, we propose a robust face relighting technique by using spherical space properties. The proposed method is done for reducing the illumination effects on face recognition. Given a single 2D face image, we relight the face object by extracting the nine spherical harmonic bases and the face spherical illumination coefficients. First, an internal training illumination database is generated by computing face albedo and face normal from 2D images under different lighting conditions. Based on the generated database, we analyze the target face pixels and compare them with the training bootstrap by using pre-generated tiles. In this work, practical real time processing speed and small image size were considered when designing the framework. In contrast to other works, our technique requires no 3D face models for the training process and takes a single 2D image as an input. Experimental results on publicly available databases show that the proposed technique works well under severe lighting conditions with significant improvements on the face recognition rates.

Visualization and Indexing of Spectral Databases

On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.

Density Clustering Based On Radius of Data (DCBRD)

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density based clustering algorithm (DCBRD) is presented, relying on a knowledge acquired from the data by dividing the data space into overlapped regions. The proposed algorithm discovers arbitrary shaped clusters, requires no input parameters and uses the same definitions of DBSCAN algorithm. We performed an experimental evaluation of the effectiveness and efficiency of it, and compared this results with that of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is significantly efficient in discovering clusters of arbitrary shape and size.

A Framework for Data Mining Based Multi-Agent: An Application to Spatial Data

Data mining is an extraordinarily demanding field referring to extraction of implicit knowledge and relationships, which are not explicitly stored in databases. A wide variety of methods of data mining have been introduced (classification, characterization, generalization...). Each one of these methods includes more than algorithm. A system of data mining implies different user categories,, which mean that the user-s behavior must be a component of the system. The problem at this level is to know which algorithm of which method to employ for an exploratory end, which one for a decisional end, and how can they collaborate and communicate. Agent paradigm presents a new way of conception and realizing of data mining system. The purpose is to combine different algorithms of data mining to prepare elements for decision-makers, benefiting from the possibilities offered by the multi-agent systems. In this paper the agent framework for data mining is introduced, and its overall architecture and functionality are presented. The validation is made on spatial data. Principal results will be presented.

Database Placement on Large-Scale Systems

Large-scale systems such as Grids offer infrastructures for both data distribution and parallel processing. The use of Grid infrastructures is a more recent issue that is already impacting the Distributed Database Management System industry. In DBMS, distributed query processing has emerged as a fundamental technique for ensuring high performance in distributed databases. Database placement is particularly important in large-scale systems because it reduces communication costs and improves resource usage. In this paper, we propose a dynamic database placement policy that depends on query patterns and Grid sites capabilities. We evaluate the performance of the proposed database placement policy using simulations. The obtained results show that dynamic database placement can significantly improve the performance of distributed query processing.

Extended Well-Founded Semantics in Bilattices

One of the most used assumptions in logic programming and deductive databases is the so-called Closed World Assumption (CWA), according to which the atoms that cannot be inferred from the programs are considered to be false (i.e. a pessimistic assumption). One of the most successful semantics of conventional logic programs based on the CWA is the well-founded semantics. However, the CWA is not applicable in all circumstances when information is handled. That is, the well-founded semantics, if conventionally defined, would behave inadequately in different cases. The solution we adopt in this paper is to extend the well-founded semantics in order for it to be based also on other assumptions. The basis of (default) negative information in the well-founded semantics is given by the so-called unfounded sets. We extend this concept by considering optimistic, pessimistic, skeptical and paraconsistent assumptions, used to complete missing information from a program. Our semantics, called extended well-founded semantics, expresses also imperfect information considered to be missing/incomplete, uncertain and/or inconsistent, by using bilattices as multivalued logics. We provide a method of computing the extended well-founded semantics and show that Kripke-Kleene semantics is captured by considering a skeptical assumption. We show also that the complexity of the computation of our semantics is polynomial time.

Developing the Color Temperature Histogram Method for Improving the Content-Based Image Retrieval

This paper proposes a new method for image searches and image indexing in databases with a color temperature histogram. The color temperature histogram can be used for performance improvement of content–based image retrieval by using a combination of color temperature and histogram. The color temperature histogram can be represented by a range of 46 colors. That is more than the color histogram and the dominant color temperature. Moreover, with our method the colors that have the same color temperature can be separated while the dominant color temperature can not. The results showed that the color temperature histogram retrieved an accurate image more often than the dominant color temperature method or color histogram method. This also took less time so the color temperature can be used for indexing and searching for images.

Software Architecture and Support for Patient Tracking Systems in Critical Scenarios

In this work a new platform for mobile-health systems is presented. System target application is providing decision support to rescue corps or military medical personnel in combat areas. Software architecture relies on a distributed client-server system that manages a wireless ad-hoc networks hierarchy in which several different types of client operate. Each client is characterized for different hardware and software requirements. Lower hierarchy levels rely in a network of completely custom devices that store clinical information and patient status and are designed to form an ad-hoc network operating in the 2.4 GHz ISM band and complying with the IEEE 802.15.4 standard (ZigBee). Medical personnel may interact with such devices, that are called MICs (Medical Information Carriers), by means of a PDA (Personal Digital Assistant) or a MDA (Medical Digital Assistant), and transmit the information stored in their local databases as well as issue a service request to the upper hierarchy levels by using IEEE 802.11 a/b/g standard (WiFi). The server acts as a repository that stores both medical evacuation forms and associated events (e.g., a teleconsulting request). All the actors participating in the diagnostic or evacuation process may access asynchronously to such repository and update its content or generate new events. The designed system pretends to optimise and improve information spreading and flow among all the system components with the aim of improving both diagnostic quality and evacuation process.

A Survey on Life Science Database Citation Frequency in Scientific Literatures

There are so many databases of various fields of life sciences available online. To find well-used databases, a survey to measure life science database citation frequency in scientific literatures is done. The survey is done by measuring how many scientific literatures which are available on PubMed Central archive cited a specific life science database. This paper presents and discusses the results of the survey.

Feature-Driven Classification of Musical Styles

In this paper we address the problem of musical style classification, which has a number of applications like indexing in musical databases or automatic composition systems. Starting from MIDI files of real-world improvisations, we extract the melody track and cut it into overlapping segments of equal length. From these fragments, some numerical features are extracted as descriptors of style samples. We show that a standard Bayesian classifier can be conveniently employed to build an effective musical style classifier, once this set of features has been extracted from musical data. Preliminary experimental results show the effectiveness of the developed classifier that represents the first component of a musical audio retrieval system