Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

Cloud Computing Databases: Latest Trends and Architectural Concepts

The Economic factors are leading to the rise of infrastructures provides software and computing facilities as a service, known as cloud services or cloud computing. Cloud services can provide efficiencies for application providers, both by limiting up-front capital expenses, and by reducing the cost of ownership over time. Such services are made available in a data center, using shared commodity hardware for computation and storage. There is a varied set of cloud services available today, including application services (salesforce.com), storage services (Amazon S3), compute services (Google App Engine, Amazon EC2) and data services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google-s Data store). These services represent a variety of reformations of data management architectures, and more are on the horizon.

New Approach for the Modeling and the Implementation of the Object-Relational Databases

Conception is the primordial part in the realization of a computer system. Several tools have been used to help inventors to describe their software. These tools knew a big success in the relational databases domain since they permit to generate SQL script modeling the database from an Entity/Association model. However, with the evolution of the computer domain, the relational databases proved their limits and object-relational model became used more and more. Tools of present conception don't support all new concepts introduced by this model and the syntax of the SQL3 language. We propose in this paper a tool of help to the conception and implementation of object-relational databases called «NAVIGTOOLS" that allows the user to generate script modeling its database in SQL3 language. This tool bases itself on the Entity/Association and navigational model for modeling the object-relational databases.

A Formal Suite of Object Relational Database Metrics

Object Relational Databases (ORDB) are complex in nature than traditional relational databases because they combine the characteristics of both object oriented concepts and relational features of conventional databases. Design of an ORDB demands efficient and quality schema considering the structural, functional and componential traits. This internal quality of the schema is assured by metrics that measure the relevant attributes. This is extended to substantiate the understandability, usability and reliability of the schema, thus assuring external quality of the schema. This work institutes a formalization of ORDB metrics; metric definition, evaluation methodology and the calibration of the metric. Three ORDB schemas were used to conduct the evaluation and the formalization of the metrics. The metrics are calibrated using content and criteria related validity based on the measurability, consistency and reliability of the metrics. Nominal and summative scales are derived based on the evaluated metric values and are standardized. Future works pertaining to ORDB metrics forms the concluding note.

Application of Biometrics to Obtain High Entropy Cryptographic Keys

In this paper, a two factor scheme is proposed to generate cryptographic keys directly from biometric data, which unlike passwords, are strongly bound to the user. Hash value of the reference iris code is used as a cryptographic key and its length depends only on the hash function, being independent of any other parameter. The entropy of such keys is 94 bits, which is much higher than any other comparable system. The most important and distinct feature of this scheme is that it regenerates the reference iris code by providing a genuine iris sample and the correct user password. Since iris codes obtained from two images of the same eye are not exactly the same, error correcting codes (Hadamard code and Reed-Solomon code) are used to deal with the variability. The scheme proposed here can be used to provide keys for a cryptographic system and/or for user authentication. The performance of this system is evaluated on two publicly available databases for iris biometrics namely CBS and ICE databases. The operating point of the system (values of False Acceptance Rate (FAR) and False Rejection Rate (FRR)) can be set by properly selecting the error correction capacity (ts) of the Reed- Solomon codes, e.g., on the ICE database, at ts = 15, FAR is 0.096% and FRR is 0.76%.

Using Data Clustering in Oral Medicine

The vast amount of information hidden in huge databases has created tremendous interests in the field of data mining. This paper examines the possibility of using data clustering techniques in oral medicine to identify functional relationships between different attributes and classification of similar patient examinations. Commonly used data clustering algorithms have been reviewed and as a result several interesting results have been gathered.

Finding Approximate Tandem Repeats with the Burrows-Wheeler Transform

Approximate tandem repeats in a genomic sequence are two or more contiguous, similar copies of a pattern of nucleotides. They are used in DNA mapping, studying molecular evolution mechanisms, forensic analysis and research in diagnosis of inherited diseases. All their functions are still investigated and not well defined, but increasing biological databases together with tools for identification of these repeats may lead to discovery of their specific role or correlation with particular features. This paper presents a new approach for finding approximate tandem repeats in a given sequence, where the similarity between consecutive repeats is measured using the Hamming distance. It is an enhancement of a method for finding exact tandem repeats in DNA sequences based on the Burrows- Wheeler transform.

Dynamic Inverted Index Maintenance

The majority of today's IR systems base the IR task on two main processes: indexing and searching. There exists a special group of dynamic IR systems where both processes (indexing and searching) happen simultaneously; such a system discards obsolete information, simultaneously dealing with the insertion of new in¬formation, while still answering user queries. In these dynamic, time critical text document databases, it is often important to modify index structures quickly, as documents arrive. This paper presents a method for dynamization which may be used for this task. Experimental results show that the dynamization process is possible and that it guarantees the response time for the query operation and index actualization.

A Frame Work for Query Results Refinement in Multimedia Databases

In the current age, retrieval of relevant information from massive amount of data is a challenging job. Over the years, precise and relevant retrieval of information has attained high significance. There is a growing need in the market to build systems, which can retrieve multimedia information that precisely meets the user's current needs. In this paper, we have introduced a framework for refining query results before showing it to the user, using ambient intelligence, user profile, group profile, user location, time, day, user device type and extracted features. A prototypic tool was also developed to demonstrate the efficiency of the proposed approach.

Design Histories for Enhanced Concurrent Structural Design

The leisure boatbuilding industry has tight profit margins that demand that boats are created to a high quality but with low cost. This requirement means reduced design times combined with increased use of design for production can lead to large benefits. The evolutionary nature of the boatbuilding industry can lead to a large usage of previous vessels in new designs. With the increase in automated tools for concurrent engineering within structural design it is important that these tools can reuse this information while subsequently feeding this to designers. The ability to accurately gather this materials and parts data is also a key component to these tools. This paper therefore aims to develop an architecture made up of neural networks and databases to feed information effectively to the designers based on previous design experience.

Natural Language Database Interface for Selection of Data Using Grammar and Parsing

Databases have become ubiquitous. Almost all IT applications are storing into and retrieving information from databases. Retrieving information from the database requires knowledge of technical languages such as Structured Query Language (SQL). However majority of the users who interact with the databases do not have a technical background and are intimidated by the idea of using languages such as SQL. This has led to the development of a few Natural Language Database Interfaces (NLDBIs). A NLDBI allows the user to query the database in a natural language. This paper highlights on architecture of new NLDBI system, its implementation and discusses on results obtained. In most of the typical NLDBI systems the natural language statement is converted into an internal representation based on the syntactic and semantic knowledge of the natural language. This representation is then converted into queries using a representation converter. A natural language query is translated to an equivalent SQL query after processing through various stages. The work has been experimented on primitive database queries with certain constraints.

Bioinformatics Profiling of Missense Mutations

The ability to distinguish missense nucleotide substitutions that contribute to harmful effect from those that do not is a difficult problem usually accomplished through functional in vivo analyses. In this study, instead current biochemical methods, the effects of missense mutations upon protein structure and function were assayed by means of computational methods and information from the databases. For this order, the effects of new missense mutations in exon 5 of PTEN gene upon protein structure and function were examined. The gene coding for PTEN was identified and localized on chromosome region 10q23.3 as the tumor suppressor gene. The utilization of these methods were shown that c.319G>A and c.341T>G missense mutations that were recognized in patients with breast cancer and Cowden disease, could be pathogenic. This method could be use for analysis of missense mutation in others genes.

New Corneal Reflection Removal Method Used In Iris Recognition System

Images of human iris contain specular highlights due to the reflective properties of the cornea. This corneal reflection causes many errors not only in iris and pupil center estimation but also to locate iris and pupil boundaries especially for methods that use active contour. Each iris recognition system has four steps: Segmentation, Normalization, Encoding and Matching. In order to address the corneal reflection, a novel reflection removal method is proposed in this paper. Comparative experiments of two existing methods for reflection removal method are evaluated on CASIA iris image databases V3. The experimental results reveal that the proposed algorithm provides higher performance in reflection removal.

A Materialized Approach to the Integration of XML Documents: the OSIX System

The data exchanged on the Web are of different nature from those treated by the classical database management systems; these data are called semi-structured data since they do not have a regular and static structure like data found in a relational database; their schema is dynamic and may contain missing data or types. Therefore, the needs for developing further techniques and algorithms to exploit and integrate such data, and extract relevant information for the user have been raised. In this paper we present the system OSIX (Osiris based System for Integration of XML Sources). This system has a Data Warehouse model designed for the integration of semi-structured data and more precisely for the integration of XML documents. The architecture of OSIX relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a viewbased data model whose indexing system supports semantic query optimization. We show that the problem of query processing on a XML source is optimized by the indexing approach proposed by Osiris.

GIS-based Non-point Sources of Pollution Simulation in Cameron Highlands, Malaysia

Cameron Highlands is a mountainous area subjected to torrential tropical showers. It extracts 5.8 million liters of water per day for drinking supply from its rivers at several intake points. The water quality of rivers in Cameron Highlands, however, has deteriorated significantly due to land clearing for agriculture, excessive usage of pesticides and fertilizers as well as construction activities in rapidly developing urban areas. On the other hand, these pollution sources known as non-point pollution sources are diverse and hard to identify and therefore they are difficult to estimate. Hence, Geographical Information Systems (GIS) was used to provide an extensive approach to evaluate landuse and other mapping characteristics to explain the spatial distribution of non-point sources of contamination in Cameron Highlands. The method to assess pollution sources has been developed by using Cameron Highlands Master Plan (2006-2010) for integrating GIS, databases, as well as pollution loads in the area of study. The results show highest annual runoff is created by forest, 3.56 × 108 m3/yr followed by urban development, 1.46 × 108 m3/yr. Furthermore, urban development causes highest BOD load (1.31 × 106 kgBOD/yr) while agricultural activities and forest contribute the highest annual loads for phosphorus (6.91 × 104 kgP/yr) and nitrogen (2.50 × 105 kgN/yr), respectively. Therefore, best management practices (BMPs) are suggested to be applied to reduce pollution level in the area.

Optimizing Spatial Trend Detection By Artificial Immune Systems

Spatial trends are one of the valuable patterns in geo databases. They play an important role in data analysis and knowledge discovery from spatial data. A spatial trend is a regular change of one or more non spatial attributes when spatially moving away from a start object. Spatial trend detection is a graph search problem therefore heuristic methods can be good solution. Artificial immune system (AIS) is a special method for searching and optimizing. AIS is a novel evolutionary paradigm inspired by the biological immune system. The models based on immune system principles, such as the clonal selection theory, the immune network model or the negative selection algorithm, have been finding increasing applications in fields of science and engineering. In this paper, we develop a novel immunological algorithm based on clonal selection algorithm (CSA) for spatial trend detection. We are created neighborhood graph and neighborhood path, then select spatial trends that their affinity is high for antibody. In an evolutionary process with artificial immune algorithm, affinity of low trends is increased with mutation until stop condition is satisfied.

A System to Integrate and Manipulate Protein Database Using BioPerl and XML

The size, complexity and number of databases used for protein information have caused bioinformatics to lag behind in adapting to the need to handle this distributed information. Integrating all the information from different databases into one database is a challenging problem. Our main research is to develop a tool which can be used to access and manipulate protein information from difference databases. In our approach, we have integrated difference databases such as Swiss-prot, PDB, Interpro, and EMBL and transformed these databases in flat file format into relational form using XML and Bioperl. As a result, we showed this tool can search different sizes of protein information stored in relational database and the result can be retrieved faster compared to flat file database. A web based user interface is provided to allow user to access or search for protein information in the local database.

Fuzzy Join Dependency in Fuzzy Relational Databases

The join dependency provides the basis for obtaining lossless join decomposition in a classical relational schema. The existence of Join dependency shows that that the tables always represent the correct data after being joined. Since the classical relational databases cannot handle imprecise data, they were extended to fuzzy relational databases so that uncertain, ambiguous, imprecise and partially known information can also be stored in databases in a formal way. However like classical databases, the fuzzy relational databases also undergoes decomposition during normalization, the issue of joining the decomposed fuzzy relations remains intact. Our effort in the present paper is to emphasize on this issue. In this paper we define fuzzy join dependency in the framework of type-1 fuzzy relational databases & type-2 fuzzy relational databases using the concept of fuzzy equality which is defined using fuzzy functions. We use the fuzzy equi-join operator for computing the fuzzy equality of two attribute values. We also discuss the dependency preservation property on execution of this fuzzy equi- join and derive the necessary condition for the fuzzy functional dependencies to be preserved on joining the decomposed fuzzy relations. We also derive the conditions for fuzzy join dependency to exist in context of both type-1 and type-2 fuzzy relational databases. We find that unlike the classical relational databases even the existence of a trivial join dependency does not ensure lossless join decomposition in type-2 fuzzy relational databases. Finally we derive the conditions for the fuzzy equality to be non zero and the qualification of an attribute for fuzzy key.

Automated Knowledge Engineering

This article outlines conceptualization and implementation of an intelligent system capable of extracting knowledge from databases. Use of hybridized features of both the Rough and Fuzzy Set theory render the developed system flexibility in dealing with discreet as well as continuous datasets. A raw data set provided to the system, is initially transformed in a computer legible format followed by pruning of the data set. The refined data set is then processed through various Rough Set operators which enable discovery of parameter relationships and interdependencies. The discovered knowledge is automatically transformed into a rule base expressed in Fuzzy terms. Two exemplary cancer repository datasets (for Breast and Lung Cancer) have been used to test and implement the proposed framework.

Development of Workplace Environmental Monitoring Systems Using Ubiquitous Sensor Network

In this study, workplace environmental monitoring systems were established using USN(Ubiquitous Sensor Networks) and LabVIEW. Although existing direct sampling methods enable finding accurate values as of the time points of measurement, those methods are disadvantageous in that continuous management and supervision are difficult and costs for are high when those methods are used. Therefore, the efficiency and reliability of workplace management by supervisors are relatively low when those methods are used. In this study, systems were established so that information on workplace environmental factors such as temperatures, humidity and noises is measured and transmitted to the PC in real time to enable supervisors to monitor workplaces through LabVIEW on the PC. When any accidents have occurred in workplaces, supervisors can immediately respond through the monitoring system and this system enables integrated workplace management and the prevention of safety accidents. By introducing these monitoring systems, safety accidents due to harmful environmental factors in workplaces can be prevented and these monitoring systems will be also helpful in finding out the correlation between safety accidents and occupational diseases by comparing and linking databases established by this monitoring system with existing statistical data.