Non-negative Principal Component Analysis for Face Recognition

Principle component analysis is often combined with the state-of-art classification algorithms to recognize human faces. However, principle component analysis can only capture these features contributing to the global characteristics of data because it is a global feature selection algorithm. It misses those features contributing to the local characteristics of data because each principal component only contains some levels of global characteristics of data. In this study, we present a novel face recognition approach using non-negative principal component analysis which is added with the constraint of non-negative to improve data locality and contribute to elucidating latent data structures. Experiments are performed on the Cambridge ORL face database. We demonstrate the strong performances of the algorithm in recognizing human faces in comparison with PCA and NREMF approaches.

Implicit Authorization Mechanism of Object-Oriented Database

Due to its special data structure and manipulative principle, Object-Oriented Database (OODB) has a particular security protection and authorization methods. This paper first introduces the features of security mechanism about OODB, and then talked about authorization checking process of OODB. Implicit authorization mechanism is based on the subject hierarchies, object hierarchies and access hierarchies of the security authorization modes, and simplifies the authorization mode. In addition, to combine with other authorization mechanisms, implicit authorization can make protection on the authorization of OODB expediently and effectively.

LAYMOD; A Layered and Modular Platform for CAx Collaboration Management and Supporting Product data Integration based on STEP Standard

Nowadays companies strive to survive in a competitive global environment. To speed up product development/modifications, it is suggested to adopt a collaborative product development approach. However, despite the advantages of new IT improvements still many CAx systems work separately and locally. Collaborative design and manufacture requires a product information model that supports related CAx product data models. To solve this problem many solutions are proposed, which the most successful one is adopting the STEP standard as a product data model to develop a collaborative CAx platform. However, the improvement of the STEP-s Application Protocols (APs) over the time, huge number of STEP AP-s and cc-s, the high costs of implementation, costly process for conversion of older CAx software files to the STEP neutral file format; and lack of STEP knowledge, that usually slows down the implementation of the STEP standard in collaborative data exchange, management and integration should be considered. In this paper the requirements for a successful collaborative CAx system is discussed. The STEP standard capability for product data integration and its shortcomings as well as the dominant platforms for supporting CAx collaboration management and product data integration are reviewed. Finally a platform named LAYMOD to fulfil the requirements of CAx collaborative environment and integrating the product data is proposed. The platform is a layered platform to enable global collaboration among different CAx software packages/developers. It also adopts the STEP modular architecture and the XML data structures to enable collaboration between CAx software packages as well as overcoming the STEP standard limitations. The architecture and procedures of LAYMOD platform to manage collaboration and avoid contradicts in product data integration are introduced.

Multi-level Metadata Integration System: XML, RDF and RuleML

Our work is part of the heterogeneous data integration, with the definition of a structural and semantic mediation model. Our aim is to propose architecture for the heterogeneous sources metadata mediation, represented by XML, RDF and RuleML models, providing to the user the metadata transparency. This, by including data structures, of natures fundamentally different, and allowing the decomposition of a query involving multiple sources, to queries specific to these sources, then recompose the result.

Semantic Spatial Objects Data Structure for Spatial Access Method

Modern spatial database management systems require a unique Spatial Access Method (SAM) in order solve complex spatial quires efficiently. In this case the spatial data structure takes a prominent place in the SAM. Inadequate data structure leads forming poor algorithmic choices and forging deficient understandings of algorithm behavior on the spatial database. A key step in developing a better semantic spatial object data structure is to quantify the performance effects of semantic and outlier detections that are not reflected in the previous tree structures (R-Tree and its variants). This paper explores a novel SSRO-Tree on SAM to the Topo-Semantic approach. The paper shows how to identify and handle the semantic spatial objects with outlier objects during page overflow/underflow, using gain/loss metrics. We introduce a new SSRO-Tree algorithm which facilitates the achievement of better performance in practice over algorithms that are superior in the R*-Tree and RO-Tree by considering selection queries.

Choosing R-tree or Quadtree Spatial DataIndexing in One Oracle Spatial Database System to Make Faster Showing Geographical Map in Mobile Geographical Information System Technology

The latest Geographic Information System (GIS) technology makes it possible to administer the spatial components of daily “business object," in the corporate database, and apply suitable geographic analysis efficiently in a desktop-focused application. We can use wireless internet technology for transfer process in spatial data from server to client or vice versa. However, the problem in wireless Internet is system bottlenecks that can make the process of transferring data not efficient. The reason is large amount of spatial data. Optimization in the process of transferring and retrieving data, however, is an essential issue that must be considered. Appropriate decision to choose between R-tree and Quadtree spatial data indexing method can optimize the process. With the rapid proliferation of these databases in the past decade, extensive research has been conducted on the design of efficient data structures to enable fast spatial searching. Commercial database vendors like Oracle have also started implementing these spatial indexing to cater to the large and diverse GIS. This paper focuses on the decisions to choose R-tree and quadtree spatial indexing using Oracle spatial database in mobile GIS application. From our research condition, the result of using Quadtree and R-tree spatial data indexing method in one single spatial database can save the time until 42.5%.

Fuzzy Based Problem-Solution Data Structureas a Data Oriented Model for ABS Controlling

The anti-lock braking systems installed on vehicles for safe and effective braking, are high-order nonlinear and timevariant. Using fuzzy logic controllers increase efficiency of such systems, but impose a high computational complexity as well. The main concept introduced by this paper is reducing computational complexity of fuzzy controllers by deploying problem-solution data structure. Unlike conventional methods that are based on calculations, this approach is based on data oriented modeling.

Efficient Mean Shift Clustering Using Exponential Integral Kernels

This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied on backgrounddifferenced image sequences. For efficiency, all calculations are performed on integral images. Novel corresponding exponential integral kernels are introduced to allow the application of nonuniform kernels for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.

Distributed Splay Suffix Arrays: A New Structure for Distributed String Search

As a structure for processing string problem, suffix array is certainly widely-known and extensively-studied. But if the string access pattern follows the “90/10" rule, suffix array can not take advantage of the fact that we often find something that we have just found. Although the splay tree is an efficient data structure for small documents when the access pattern follows the “90/10" rule, it requires many structures and an excessive amount of pointer manipulations for efficiently processing and searching large documents. In this paper, we propose a new and conceptually powerful data structure, called splay suffix arrays (SSA), for string search. This data structure combines the features of splay tree and suffix arrays into a new approach which is suitable to implementation on both conventional and clustered computers.

Geometric Data Structures and Their Selected Applications

Finding the shortest path between two positions is a fundamental problem in transportation, routing, and communications applications. In robot motion planning, the robot should pass around the obstacles touching none of them, i.e. the goal is to find a collision-free path from a starting to a target position. This task has many specific formulations depending on the shape of obstacles, allowable directions of movements, knowledge of the scene, etc. Research of path planning has yielded many fundamentally different approaches to its solution, mainly based on various decomposition and roadmap methods. In this paper, we show a possible use of visibility graphs in point-to-point motion planning in the Euclidean plane and an alternative approach using Voronoi diagrams that decreases the probability of collisions with obstacles. The second application area, investigated here, is focused on problems of finding minimal networks connecting a set of given points in the plane using either only straight connections between pairs of points (minimum spanning tree) or allowing the addition of auxiliary points to the set to obtain shorter spanning networks (minimum Steiner tree).

Using Automated Database Reverse Engineering for Database Integration

One important problem in today organizations is the existence of non-integrated information systems, inconsistency and lack of suitable correlations between legacy and modern systems. One main solution is to transfer the local databases into a global one. In this regards we need to extract the data structures from the legacy systems and integrate them with the new technology systems. In legacy systems, huge amounts of a data are stored in legacy databases. They require particular attention since they need more efforts to be normalized, reformatted and moved to the modern database environments. Designing the new integrated (global) database architecture and applying the reverse engineering requires data normalization. This paper proposes the use of database reverse engineering in order to integrate legacy and modern databases in organizations. The suggested approach consists of methods and techniques for generating data transformation rules needed for the data structure normalization.

On Methodologies for Analysing Sickness Absence Data: An Insight into a New Method

Sickness absence represents a major economic and social issue. Analysis of sick leave data is a recurrent challenge to analysts because of the complexity of the data structure which is often time dependent, highly skewed and clumped at zero. Ignoring these features to make statistical inference is likely to be inefficient and misguided. Traditional approaches do not address these problems. In this study, we discuss model methodologies in terms of statistical techniques for addressing the difficulties with sick leave data. We also introduce and demonstrate a new method by performing a longitudinal assessment of long-term absenteeism using a large registration dataset as a working example available from the Helsinki Health Study for municipal employees from Finland during the period of 1990-1999. We present a comparative study on model selection and a critical analysis of the temporal trends, the occurrence and degree of long-term sickness absences among municipal employees. The strengths of this working example include the large sample size over a long follow-up period providing strong evidence in supporting of the new model. Our main goal is to propose a way to select an appropriate model and to introduce a new methodology for analysing sickness absence data as well as to demonstrate model applicability to complicated longitudinal data.

Extraction of Data from Web Pages: A Vision Based Approach

With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content like advertisements, navigation-panels, copyright notices etc., surrounding the main content of the web page. Hence, tools for the mining of data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques to mine data regions from web pages are still unsatisfactory because of their poor performance and tag-dependence. In this paper a novel method to extract data items from the web pages automatically is proposed. It comprises of two steps: (1) Identification and Extraction of the data regions based on visual clues information. (2) Identification of data records and extraction of data items from a data region. For step1, a novel and more effective method is proposed based on visual clues, which finds the data regions formed by all types of tags using visual clues. For step2 a more effective method namely, Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine the non-contiguous data records and can correctly identify data regions, irrespective of the type of tag in which it is bound. Our experimental results show that the proposed technique performs better than the existing techniques.

The Comparison of Anchor and Star Schema from a Query Performance Perspective

Today's business environment requires that companies have access to highly relevant information in a matter of seconds. Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by star schemas. Dimensional modeling is already recognized as a leading industry standard in the field of data warehousing although several drawbacks and pitfalls were reported. This paper focuses on the analysis of another data warehouse modeling technique - the anchor modeling, and its characteristics in context with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show information about performance of queries executed on database schemas structured according to principles of each database modeling technique.

Models to Customise Web Service Discovery Result using Static and Dynamic Parameters

This paper presents three models which enable the customisation of Universal Description, Discovery and Integration (UDDI) query results, based on some pre-defined and/or real-time changing parameters. These proposed models detail the requirements, design and techniques which make ranking of Web service discovery results from a service registry possible. Our contribution is two fold: First, we present an extension to the UDDI inquiry capabilities. This enables a private UDDI registry owner to customise or rank the query results, based on its business requirements. Second, our proposal utilises existing technologies and standards which require minimal changes to existing UDDI interfaces or its data structures. We believe these models will serve as valuable reference for enhancing the service discovery methods within a private UDDI registry environment.

The Resource Description Framework (RDF) as a Modern Structure for Medical Data

The amount and heterogeneity of data in biomedical research, notably in interdisciplinary fields, requires new methods for the collection, presentation and analysis of information. Important data from laboratory experiments as well as patient trials are available but come out of distributed resources. The Charité - University Hospital Berlin has established together with the German Research Foundation (DFG) a new information service centre for kidney diseases and transplantation (Open European Nephrology Science Centre - OpEN.SC). Beside a collaborative aspect to create new research groups every single partner or institution of this science information centre making his own data available is allowed to search the whole data pool of the various involved centres. A core task is the implementation of a non-restricting open data structure for the various different data sources. We decided to use a modern RDF model and in a first phase transformed original data coming from the web-based Electronic Patient Record database TBase©.

Making Data Structures and Algorithms more Understandable by Programming Sudoku the Human Way

Data Structures and Algorithms is a module in most Computer Science or Information Technology curricula. It is one of the modules most students identify as being difficult. This paper demonstrates how programming a solution for Sudoku can make abstract concepts more concrete. The paper relates concepts of a typical Data Structures and Algorithms module to a step by step solution for Sudoku in a human type as opposed to a computer oriented solution.

Data Extraction of XML Files using Searching and Indexing Techniques

XML files contain data which is in well formatted manner. By studying the format or semantics of the grammar it will be helpful for fast retrieval of the data. There are many algorithms which describes about searching the data from XML files. There are no. of approaches which uses data structure or are related to the contents of the document. In these cases user must know about the structure of the document and information retrieval techniques using NLPs is related to content of the document. Hence the result may be irrelevant or not so successful and may take more time to search.. This paper presents fast XML retrieval techniques by using new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns the value to each tag from file. To query the system, a user is not constrained about fixed format of query.

On Bounding Jayanti's Distributed Mutual Exclusion Algorithm

Jayanti-s algorithm is one of the best known abortable mutual exclusion algorithms. This work is an attempt to overcome an already known limitation of the algorithm while preserving its all important properties and elegance. The limitation is that the token number used to assign process identification number to new incoming processes is unbounded. We have used a suitably adapted alternative data structure, in order to completely eliminate the use of token number, in the algorithm.

An Efficient Approach to Mining Frequent Itemsets on Data Streams

The increasing importance of data stream arising in a wide range of advanced applications has led to the extensive study of mining frequent patterns. Mining data streams poses many new challenges amongst which are the one-scan nature, the unbounded memory requirement and the high arrival rate of data streams. In this paper, we propose a new approach for mining itemsets on data stream. Our approach SFIDS has been developed based on FIDS algorithm. The main attempts were to keep some advantages of the previous approach and resolve some of its drawbacks, and consequently to improve run time and memory consumption. Our approach has the following advantages: using a data structure similar to lattice for keeping frequent itemsets, separating regions from each other with deleting common nodes that results in a decrease in search space, memory consumption and run time; and Finally, considering CPU constraint, with increasing arrival rate of data that result in overloading system, SFIDS automatically detect this situation and discard some of unprocessing data. We guarantee that error of results is bounded to user pre-specified threshold, based on a probability technique. Final results show that SFIDS algorithm could attain about 50% run time improvement than FIDS approach.