Abstract: Principal component analysis (PCA) is often combined with
state-of-the-art classification algorithms to recognize human faces.
However, because it is a global feature selection algorithm, PCA
captures only those features that contribute to the global
characteristics of the data. It misses features that contribute to
local characteristics because each principal component contains only
some level of the global characteristics of the data.
In this study, we present a novel face recognition approach using
non-negative principal component analysis, which adds a
non-negativity constraint to improve data locality and help
elucidate latent data structures. Experiments are performed on the
Cambridge ORL face database. We demonstrate the strong
performance of the algorithm in recognizing human faces in
comparison with PCA and NREMF approaches.
Abstract: Because of its special data structure and manipulation principles, an Object-Oriented Database (OODB) has its own security protection and authorization methods. This paper first introduces the features of the OODB security mechanism and then discusses the OODB authorization checking process. The implicit authorization mechanism is based on the subject, object and access hierarchies of the security authorization model, and it simplifies that model. In addition, when combined with other authorization mechanisms, implicit authorization can protect OODB authorizations conveniently and effectively.
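The implicit authorization idea described above can be illustrated with a minimal sketch. It assumes a single object hierarchy in which an explicit grant on a node implies the same access on all of its descendants; the class and method names (`ImplicitAuth`, `grant`, `is_authorized`) are hypothetical and not taken from the paper, and the subject and access-type hierarchies are left out for brevity.

```python
class ImplicitAuth:
    """Toy implicit authorization over an object hierarchy (illustrative only)."""

    def __init__(self):
        self.parent = {}     # object -> parent object (None for a root)
        self.grants = set()  # explicit (subject, object, access) triples

    def add_object(self, obj, parent=None):
        self.parent[obj] = parent

    def grant(self, subject, obj, access):
        self.grants.add((subject, obj, access))

    def is_authorized(self, subject, obj, access):
        # Walk up the hierarchy: an explicit grant on any ancestor
        # implicitly authorizes the same access on the descendant.
        node = obj
        while node is not None:
            if (subject, node, access) in self.grants:
                return True
            node = self.parent.get(node)
        return False


auth = ImplicitAuth()
auth.add_object("Database")
auth.add_object("Class:Employee", parent="Database")
auth.add_object("Instance:emp42", parent="Class:Employee")
auth.grant("alice", "Class:Employee", "read")

print(auth.is_authorized("alice", "Instance:emp42", "read"))  # implied by the class-level grant
print(auth.is_authorized("alice", "Database", "read"))        # grants do not propagate upward
```

Storing one grant at the class level instead of one per instance is what keeps the authorization table small in this sketch.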
Abstract: Companies today strive to survive in a
competitive global environment. To speed up product
development and modification, a collaborative
product development approach is recommended. However, despite
recent IT advances, many CAx systems still work separately and
locally. Collaborative design and manufacture requires a product
information model that supports the related CAx product data models.
Many solutions have been proposed for this problem, the most
successful of which adopts the STEP standard as the product data model
for a collaborative CAx platform. However, several factors usually slow
down the implementation of the STEP standard in collaborative data
exchange, management and integration and should be considered: the
evolution of STEP's Application Protocols (APs) over time, the large
number of STEP APs and CCs, the high cost of implementation,
the costly conversion of older CAx software files to the STEP
neutral file format, and a lack of STEP knowledge. In this
paper the requirements for a successful collaborative CAx system are
discussed. The STEP standard's capability for product data integration
and its shortcomings, as well as the dominant platforms for supporting
CAx collaboration management and product data integration, are
reviewed. Finally, a platform named LAYMOD is proposed to fulfil the
requirements of a collaborative CAx environment and to integrate
product data. The platform is layered to enable
global collaboration among different CAx software
packages and developers. It also adopts the STEP modular architecture
and XML data structures to enable collaboration between CAx
software packages and to overcome the limitations of the STEP
standard. The architecture and procedures of the LAYMOD platform
for managing collaboration and avoiding contradictions in product data
integration are introduced.
Abstract: Our work addresses heterogeneous data
integration through the definition of a structural and semantic mediation
model. Our aim is to propose an architecture for mediating the metadata
of heterogeneous sources, represented by XML, RDF and RuleML
models, that makes the metadata transparent to the user. The
architecture accommodates data structures of fundamentally different
natures and decomposes a query involving multiple sources into
queries specific to those sources, then recomposes the results.
Abstract: Modern spatial database management systems require a unique Spatial Access Method (SAM) in order to solve complex spatial queries efficiently. The spatial data structure plays a prominent role in the SAM. An inadequate data structure leads to poor algorithmic choices and a deficient understanding of algorithm behavior on the spatial database. A key step in developing a better semantic spatial object data structure is to quantify the performance effects of semantic and outlier detection, which are not reflected in previous tree structures (the R-Tree and its variants). This paper explores a novel SSRO-Tree as a SAM for the Topo-Semantic approach. The paper shows how to identify and handle semantic spatial objects together with outlier objects during page overflow/underflow, using gain/loss metrics. We introduce a new SSRO-Tree algorithm that, for selection queries, achieves better performance in practice than the R*-Tree and RO-Tree algorithms.
Abstract: The latest Geographic Information System (GIS)
technology makes it possible to administer the spatial components of
daily “business objects” in the corporate database and to apply suitable
geographic analysis efficiently in a desktop-focused application.
Wireless Internet technology can be used to transfer spatial
data from server to client and vice versa. However, wireless
Internet suffers from system bottlenecks that make data
transfer inefficient, because of the large amount of spatial
data involved. Optimizing the transfer and retrieval of data
is therefore an essential issue. An appropriate
choice between the R-tree and Quadtree spatial data indexing
methods can optimize the process. With the rapid proliferation of
these databases in the past decade, extensive research has been
conducted on the design of efficient data structures to enable fast
spatial searching. Commercial database vendors like Oracle have also
started implementing these spatial indexing methods to cater to large
and diverse GIS. This paper focuses on the choice between R-tree
and Quadtree spatial indexing using an Oracle spatial database in a
mobile GIS application. Under our experimental conditions, using
Quadtree and R-tree spatial data indexing methods in a single
spatial database reduces the time by up to 42.5%.
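To make the Quadtree side of that comparison concrete, here is a minimal sketch of a point quadtree with a rectangular window query. It is a generic in-memory illustration, not Oracle Spatial's implementation; the class name `QuadTree` and the `capacity` and `max_depth` parameters are assumptions chosen for this example.

```python
class QuadTree:
    """Minimal point quadtree over the square [x, x + size) x [y, y + size)."""

    def __init__(self, x, y, size, capacity=4, depth=0, max_depth=16):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.depth, self.max_depth = depth, max_depth
        self.points = []
        self.children = None  # four sub-quadrants once this node splits

    def _contains(self, px, py):
        return self.x <= px < self.x + self.size and self.y <= py < self.y + self.size

    def insert(self, px, py):
        if not self._contains(px, py):
            return False
        if self.children is None:
            # Leaves at max_depth keep growing, so duplicate points cannot recurse forever.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((px, py))
                return True
            self._split()
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.depth + 1, self.max_depth)
            for dx in (0, 1) for dy in (0, 1)
        ]
        old, self.points = self.points, []
        for px, py in old:  # push the stored points down into the new quadrants
            for child in self.children:
                if child.insert(px, py):
                    break

    def query(self, x0, y0, x1, y1):
        """Return all points with x0 <= x <= x1 and y0 <= y <= y1."""
        # Prune: skip this node entirely if its square misses the window.
        if x1 < self.x or x0 >= self.x + self.size or y1 < self.y or y0 >= self.y + self.size:
            return []
        found = [(px, py) for px, py in self.points if x0 <= px <= x1 and y0 <= py <= y1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(x0, y0, x1, y1))
        return found


tree = QuadTree(0, 0, 100, capacity=2)
for p in [(10, 10), (20, 20), (80, 80), (15, 12), (90, 5)]:
    tree.insert(*p)
print(sorted(tree.query(0, 0, 25, 25)))  # [(10, 10), (15, 12), (20, 20)]
```

A quadtree partitions space into fixed quadrants, whereas an R-tree groups objects into bounding rectangles that adapt to the data; that difference is the trade-off the index choice has to weigh.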
Abstract: The anti-lock braking systems installed on vehicles
for safe and effective braking are high-order, nonlinear and time-variant.
Using fuzzy logic controllers increases the efficiency of such
systems but also imposes a high computational complexity. The
main concept introduced in this paper is reducing the computational
complexity of fuzzy controllers by deploying a problem-solution data
structure. Unlike conventional methods, which are based on
calculations, this approach is based on data-oriented modeling.
Abstract: This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied to background-differenced image sequences. For efficiency, all calculations are performed on integral images. Novel corresponding exponential integral kernels are introduced to allow the application of non-uniform kernels for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.
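The efficiency argument rests on the integral image (summed-area table), which returns the sum over any axis-aligned box in constant time. The sketch below shows only that standard uniform-kernel building block in plain Python; the function names are illustrative, and the paper's exponential integral kernels are not reproduced here.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column for easy indexing."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, r0, c0, r1, c1):
    """Sum of the pixels in rows r0..r1-1 and columns c0..c1-1, from four lookups."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # whole image: 45
print(box_sum(ii, 1, 1, 3, 3))  # bottom-right 2x2 block: 28
```

Because each box sum costs four lookups regardless of box size, a mean shift step with a uniform kernel does not get slower as the kernel grows.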
Abstract: As a structure for processing string problems, the suffix
array is widely known and extensively studied. But if the
string access pattern follows the “90/10” rule, the suffix array cannot
take advantage of the fact that we often look for something that we have
just found. Although the splay tree is an efficient data structure for small
documents when the access pattern follows the “90/10” rule, it
requires many structures and an excessive amount of pointer
manipulation to process and search large documents
efficiently. In this paper, we propose a new and conceptually powerful
data structure, called splay suffix arrays (SSA), for string search. This
data structure combines the features of splay trees and suffix arrays into
a new approach that is suitable for implementation on both
conventional and clustered computers.
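For readers unfamiliar with the baseline structure, the following is a minimal sketch of a plain suffix array with binary-search lookup. It uses a naive construction for clarity and does not implement the splay-based SSA proposed in the paper; the function names are illustrative.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they denote."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary search for the block of suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix whose first m characters are >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # leftmost suffix whose first m characters are > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                 # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Every query here pays the full binary search even when the same pattern was requested a moment ago, which is the repeated-access cost the abstract attributes to plain suffix arrays.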
Abstract: Finding the shortest path between two positions is a
fundamental problem in transportation, routing, and communications
applications. In robot motion planning, the robot should pass around
the obstacles touching none of them, i.e. the goal is to find a
collision-free path from a starting to a target position. This task has
many specific formulations depending on the shape of obstacles,
allowable directions of movements, knowledge of the scene, etc.
Research on path planning has yielded many fundamentally different
approaches to its solution, mainly based on various decomposition
and roadmap methods. In this paper, we show a possible use of
visibility graphs in point-to-point motion planning in the Euclidean
plane and an alternative approach using Voronoi diagrams that
decreases the probability of collisions with obstacles. The second
application area investigated here focuses on problems of finding
minimal networks connecting a set of given points in the plane using
either only straight connections between pairs of points (minimum
spanning tree) or allowing the addition of auxiliary points to the set
to obtain shorter spanning networks (minimum Steiner tree).
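The minimum spanning tree case mentioned above can be sketched with Prim's algorithm on the complete Euclidean graph over the given points. This is a simple O(n^2) illustration with a hypothetical function name, not the implementation used in the paper.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (edges, total length)."""
    n = len(points)
    if n == 0:
        return [], 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection from each point to the tree
    link = [None] * n      # tree vertex that realizes that cheapest connection
    best[0] = 0.0
    edges, total = [], 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] is not None:
            edges.append((link[u], u))
            total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges, total


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
edges, total = minimum_spanning_tree(square)
print(edges, total)
```

For the four corners of a unit square the spanning tree has length 3, whereas allowing two auxiliary Steiner points shortens the network to 1 + √3 ≈ 2.73, which illustrates the gain the minimum Steiner tree can offer.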
Abstract: One important problem in today's organizations is the
existence of non-integrated information systems, with inconsistency and
a lack of suitable correlations between legacy and modern systems.
One main solution is to transfer the local databases into a global one.
To do so, we need to extract the data structures from the legacy
systems and integrate them with the new technology systems. In
legacy systems, huge amounts of data are stored in legacy
databases. They require particular attention since they need more
effort to be normalized, reformatted and moved to modern
database environments. Designing the new integrated (global)
database architecture and applying reverse engineering requires
data normalization. This paper proposes the use of database reverse
engineering in order to integrate legacy and modern databases in
organizations. The suggested approach consists of methods and
techniques for generating data transformation rules needed for the
data structure normalization.
Abstract: Sickness absence represents a major economic and
social issue. Analysis of sick leave data is a recurrent challenge to analysts because of the complexity of the data structure which is
often time dependent, highly skewed and clumped at zero. Ignoring these features to make statistical inference is likely to be inefficient
and misguided. Traditional approaches do not address these problems. In this study, we discuss modelling methodologies in terms of statistical techniques for addressing the difficulties with sick leave data. We also introduce and demonstrate a new method by performing a longitudinal assessment of long-term absenteeism using,
as a working example, a large registration dataset from the Helsinki Health Study of municipal employees in Finland for the period 1990-1999. We present a comparative study on model
selection and a critical analysis of the temporal trends, the occurrence
and degree of long-term sickness absences among municipal employees. The strengths of this working example include the large
sample size over a long follow-up period, providing strong evidence in support of the new model. Our main goal is to propose a way to
select an appropriate model and to introduce a new methodology for analysing sickness absence data as well as to demonstrate model
applicability to complicated longitudinal data.
Abstract: With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content such as advertisements, navigation panels and copyright notices surrounding the main content of the web page. Hence, tools for mining data regions, data records and data items need to be developed in order to provide value-added services. Currently available automatic techniques to mine data regions from web pages are still unsatisfactory because of their poor performance and tag dependence. In this paper a novel method to extract data items from web pages automatically is proposed. It comprises two steps: (1) identification and extraction of the data regions based on visual clues; (2) identification of data records and extraction of data items from a data region. For step 1, a novel and more effective method based on visual clues is proposed, which finds the data regions formed by all types of tags. For step 2, a more effective method, namely Extraction of Data Items from web Pages (EDIP), is adopted to mine data items. The EDIP technique is a list-based approach in which the list is a linear data structure. The proposed technique is able to mine non-contiguous data records and can correctly identify data regions irrespective of the type of tag in which they are bound. Our experimental results show that the proposed technique performs better than the existing techniques.
Abstract: Today's business environment requires that companies have access to highly relevant information in a matter of seconds.
Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by
star schemas. Dimensional modeling is already recognized as a
leading industry standard in the field of data warehousing, although
several drawbacks and pitfalls have been reported. This paper focuses on
the analysis of another data warehouse modeling technique, anchor
modeling, and its characteristics in comparison with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show
information about performance of queries executed on database
schemas structured according to principles of each database modeling
technique.
Abstract: This paper presents three models which enable the
customisation of Universal Description, Discovery and Integration
(UDDI) query results, based on some pre-defined and/or real-time
changing parameters. These proposed models detail the requirements,
design and techniques which make ranking of Web service discovery
results from a service registry possible. Our contribution is twofold:
First, we present an extension to the UDDI inquiry capabilities. This
enables a private UDDI registry owner to customise or rank the query
results, based on its business requirements. Second, our proposal
utilises existing technologies and standards which require minimal
changes to existing UDDI interfaces or its data structures. We believe
these models will serve as a valuable reference for enhancing the
service discovery methods within a private UDDI registry
environment.
Abstract: Data Structures and Algorithms is a module in most
Computer Science or Information Technology curricula. It is one of
the modules most students identify as being difficult. This paper
demonstrates how programming a solution for Sudoku can make
abstract concepts more concrete. The paper relates concepts of a
typical Data Structures and Algorithms module to a step-by-step
Sudoku solution that follows a human rather than a
computer-oriented approach.
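One human-style solving step that maps directly onto basic data structures is the "naked single": a cell for which only one digit is still possible. The sketch below is an assumption about what such a step might look like, using a 2D list for the grid and sets for the candidates; it is not the paper's own solution, and it only solves puzzles that yield to this one technique.

```python
def candidates(grid, r, c):
    """Digits not yet used in the row, column, or 3x3 box of cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def fill_naked_singles(grid):
    """Repeatedly fill cells that have exactly one candidate; 0 marks an empty cell.

    Modifies `grid` in place and returns True if the grid ends up full.
    """
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    options = candidates(grid, r, c)
                    if len(options) == 1:
                        grid[r][c] = options.pop()
                        progress = True
    return all(0 not in row for row in grid)


# Build a valid solved grid from a shifting pattern, then blank out the diagonal.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [row[:] for row in solved]
for i in range(9):
    puzzle[i][i] = 0

print(fill_naked_singles(puzzle), puzzle == solved)  # True True
```

A computer-oriented solver would typically use backtracking instead; the candidate-set approach mirrors how a person pencils in and eliminates possibilities.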
Abstract: XML files contain data in a well-formatted manner. Studying the format or the semantics of the grammar helps retrieve the data quickly. Many algorithms have been described for searching data in XML files. A number of approaches use the data structure or are related to the contents of the document. In the former case the user must know the structure of the document, while information retrieval techniques using NLP relate to the content of the document. Hence the results may be irrelevant or only partly successful, and the search may take more time. This paper presents fast XML retrieval techniques using a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure and assigns a value to each tag in the file. To query the system, the user is not constrained to a fixed query format.
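As a rough illustration of indexing that combines document structure and content, the sketch below builds an inverted index from words to the tag paths in which they occur, using Python's standard `xml.etree.ElementTree`. The index layout, the function name `build_index`, and the sample document are assumptions made for this example; the abstract does not specify RXML or the value assigned to each tag, so neither is modeled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(xml_text):
    """Map each lowercase word to the set of tag paths whose text contains it."""
    index = defaultdict(set)

    def visit(elem, path):
        here = f"{path}/{elem.tag}"
        for word in (elem.text or "").lower().split():
            index[word].add(here)
        for child in elem:
            visit(child, here)

    visit(ET.fromstring(xml_text), "")
    return index


doc = """<library>
  <book><title>Data Structures</title><author>Alice</author></book>
  <book><title>Spatial Data</title><author>Bob</author></book>
</library>"""

index = build_index(doc)
print(sorted(index["data"]))   # ['/library/book/title']
print(sorted(index["alice"]))  # ['/library/book/author']
```

With such an index a free-form keyword query can be answered without the user knowing the tag structure in advance, while the stored paths still record where each match sits in the document.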
Abstract: Jayanti's algorithm is one of the best-known abortable mutual exclusion algorithms. This work is an attempt to overcome a known limitation of the algorithm while preserving all of its important properties and its elegance. The limitation is that the token number used to assign process identification numbers to newly arriving processes is unbounded. We use a suitably adapted alternative data structure to completely eliminate the use of the token number in the algorithm.
Abstract: The increasing importance of data streams arising in a
wide range of advanced applications has led to the extensive study of
mining frequent patterns. Mining data streams poses many new
challenges, among which are the one-scan nature, the unbounded
memory requirement and the high arrival rate of data streams. In this
paper, we propose a new approach for mining itemsets on data
streams. Our approach, SFIDS, has been developed based on the FIDS
algorithm. The main aims were to keep some advantages of the
previous approach and resolve some of its drawbacks, and
consequently to improve run time and memory consumption. Our
approach has the following advantages: it uses a data structure similar
to a lattice for keeping frequent itemsets; it separates regions from each
other by deleting common nodes, which decreases the search
space, memory consumption and run time; and finally, considering
the CPU constraint, when an increasing arrival rate of data
overloads the system, SFIDS automatically detects this situation and
discards some of the unprocessed data. We guarantee that the error of
the results is bounded by a user-specified threshold, based on a
probability technique. Final results show that the SFIDS algorithm
attains about a 50% run time improvement over the FIDS approach.
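The abstract does not give the details of SFIDS, so the sketch below swaps in a different, well-known technique to show what one-scan counting with bounded memory and a bounded error can look like: the Lossy Counting algorithm of Manku and Motwani, applied to single items rather than itemsets. The class name and parameters are illustrative, the error bound here is deterministic rather than probabilistic, and neither the lattice structure nor the load shedding described above is modeled.

```python
import math


class LossyCounter:
    """Lossy Counting for single items: counts are low by at most epsilon * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # number of items per bucket
        self.n = 0                           # items seen so far
        self.counts = {}                     # item -> [count, maximum undercount]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.counts:
            self.counts[item][0] += 1
        else:
            self.counts[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # End of a bucket: drop entries that cannot be frequent so far.
            self.counts = {k: v for k, v in self.counts.items() if v[0] + v[1] > bucket}

    def frequent(self, support):
        """Items whose true frequency may reach `support` (as a fraction of n)."""
        threshold = (support - self.epsilon) * self.n
        return {k for k, (count, _) in self.counts.items() if count >= threshold}


lc = LossyCounter(epsilon=0.1)
for i in range(100):
    # "a" makes up half of the stream; every other item appears only once.
    lc.add("a" if i % 2 == 0 else f"u{i}")

print(lc.frequent(0.4))  # {'a'}
print(len(lc.counts))    # the one-off items have been pruned
```

Pruning at bucket boundaries keeps the table small at the cost of a bounded undercount, which is the same kind of memory-versus-accuracy trade-off the abstract describes.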