Abstract: This paper describes the design of an ontology for mutual funds in the financial domain. The design consists of four steps, namely specification, knowledge acquisition, implementation, and semantic query. Specification includes a description of the taxonomy, the different types of mutual funds, and their scope. Knowledge acquisition involves extracting information from heterogeneous resources. Implementation describes the conceptualization and encoding of this data. Finally, semantic query permits complex queries over the integrated data by mapping database entities to ontological concepts.
Abstract: This paper presents a dominant color descriptor (DCD) technique for medical image retrieval. The system collects medical images and stores them in a medical database; the purpose of the DCD technique is to retrieve medical images and display those similar to a query image. First, the technique searches for and retrieves a medical image based on a keyword entered by the user. Once the image is found, the system designates it as the query image. The DCD technique then calculates the image's dominant color value, and the system searches for and retrieves medical images based on the dominant color value of the query image. Finally, the system displays the images most similar to the query image. A simple application has been developed and tested using the dominant color descriptor. Experimental results indicate that the technique is effective and can be used for medical image retrieval.
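The ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: images are represented as lists of RGB tuples, the dominant color is taken as the most frequent coarsely quantized color, and database images are ranked by the distance of their dominant color to that of the query. The function and parameter names are hypothetical.

```python
from collections import Counter

def dominant_color(pixels, levels=4):
    """Quantize each RGB channel into `levels` bins and return the
    most frequent quantized color (a coarse dominant-color value)."""
    step = 256 // levels
    quantized = [(r // step, g // step, b // step) for r, g, b in pixels]
    return Counter(quantized).most_common(1)[0][0]

def color_distance(c1, c2):
    """Squared Euclidean distance between two quantized colors."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))

def retrieve_similar(query_pixels, database, top_k=3):
    """Rank database images by dominant-color distance to the query."""
    q = dominant_color(query_pixels)
    ranked = sorted(database.items(),
                    key=lambda kv: color_distance(q, dominant_color(kv[1])))
    return [name for name, _ in ranked[:top_k]]
```

A real system would use a richer DCD (several dominant colors with percentages), but the retrieval loop has the same shape.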
Abstract: This paper presents and evaluates a new classification method that aims to improve classifier performance and speed up the training process. The proposed approach, called labeled classification, seeks to improve convergence of the back-propagation (BP) algorithm by adding an extra feature (a label) to every training example. To classify a new example, tests are carried out for each label. The main advantage of this approach is its simplicity of implementation, since no modifications to the training algorithms are required; it can therefore be combined with other acceleration and stabilization techniques. In this work, two models of labeled classification are proposed: the LMLP (Labeled Multi-Layered Perceptron) and the LNFC (Labeled Neuro-Fuzzy Classifier). These models are evaluated on the Iris, wine, texture, and human thigh databases.
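The core idea, appending the class label as an extra input feature during training and sweeping candidate labels at test time, might be sketched as follows. The `predict` function stands in for a trained LMLP or LNFC and is purely illustrative; the abstract does not specify the exact decision rule.

```python
def add_label_feature(X, y):
    """Append the class label as an extra input feature to each
    training example (the core idea of labeled classification)."""
    return [list(x) + [label] for x, label in zip(X, y)]

def classify(predict, x, labels):
    """Test every candidate label: feed (x, label) to the trained
    model and keep the label yielding the highest validity output."""
    scores = {label: predict(list(x) + [label]) for label in labels}
    return max(scores, key=scores.get)
```

Because the label is just one more input, any standard BP training loop can be reused unchanged, which is the simplicity the abstract emphasizes.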
Abstract: A typical definition of Computer-Aided Diagnosis
(CAD) found in the literature is: a diagnosis made by a radiologist
using the output of a computerized scheme for automated image
analysis as a diagnostic aid. The expression Computer-Aided
Detection (CAD or CADe) is also common: this definition
emphasizes that CAD is intended to support, rather than substitute
for, the human observer in the analysis of radiographic images. In
this article we illustrate the application of CAD systems in light of
these definitions.
Commercially available CAD systems use computerized
algorithms to identify suspicious regions of interest. This paper
describes a general CAD system as an expert system composed of
the following components: segmentation/detection, feature
extraction, and classification/decision making.
As an example, this work presents the realization of a Computer-
Aided Detection system able to assist the radiologist in identifying
types of mammary tumor lesions. Furthermore, this prototype
station uses a GRID configuration to work on a large distributed
database of digitized mammographic images.
Abstract: Breast skin-line estimation and breast segmentation are important pre-processing steps in mammogram image processing and computer-aided diagnosis of breast cancer. Limiting the area to be processed to a specific target region in the image increases the accuracy and efficiency of subsequent processing algorithms. In this paper we present a new algorithm for skin-line estimation and breast segmentation based on the fast marching method, a partial-differential-equation-based numerical technique for tracking the evolution of interfaces. We introduce some modifications to the traditional fast marching method, specifically to improve the accuracy of skin-line estimation and breast tissue segmentation; the proposed modifications ensure that the evolving front stops near the desired boundary. We evaluated the performance of the algorithm on 100 mammogram images taken from the mini-MIAS database. The experimental results indicate that the algorithm covers 98.6% of the ground-truth breast region with a segmentation accuracy of 99.1%. The algorithm is also capable of partially extracting the nipple when it is visible in the profile.
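The front-evolution idea can be illustrated with a Dijkstra-style discrete approximation of fast marching. This is a generic sketch, not the authors' modified method: the grid, the speed function, and the simple speed-threshold stopping rule are all assumptions.

```python
import heapq

def march(speed, seed, stop_speed=0.1):
    """Propagate a front from `seed` over a grid of speeds F(x).
    Arrival time grows as 1/F, and the front is not expanded through
    cells slower than `stop_speed`, so it halts near a low-speed
    boundary (analogous to stopping at the skin-line)."""
    rows, cols = len(speed), len(speed[0])
    inf = float("inf")
    T = [[inf] * cols for _ in range(rows)]  # arrival times
    T[seed[0]][seed[1]] = 0.0
    heap = [(0.0, seed)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > T[r][c]:
            continue                         # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                if speed[nr][nc] < stop_speed:
                    continue                 # front stops at boundary
                nt = t + 1.0 / speed[nr][nc]
                if nt < T[nr][nc]:
                    T[nr][nc] = nt
                    heapq.heappush(heap, (nt, (nr, nc)))
    return T  # finite entries form the segmented region
```

True fast marching solves the Eikonal equation with a first-order upwind update rather than this graph shortest-path approximation, but the reached/unreached structure is the same.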
Abstract: In-place sorting algorithms play an important role in many fields, such as very large database systems, data warehouses, and data mining. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases: the first rearranges the input unsorted array in place, producing segments that are ordered relative to each other but whose elements are yet to be sorted; the second sorts the elements of each segment in place. The first phase requires linear time, while, in the second phase, each segment of size z is sorted in place in O(z log z) time using O(1) auxiliary storage. In the worst case, for an array of size n, the algorithm performs O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations on indices are required. Beyond these theoretical achievements, the algorithm is of practical interest because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity and of the required number of moves is presented, along with the auxiliary storage requirements of the proposed algorithm.
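The abstract does not specify how the first phase produces its segments, but the two-phase structure can be illustrated with a generic sketch: phase one is a single in-place partition pass (linear time), and phase two sorts each resulting segment with an in-place heapsort (O(z log z) time, O(1) auxiliary storage). The pivot choice here is an assumption for illustration.

```python
def heapsort_segment(a, lo, hi):
    """In-place heapsort of a[lo:hi]: O(z log z) time, O(1) space."""
    n = hi - lo
    def sift(i, end):
        while 2 * i + 1 < end:
            c = 2 * i + 1
            if c + 1 < end and a[lo + c + 1] > a[lo + c]:
                c += 1
            if a[lo + i] >= a[lo + c]:
                break
            a[lo + i], a[lo + c] = a[lo + c], a[lo + i]
            i = c
    for i in range(n // 2 - 1, -1, -1):   # build max-heap
        sift(i, n)
    for end in range(n - 1, 0, -1):       # repeatedly extract max
        a[lo], a[lo + end] = a[lo + end], a[lo]
        sift(0, end)

def two_phase_sort(a, pivot):
    """Phase 1: one linear in-place pass partitions `a` around `pivot`,
    giving two segments ordered relative to each other.
    Phase 2: each segment is sorted in place."""
    i, j = 0, len(a) - 1
    while i <= j:
        if a[i] <= pivot:
            i += 1
        else:
            a[i], a[j] = a[j], a[i]
            j -= 1
    heapsort_segment(a, 0, i)           # left segment: values <= pivot
    heapsort_segment(a, i, len(a))      # right segment: values > pivot
    return a
```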
Abstract: Text processing systems allow their users to search for a
string pattern in a given text; string matching is fundamental to
database and text processing applications. Every text editor must
contain a mechanism to search the current document for arbitrary
strings, and spelling checkers scan an input text for words in the
dictionary and reject any strings that do not match. We store
information in databases so that it can later be retrieved, and this
retrieval can be performed using various string matching algorithms.
This paper describes a new string matching algorithm for such
applications, designed with the help of the Rabin-Karp matcher to
improve the string matching process.
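For reference, the classic Rabin-Karp matcher on which the new algorithm builds compares rolling hashes and verifies candidate positions to rule out hash collisions. A standard implementation:

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Return all positions where `pattern` occurs in `text`, using
    a rolling hash to skip most character-by-character comparisons."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(base, m - 1, mod)           # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):                  # initial hashes of pattern/window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)           # verified match (no collision)
        if s < n - m:                   # roll the hash to the next window
            t_hash = ((t_hash - ord(text[s]) * h) * base
                      + ord(text[s + m])) % mod
    return matches
```

The expected running time is O(n + m) with a good modulus, degrading only when many hash collisions force explicit verification.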
Abstract: In the present article, a new method has been developed to enhance equipment monitoring, which in turn improves the economic impact of condition-based maintenance in an automobile parts manufacturing factory. This study also describes how effective software with a simple database can be used to achieve cost-effective improvements in maintenance performance. The most important results of this project are: 1. a 63% reduction in direct and indirect maintenance costs; 2. the creation of a proper database for failure analysis; 3. the creation of a method to monitor system performance and extend it to similar systems; 4. the design of software to analyze the database and thereby create the technical knowledge needed to handle unusual system conditions. Moreover, the results of this study show that the concept and philosophy of maintenance are not well understood in most Iranian industries; thus, greater investment is strongly required to improve maintenance conditions.
Abstract: Automated discovery of hierarchical structures in
large data sets has been an active research area in the recent past.
This paper focuses on mining generalized rules with a crisp
hierarchical structure using a Genetic Programming (GP) approach
to knowledge discovery. The post-processing scheme presented in
this work uses flat rules as the initial individuals of GP and
discovers hierarchical structure. Suitable genetic operators are
proposed for the suggested encoding, and an appropriate fitness
function based on the Subsumption Matrix (SM) is suggested.
Finally, Hierarchical Production Rules (HPRs) are generated from
the discovered hierarchy. Experimental results are presented to
demonstrate the performance of the proposed algorithm.
Abstract: Recognizing the increasing importance of using the
Internet to conduct business, this paper examines matters associated
with a small business's decision of whether or not to have a Website
and go online, a decision that small businesses in Saudi Arabia
struggle with. For organizations to go fully online, conduct business,
and provide online information services, they need to connect their
databases to the Web, and some of the issues involved, such as
Website management, technical issues, and security concerns, may
be beyond the capabilities of most small businesses in Saudi Arabia.
Here we focus on a small business firm in Saudi Arabia as a case
study, discussing the issues related to the decision to go online and
the firm's options of what to do and how to do it. The paper suggests
practical solutions for connecting databases to the Web, and also
discusses some important issues related to online information
services and e-commerce, mainly Web hosting options and security.
Abstract: Computerized lip reading has been one of the most
actively researched areas of computer vision in the recent past
because of its crime-fighting potential and its invariance to the
acoustic environment. However, several factors, such as fast speech,
poor pronunciation, poor illumination, face movement, moustaches,
and beards, make lip reading difficult. In the present work, we
propose a solution for automatically tracking the lip contour and
recognizing letters of the English language using only the
information available from lip movements. A level set method with
a contour velocity model is used to track the lip contour, from which
a feature vector of lip movements is obtained. Character recognition
is performed using a modified k-nearest-neighbor algorithm that
assigns more weight to nearer neighbors. The proposed system
achieves an accuracy of 73.3% for character recognition with the
speaker's lip movements as the only input and without any speech
recognition system running in parallel. The approach used in this
work is found to serve the purpose of lip reading well when the
database is small.
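The modified k-nearest-neighbor rule, in which nearer neighbors receive larger vote weights, can be sketched as follows. A common 1/distance weighting is assumed here, since the abstract does not specify the exact weight function.

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k=3):
    """Classify `query` from `train`, a list of (feature_vector, label)
    pairs: each of the k nearest neighbors votes with weight 1/distance,
    so nearer neighbors count more than farther ones."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + 1e-9)   # epsilon avoids div-by-zero
    return max(votes, key=votes.get)
```

Note how the weighting changes the outcome: a single very close neighbor can outvote a majority of distant ones, which plain k-NN would not allow.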
Abstract: Despite the fact that Arabic is currently one of the
most common languages worldwide, there has been relatively little
research on Arabic speech recognition compared to other languages
such as English and Japanese. In general, digital speech processing
and voice recognition algorithms are of special importance for
designing efficient, accurate, and fast automatic speech recognition
systems. The speech recognition process carried out in this paper is
divided into three stages. First, the signal is preprocessed to reduce
noise effects and digitized, and the voice activity regions are
segmented using a voice activity detection (VAD) algorithm.
Second, features are extracted from the speech signal using the
Mel-frequency cepstral coefficients (MFCC) algorithm; delta and
acceleration (delta-delta) coefficients are added to improve the
recognition accuracy. Finally, each test word's features are
compared to the training database using the dynamic time warping
(DTW) algorithm. Using the best settings found for all relevant
parameters of the aforementioned techniques, the proposed system
achieved a recognition rate of about 98.5%, outperforming the
HMM- and ANN-based approaches available in the literature.
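The DTW comparison in the final stage can be sketched as follows. For brevity the sketch uses scalar sequences with an absolute-difference local cost; real MFCC frames would use a vector distance such as Euclidean, but the recurrence is identical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences
    (e.g., per-frame MFCC values), allowing non-linear time alignment
    so that the same word spoken at different speeds still matches."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])   # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m]
```

Recognition then amounts to returning the training word whose template has the smallest DTW distance to the test utterance.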
Abstract: The National Biodiversity Database System (NBIDS) has
been developed for collecting Thai biodiversity data. The goal of this
project is to provide advanced tools for querying, analyzing,
modeling, and visualizing patterns of species distribution for
researchers and scientists. NBIDS records two types of data:
biodiversity data and environmental data. Biodiversity data consist
of species presence data and species status. The attributes of
biodiversity data can be further classified into two groups: universal
and project-specific attributes. Universal attributes are common to
all records, e.g., X/Y coordinates, year, and collector name.
Project-specific attributes are unique to one or a few projects, e.g.,
flowering stage. Environmental data include atmospheric, hydrology,
soil, and land cover data collected using GLOBE protocols. We have
developed web-based tools for data entry. Google Earth KML and
ArcGIS were used as tools for map visualization, and
webMathematica was used both for simple data visualization and
for advanced data analysis and visualization, e.g., spatial
interpolation and statistical analysis. NBIDS will be used by park
rangers and researchers at Khao Nan National Park.
Abstract: Many applications of speech communication and speaker
identification suffer from the problem of co-channel speech. This
paper presents a multi-resolution dyadic wavelet transform method
for detecting usable segments of co-channel speech that can be
processed by a speaker identification system. The method is
evaluated on the TIMIT database with reference to the
Target-to-Interferer Ratio measure. Co-channel speech is constructed
by mixing speakers of all possible gender combinations, and the
results show little difference across mixtures. Over all mixtures,
95.76% of usable speech is correctly detected, with a false alarm
rate of 29.65%.
Abstract: Object localization is one of the major challenges in
creating intelligent transportation. Unfortunately, in densely
built-up urban areas, localization based on GPS alone produces a
large error or simply becomes impossible. New opportunities for
localization arise from the rapidly emerging concept of wireless
ad-hoc networks. Such a network allows estimating the distances
between objects by measuring received signal levels and
constructing a graph of distances, in which the nodes are the objects
to be localized and the edges are the estimated distances between
pairs of nodes. Given the known coordinates of individual nodes
(anchors), it is possible to determine the location of all (or part) of
the remaining nodes of the graph. Moreover, a road map available
in digital format can provide localization routines with valuable
additional information to narrow the node location search. However,
despite the abundance of well-known algorithms for the localization
problem and significant research efforts, many issues are currently
addressed only partially. In this paper, we propose a localization
approach that maps the distance graph onto digital road map data:
in effect, the problem is reduced to embedding the distance graph
into a graph representing the area's geolocation data. This makes it
possible to localize objects, in some cases even when only one
reference point is available. We propose a simple embedding
algorithm and a sample implementation as spatial queries over
sensor network data stored in a spatial database, which allows
effective use of spatial indexing, optimized spatial search routines,
and geometry functions.
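The paper's graph-embedding algorithm is not reproduced here, but the standard anchor-based building block, estimating a node's position from known anchor coordinates and measured distances, can be sketched with a linearized least-squares multilateration. The setup (2-D coordinates, at least three anchors, noise-free distances in the test) is an assumption for illustration.

```python
def multilaterate(anchors, dists):
    """Estimate (x, y) from anchor coordinates and measured distances.
    Each circle equation (x-xi)^2 + (y-yi)^2 = di^2 is linearized by
    subtracting the first anchor's equation; the resulting linear
    least-squares system is solved via the 2x2 normal equations.
    Requires at least 3 non-collinear anchors."""
    (x0, y0), d0 = anchors[0], dists[0]
    A, b = [], []
    for (xi, yi), di in zip(anchors[1:], dists[1:]):
        A.append((2 * (xi - x0), 2 * (yi - y0)))
        b.append(d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2
                 + yi ** 2 - y0 ** 2)
    # normal equations (A^T A) p = A^T b, solved by Cramer's rule
    s11 = sum(a[0] * a[0] for a in A)
    s12 = sum(a[0] * a[1] for a in A)
    s22 = sum(a[1] * a[1] for a in A)
    t1 = sum(a[0] * bi for a, bi in zip(A, b))
    t2 = sum(a[1] * bi for a, bi in zip(A, b))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)
```

In a distance graph, repeatedly multilaterating nodes that have enough already-located neighbors propagates positions outward from the anchors; the road-map constraint described in the abstract then prunes implausible placements.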
Abstract: Sequential pattern mining is a challenging task in the data mining area with many applications, one of which is mining patterns from weblogs. Weblogs are highly dynamic, and some entries may become obsolete over time. In addition, users may repeatedly change the threshold value during the mining process until the required output or interesting rules are obtained. Some recently proposed algorithms for mining weblogs build the tree with two scans and consume large amounts of time and space. In this paper, we build a Revised PLWAP with Non-frequent Items (RePLNI-tree) with a single scan over all items. While mining sequential patterns, the links related to non-frequent items are not considered; hence, it is not necessary to delete or maintain node information while revising the tree to mine updated transactions. The algorithm supports both incremental and interactive mining, and the patterns need not be recomputed each time the weblog is updated or the minimum support is changed. The performance of the proposed tree is better even when the size of the incremental database exceeds 50% of the existing one. For evaluation, we used a benchmark weblog dataset and found the performance of the proposed tree encouraging compared to some recently proposed approaches.
Abstract: A state-of-the-art Speaker Identification (SI) system
requires a robust feature extraction unit followed by a speaker
modeling scheme for generalized representation of these features.
Over the years, Mel-Frequency Cepstral Coefficients (MFCC),
modeled on the human auditory system, have been used as a
standard acoustic feature set for speech-related applications. In a
recent contribution, the authors showed that Inverted Mel-Frequency
Cepstral Coefficients (IMFCC) form a useful feature set for SI that
captures complementary information present in the high-frequency
region. This paper introduces a Gaussian-shaped filter (GF) in place
of the typical triangular-shaped bins when calculating MFCC and
IMFCC. The objective is to introduce a higher amount of correlation
between subband outputs. The performance of both MFCC and
IMFCC improves with GF over the conventional triangular filter
(TF) based implementation, individually as well as in combination.
With GMM as the speaker modeling paradigm, the performance of
the proposed GF-based MFCC and IMFCC, in individual and fused
modes, has been verified on two standard databases, YOHO
(microphone speech) and POLYCOST (telephone speech), each of
which has more than 130 speakers.
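The replacement of triangular bins by Gaussian-shaped filters might be sketched as follows. The center placement and the choice of each filter's width from the spacing of its neighbors are assumptions, since the abstract does not give the exact parameterization used by the authors.

```python
import math

def mel(f):
    """Hz to mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def inv_mel(m):
    """Mel scale back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gaussian_filterbank(n_filters, n_fft, sr):
    """Gaussian-shaped filters centered at mel-spaced frequencies,
    replacing the usual triangular bins. The Gaussian tails overlap
    more than triangles do, which increases the correlation between
    neighboring subband outputs."""
    lo, hi = mel(0.0), mel(sr / 2.0)
    centers_hz = [inv_mel(lo + (hi - lo) * (i + 1) / (n_filters + 1))
                  for i in range(n_filters)]
    bin_hz = sr / n_fft
    bank = []
    for i, c in enumerate(centers_hz):
        # width chosen from the spacing to neighboring centers
        left = centers_hz[i - 1] if i > 0 else 0.0
        right = centers_hz[i + 1] if i < n_filters - 1 else sr / 2.0
        sigma = (right - left) / 4.0
        bank.append([math.exp(-((k * bin_hz - c) ** 2)
                              / (2.0 * sigma ** 2))
                     for k in range(n_fft // 2 + 1)])
    return bank
```

MFCC extraction then proceeds as usual: the power spectrum is weighted by each filter, log-compressed, and passed through a DCT; only the filter shapes differ from the triangular baseline.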
Abstract: Graphs have become increasingly important in modeling
complicated structures and schemaless data such as proteins,
chemical compounds, and XML documents. Given a graph query, it
is desirable to retrieve matching graphs quickly from a large
database via graph-based indices. Unlike existing methods, our
approach, called VFM (Vertex to Frequent Feature Mapping), uses
vertices and decision features as the basic indexing features. VFM
constructs two mappings between vertices and frequent features to
answer graph queries. The VFM approach not only provides an
elegant solution to the graph indexing problem but also demonstrates
how database indexing and query processing can benefit from data
mining, especially frequent pattern mining. The results show that the
proposed method avoids enumerating the subgraphs of the query
graph and effectively reduces the number of subgraph isomorphism
tests between the query graph and the graphs in the candidate
answer set during the verification stage.
Abstract: This analysis concentrates on the productivity trend of the
knowledge management literature indexed under the subject
"knowledge management" in the SSCI database. Its purpose is to
summarize trend information for knowledge management
researchers, since core knowledge tends to be concentrated in core
categories. The results indicate that the productivity of literature on
the topic of "knowledge management" is still increasing strongly,
and the trend is demonstrated across different categories, including
author, country/territory, institution name, document type, language,
publication year, and subject area. By focusing on the right
categories, researchers can find the core research information. This
also implies that the phenomenon of "success breeds success" is
more common in higher-quality publications.
Abstract: Geographical Information Systems are an integral part
of planning in modern technical systems. Nowadays referred to as
Spatial Decision Support Systems, they combine database
management systems and models within a single user interface,
and they are important tools in spatial design for evaluating
policies and programs at all levels of administration. This work
describes the creation of a Geographical Information System in
the context of broader research on the area of influence of an
under-construction station of the new metro in the Greek city of
Thessaloniki, which included statistical and multivariate data
analysis, diagrammatic representation, mapping, and interpretation
of the results.