Abstract: This paper presents an approach for early breast
cancer diagnosis that combines artificial neural
networks (ANN) with multiwavelet packet based subband image
decomposition. Because microcalcifications correspond to high-frequency
components of the image spectrum, they are detected by decomposing
the mammograms into different
frequency subbands and reconstructing the mammograms from the
subbands containing only high frequencies. For this approach we
employed different types of multiwavelet packets. The result is used
as input to a neural network for classification. The proposed
methodology is tested using the Nijmegen and the Mammographic
Image Analysis Society (MIAS) mammographic databases and
images collected from local hospitals. Results are presented as the
receiver operating characteristic (ROC) performance and are
quantified by the area under the ROC curve.
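The idea of keeping only the high-frequency subbands can be illustrated with a single-level 2D Haar wavelet. This is a simple stand-in, not the multiwavelet packets used in the paper. Each 2x2 block is split into an approximation coefficient (LL, the block mean) and three detail coefficients. The LL band is discarded, and the block is rebuilt from the details alone, so smooth background cancels while small bright spots survive. The helper name `haar_highpass` is hypothetical, and the sketch assumes an even-sized grayscale image stored as nested lists.

```python
def haar_highpass(img):
    """Rebuild an image from its single-level Haar detail subbands only.

    `img` is a list of rows with even height and width. The approximation
    (LL) band of each 2x2 block is discarded; the three detail bands are kept.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            # Forward transform (averaging form): three detail coefficients.
            d_col = (a - b + c - d) / 4   # left vs. right column
            d_row = (a + b - c - d) / 4   # top vs. bottom row
            d_diag = (a - b - c + d) / 4  # diagonal difference
            # Inverse transform with the LL (block mean) coefficient set to zero.
            out[i][j] = d_col + d_row + d_diag
            out[i][j + 1] = -d_col + d_row - d_diag
            out[i + 1][j] = d_col - d_row - d_diag
            out[i + 1][j + 1] = -d_col - d_row + d_diag
    return out
```

In this simplified form, each output pixel equals the original pixel minus the mean of its 2x2 block. Uniform tissue therefore maps to zero, while an isolated bright pixel (a crude model of a microcalcification) keeps a large positive value.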
Abstract: Data mining can be described as a technique for extracting
information from data. It is the process of obtaining hidden
information and turning it into qualified knowledge by statistical
and artificial intelligence techniques. One of its application areas is
medicine, where it is used to build decision support systems for diagnosis
by discovering meaningful information in medical data. In this
study, a decision support system for diagnosing illness is developed that
makes use of data mining and three different artificial intelligence classifier
algorithms, namely Multilayer Perceptron, Naive Bayes Classifier and
J48. The Pima Indian dataset from the UCI Machine Learning Repository was
used. This dataset includes the urinary and blood test results of 768
patients, each described by 8 different features.
The classification results obtained were compared with previous studies,
and suggestions for future studies are presented.
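Of the three classifiers, Naive Bayes is the simplest to sketch. The version below is a Gaussian Naive Bayes for numeric features, such as the 8 Pima attributes. It assumes that features are independent given the class and normally distributed within each class. The class name `SimpleGaussianNB` and the toy data in the test are illustrative only. This is a minimal sketch, not the implementation used in the study.

```python
import math
from collections import defaultdict


class SimpleGaussianNB:
    """Gaussian Naive Bayes for numeric feature vectors (illustrative sketch)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.stats = {}
        for label, rows in groups.items():
            prior = len(rows) / n
            params = []
            for col in zip(*rows):  # one (mean, variance) pair per feature
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
                params.append((mean, var))
            self.stats[label] = (prior, params)
        return self

    def _log_posterior(self, row, label):
        """Unnormalized log posterior: log prior + sum of Gaussian log-likelihoods."""
        prior, params = self.stats[label]
        score = math.log(prior)
        for x, (mean, var) in zip(row, params):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score

    def predict(self, row):
        return max(self.stats, key=lambda label: self._log_posterior(row, label))
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied.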
Abstract: Trends in business intelligence, e-commerce and
remote access make it necessary and practical to store data in
different ways on multiple systems with different operating systems.
As businesses evolve and grow, they require efficient computerized
solutions to perform data updates and to access data from diverse
enterprise business applications. The objective of this paper is to
demonstrate the capability of DTS [1] as a database solution for
automatic data transfer and update in solving business problems. The
DTS package was developed for a business selling a variety of plants that
later expanded into commercial supply and landscaping. Dimensional
data modeling is used in the DTS package to extract, transform and load
data from heterogeneous database systems such as MySQL, Microsoft
Access and Oracle, consolidating it into a Data Mart residing in SQL
Server. The data transfer from the various databases is scheduled to run
automatically every quarter to support efficient sales analysis.
DTS is therefore an attractive solution for automatic data
transfer and update that meets today's business needs.
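DTS itself is a SQL Server tool, so the extract-transform-load pattern is sketched here with Python's built-in `sqlite3` module instead. Two hypothetical sources stand in for, e.g., a MySQL table and an Access table. They use different field names and date formats. Both are normalized to one layout and loaded into a small star schema: a product dimension plus a sales fact table keyed by quarter. All table names, column names and data are assumptions for illustration.

```python
import sqlite3

# Hypothetical source rows with differing layouts.
SOURCE_A = [  # (product name, sale date YYYY-MM-DD, amount)
    ("Fern", "2023-02-14", 12.5),
    ("Palm", "2023-05-03", 40.0),
]
SOURCE_B = [  # dicts with other field names and DD/MM/YYYY dates
    {"item": "Fern", "date": "20/03/2023", "total": 7.5},
]


def quarter(year, month):
    return f"{year}-Q{(month - 1) // 3 + 1}"


def extract_transform():
    """Map both sources onto one (product, quarter, amount) layout."""
    for name, iso_date, amount in SOURCE_A:
        year, month, _ = iso_date.split("-")
        yield name, quarter(int(year), int(month)), amount
    for rec in SOURCE_B:
        _, month, year = rec["date"].split("/")
        yield rec["item"], quarter(int(year), int(month)), rec["total"]


def load(conn):
    """Load the unified rows into a dimension table and a fact table."""
    conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE fact_sales (product_id INTEGER, quarter TEXT, amount REAL)")
    for name, qtr, amount in extract_transform():
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (name,))
        (pid,) = conn.execute(
            "SELECT product_id FROM dim_product WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (pid, qtr, amount))
    conn.commit()


def quarterly_totals(conn):
    """Sales per product and quarter, read from the data mart."""
    return conn.execute(
        "SELECT d.name, f.quarter, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_product d USING (product_id) "
        "GROUP BY d.name, f.quarter ORDER BY d.name, f.quarter").fetchall()


conn = sqlite3.connect(":memory:")
load(conn)
```

Here the two "Fern" sales from different sources fall in the same quarter. The fact table therefore aggregates them under a single dimension row.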
Abstract: In an era of knowledge explosion, the volume of data
grows rapidly day by day. Since data storage is a limited resource,
reducing the space that data occupies has become a challenging issue.
Data compression provides a good solution that can lower the
required space. Data mining has found many useful applications in recent
years because it can help users discover interesting knowledge in large
databases. However, existing compression algorithms are not
appropriate for data mining. In [1, 2], two different approaches were
proposed to compress databases and then perform the data mining
process. However, both lack the ability to decompress the data to
its original state and to improve data mining performance. In this
research, a new approach called Mining Merged Transactions with the
Quantification Table (M2TQT) is proposed to solve these problems.
M2TQT uses the relationships between transactions to merge related
transactions, and builds a quantification table to prune candidate
itemsets that cannot become frequent, thereby improving
the performance of association rule mining. The experiments show
that M2TQT performs better than existing approaches.
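Two ideas behind this kind of speed-up can be sketched generically. First, identical transactions are merged into one record with a count, so each is scanned only once. Second, candidate 2-itemsets are built only from items that are already frequent, because by the Apriori property no itemset containing an infrequent item can be frequent. This is not M2TQT itself: its relation-based merging and quantification table are more elaborate. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def merge_transactions(transactions):
    """Collapse identical transactions into {frozenset(items): count}."""
    return Counter(frozenset(t) for t in transactions)


def frequent_pairs(transactions, min_support):
    """Frequent 2-itemsets, counted over merged transactions."""
    merged = merge_transactions(transactions)
    # Support of single items, weighted by how many transactions were merged.
    item_support = Counter()
    for items, count in merged.items():
        for item in items:
            item_support[item] += count
    frequent_items = {i for i, s in item_support.items() if s >= min_support}
    # Prune: only pairs made of frequent items are counted at all.
    pair_support = Counter()
    for items, count in merged.items():
        kept = sorted(items & frequent_items)
        for pair in combinations(kept, 2):
            pair_support[pair] += count
    return {p: s for p, s in pair_support.items() if s >= min_support}
```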
Abstract: The volume of XML data exchange is increasing explosively,
and efficient mechanisms for XML data management are vital.
Many XML storage models have been proposed
for storing DTD-independent XML documents in relational database
systems. Benchmarking is the best way to highlight the pros and cons of
different approaches. In this study, we use a common benchmarking
scheme, known as XMark, to compare the most cited and newly
proposed DTD-independent methods in terms of logical reads,
physical I/O, CPU time and duration. We show the effects of the label
path, of extracting values and storing them in a separate table, and of
the type of join each method needs for query answering.
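A DTD-independent storage model typically "shreds" every element into a row of a generic relational table. The sketch below uses Python's standard `xml.etree.ElementTree` to emit rows of (node id, parent id, label path, text value). This is similar in spirit to label-path-based schemes. The exact column layout is an assumption, and the benchmarked methods differ in details, such as whether values are kept in a separate table.

```python
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Return rows (node_id, parent_id, label_path, value) in document order."""
    rows = []
    next_id = 0

    def visit(elem, parent_id, parent_path):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        path = f"{parent_path}/{elem.tag}"
        value = (elem.text or "").strip() or None  # None for elements without text
        rows.append((node_id, parent_id, path, value))
        for child in elem:
            visit(child, node_id, path)

    visit(ET.fromstring(xml_text), None, "")
    return rows
```

With a stored label path, a query such as `/site/item/name` can become a lookup on the path column rather than a chain of parent-child joins. This is one of the trade-offs the benchmark measures.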
Abstract: Many real-world data sets have a very high-dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. However, in high-dimensional spaces, distances between points become relatively uniform. In such cases, density-based approaches may give better results. Subspace clustering algorithms automatically identify the lower-dimensional subspaces of the high-dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC (Intelligent Subspace Clustering), which tries to overcome three major limitations of existing state-of-the-art techniques. First, ISC determines input parameters such as the distance ε at various levels of subspace clustering, which helps in finding meaningful clusters. Second, since a uniform-parameter approach is not suitable for different kinds of databases, ISC implements dynamic and adaptive determination of meaningful clustering parameters based on a hierarchical filtering approach. The third and most important feature of ISC is its capacity for incremental learning and for the dynamic inclusion and exclusion of subspaces, which leads to better cluster formation.
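The abstract does not give ISC's rule for choosing ε, so the sketch below uses a common heuristic as a stand-in. Within a given subspace (a subset of dimensions), ε is set to the average distance from each point to its k-th nearest neighbour. Points with at least `min_pts` neighbours within ε count as dense (core) points. Because ε is recomputed for each subspace, it adapts to that subspace's scale. The names `adaptive_eps` and `core_points`, and the defaults for `k` and `min_pts`, are assumptions.

```python
import math


def project(point, dims):
    """Coordinates of `point` restricted to the subspace `dims`."""
    return [point[d] for d in dims]


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def adaptive_eps(points, dims, k=2):
    """Mean distance to the k-th nearest neighbour, measured in subspace `dims`."""
    proj = [project(p, dims) for p in points]
    total = 0.0
    for i, p in enumerate(proj):
        d = sorted(dist(p, q) for j, q in enumerate(proj) if j != i)
        total += d[k - 1]
    return total / len(proj)


def core_points(points, dims, k=2, min_pts=2):
    """Indices of points with at least `min_pts` neighbours within eps in the subspace."""
    eps = adaptive_eps(points, dims, k)
    proj = [project(p, dims) for p in points]
    return [i for i, p in enumerate(proj)
            if sum(1 for j, q in enumerate(proj) if j != i and dist(p, q) <= eps) >= min_pts]
```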
Abstract: Efficient retrieval of multimedia objects has received enormous attention in recent years. A number of techniques have been suggested for the retrieval of textual information; however, relatively little has been suggested for the efficient retrieval of multimedia objects. In this paper we propose a generic architecture for context-aware retrieval of multimedia objects. The proposed framework combines the well-known approaches of text-based retrieval and context-aware retrieval to formulate an architecture for accurate retrieval of multimedia data.
Abstract: XML data has a very flexible tree structure,
which makes it difficult to store and retrieve XML
data. The node numbering scheme is one of the most popular
approaches to store XML in relational databases. Together with the
node numbering storage scheme, structural joins can be used to
efficiently process the hierarchical relationships in XML. However, in
order to process a tree-structured XPath query containing several
hierarchical relationships and conditional sentences on XML data,
many structural joins need to be carried out, which results in a high
query execution cost. This paper introduces mechanisms to rewrite
XPath queries that include branch nodes into a much more efficient form
with fewer structural joins. A two-step approach is proposed.
The first step merges duplicate nodes in the tree-structured query and
the second step divides the query into sub-queries, shortens the paths
and then merges the sub-queries back together. The proposed
approach can contribute substantially to the efficient execution of XML
queries. Experimental results show that the proposed scheme can
reduce the query execution cost by up to an order of magnitude.
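A widely used node numbering scheme is region (interval) encoding. Each node receives (start, end) positions from a depth-first traversal. Node a is then an ancestor of node d exactly when a.start < d.start and d.end < a.end. A structural join pairs ancestor candidates with their descendants using only these numbers. For clarity, the sketch below represents a tree as nested (tag, children) tuples and uses a naive nested-loop join. Real systems use stack-based merge joins over sorted lists, and the paper's own numbering may differ.

```python
def number_nodes(tree):
    """Assign (tag, start, end) to every node of a (tag, [children]) tree via DFS."""
    labels = []
    counter = 0

    def visit(node):
        nonlocal counter
        tag, children = node
        counter += 1
        start = counter
        index = len(labels)
        labels.append(None)  # placeholder, filled once `end` is known
        for child in children:
            visit(child)
        counter += 1
        labels[index] = (tag, start, counter)

    visit(tree)
    return labels


def structural_join(labels, anc_tag, desc_tag):
    """All (ancestor, descendant) pairs for the XPath step anc_tag//desc_tag."""
    ancestors = [n for n in labels if n[0] == anc_tag]
    descendants = [n for n in labels if n[0] == desc_tag]
    return [(a, d) for a in ancestors for d in descendants
            if a[1] < d[1] and d[2] < a[2]]
```

Each `//` step in a branching query costs one such join. This is why merging duplicate query nodes and shortening paths, as the two-step approach does, reduces execution cost.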
Abstract: With increasing data in medical databases, medical
data retrieval is growing in popularity. Some of this analysis
includes inducing propositional rules from databases using various
soft computing techniques and then using these rules in an expert system.
Diagnostic rules and information on features are extracted from
clinical databases on congenital anomalies. This paper
explains the latest soft computing techniques and some
adaptive techniques, which encompass an extensive group of methods
that have been applied in the medical domain for
the discovery of data dependencies, the importance of features and
patterns in sample data, and for feature space dimensionality
reduction. These approaches pave the way for new and interesting
avenues of research in medical imaging and represent an important
challenge for researchers.
Abstract: As a result of the daily workflow in the design
development departments of companies, databases containing huge
numbers of 3D geometric models are generated. According to the
given problem, engineers create CAD drawings based on their design
ideas and evaluate the performance of the resulting design, e.g. by
computational simulations. Usually, new geometries are built either
by utilizing and modifying sets of existing components or by adding
single newly designed parts to a more complex design.
The present paper addresses two facets: automatically acquiring
components from large design databases and providing
a reasonable overview of the parts to the engineer. A unified
framework based on the topographic non-negative matrix
factorization (TNMF) is proposed which solves both aspects
simultaneously. First, on a given database meaningful components
are extracted into a parts-based representation in an unsupervised
manner. Second, the extracted components are organized and
visualized on square-lattice 2D maps. Using turbine-like geometries as
an example, it is shown that these maps efficiently provide a well-structured
overview of the database content and, at the same time,
define a measure of spatial similarity that allows easy access to and
reuse of components in the process of design development.
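TNMF builds on non-negative matrix factorization (NMF). NMF approximates a non-negative data matrix V (for instance, one column per design) as W·H, where the non-negative matrix W holds the parts and H holds the weights of each part for each design. The sketch below implements only plain NMF with the multiplicative updates of Lee and Seung for the squared error. The topographic coupling that arranges the parts on a 2D lattice is omitted. Small nested lists stand in for matrices to keep the example dependency-free.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5
```

Multiplicative updates keep every entry non-negative automatically. This non-negativity is what gives NMF its additive, parts-based representation.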
Abstract: Content-Based Image Retrieval (CBIR) has been
one of the most active research areas in the field of computer vision
over the last 10 years. Many programs and tools have been
developed to formulate and execute queries based on visual or
audio content and to help browse large multimedia repositories.
Still, no general breakthrough has been achieved with respect to
large, varied databases with documents of differing sorts and
varying characteristics. Many questions with respect to
speed, semantic descriptors or objective image interpretation are
still unanswered. In the medical field, images, and especially
digital images, are produced in ever-increasing quantities and used
for diagnostics and therapy. Several articles have proposed content-based
access to medical images to support clinical decision making,
which would ease the management of clinical data,
and scenarios for the integration of content-based access methods
into Picture Archiving and Communication Systems (PACS) have
been created. This paper gives an overview of soft computing
techniques. New research directions are being defined that can
prove to be useful. Still, there are very few systems that seem to be
used in clinical practice. It should also be stated that the goal
is not, in general, to replace text-based retrieval methods as they
exist at the moment.
Abstract: In this paper, we propose an approach for the classification of fingerprint databases. It is based on the fact that a fingerprint image is composed of regular texture regions that can be successfully represented by co-occurrence matrices. We therefore first extract features based on certain characteristics of the co-occurrence matrix and then use these features to train a neural network that classifies fingerprints into four common classes. A comparison of the results with existing approaches demonstrates the superior performance of the proposed approach.
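A gray-level co-occurrence matrix (GLCM) counts how often a pixel with gray level i has a neighbour with gray level j at a fixed offset. The sketch below builds a normalized GLCM for one offset. It then derives three standard Haralick-style descriptors: contrast, energy and homogeneity. The abstract does not state which characteristics, offsets or gray-level quantization the authors used, so these choices are assumptions.

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of `img` (rows of ints in [0, levels)) for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:  # count only pairs inside the image
                counts[img[y][x]][img[ny][nx]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, energy and homogeneity of a normalized GLCM `p`."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            energy += v * v
            homogeneity += v / (1 + abs(i - j))
    return contrast, energy, homogeneity
```

Features like these, computed for several offsets (for example, along and across the ridge direction), could form the input vector of the neural network classifier.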
Abstract: Bioinformatics and cheminformatics are disciplines that use computers to provide tools for the acquisition, storage, processing, analysis and integration of biological and chemical data, and for the development of potential applications of such data. A chemical database is a database designed exclusively to store chemical information. NMRShiftDB is one of the main databases used to represent chemical structures in 2D or 3D. The SMILES format is one of many ways to write a chemical structure in a linear format. In this study we extracted antimicrobial structures in SMILES format from NMRShiftDB and stored them, with their corresponding information, in our Local Data Warehouse. Additionally, we developed a search tool that responds to a user's query using the JME Editor, which allows the user to draw or edit molecules and converts the drawn structure into SMILES format. We applied the Quick Search algorithm to search for antimicrobial structures in our Local Data Warehouse.
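Quick Search (Sunday's simplification of Boyer-Moore) compares the pattern with the current text window. After each comparison, it shifts the window by an amount looked up for the character immediately following the window. The sketch below finds all occurrences of a SMILES fragment in a SMILES string. Treating substructure search as plain substring matching is a simplification, because the same substructure can be written as different SMILES strings. The example molecule (aspirin) is illustrative only.

```python
def quick_search(text, pattern):
    """Return start indices of all occurrences of `pattern` in `text` (Sunday's Quick Search)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for the character just after the window: m - (last index of ch in pattern);
    # characters absent from the pattern allow a jump of m + 1.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    s = 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            hits.append(s)
        if s + m >= n:  # no character after the window: done
            break
        s += shift.get(text[s + m], m + 1)
    return hits


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
print(quick_search(aspirin, "C(=O)O"))  # → [1, 18]
```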
Abstract: Real-time and reliable report delivery is extremely important for effective decision making in real-world, mission-critical applications based on Wireless Sensor Networks (WSNs). Sensor data behaves differently in many ways from the data in traditional databases. WSNs need a mechanism to register and process queries and to disseminate data. In this paper we propose an architectural framework for data placement and management. We propose a reliable, real-time approach for data placement and for achieving data integrity using self-organized sensor clusters. Instead of storing information in individual cluster heads, as suggested in some protocols, our architecture stores the information of all clusters within a cell in the corresponding base station. For data dissemination and action in the wireless sensor network, we propose using Action and Relay Stations (ARS). To reduce the average energy dissipation of sensor nodes, data is sent to the nearest ARS rather than to the base station. We have designed our architecture to achieve greater energy savings and enhanced availability and reliability.
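The energy argument for sending data to the nearest ARS can be made concrete with the first-order radio model often used in WSN studies. In this model, transmitting k bits over distance d costs E_elec·k + ε_amp·k·d². The constants below (50 nJ/bit and 100 pJ/bit/m²) are typical textbook values and are assumptions, not figures from this paper. Station names and coordinates are hypothetical.

```python
import math

E_ELEC = 50e-9     # J per bit for transmitter electronics (assumed value)
EPS_AMP = 100e-12  # J per bit per m^2 for the amplifier (assumed value)


def tx_energy(bits, distance):
    """First-order radio model: energy to transmit `bits` over `distance` metres."""
    return E_ELEC * bits + EPS_AMP * bits * distance ** 2


def nearest_sink(node, sinks):
    """Name of the closest sink (ARS or base station) by Euclidean distance."""
    return min(sinks, key=lambda name: math.dist(node, sinks[name]))


sinks = {"BS": (200.0, 0.0), "ARS1": (30.0, 40.0), "ARS2": (100.0, 100.0)}
target = nearest_sink((0.0, 0.0), sinks)  # "ARS1", 50 m away
```

Because the amplifier term grows with d², relaying to a sink 50 m away instead of 200 m away cuts the amplifier cost by a factor of 16 in this model.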
Abstract: In illumination-variant face recognition, existing
methods that extract the face albedo as a light-normalized image may
lose extensive facial details, with the light template discarded. To
address this, a novel approach for realistic facial texture
reconstruction that combines the original image and the albedo image is
proposed. First, light subspaces of different identities are established
from the given reference face images; then, by projecting the original
and albedo images onto each light subspace respectively, texture
reference images with the corresponding lighting are reconstructed and
two texture subspaces are formed. From the projections in the
texture subspaces, facial texture under normal lighting can be synthesized.
Because the original image is combined with the face albedo, facial
details are preserved. In addition, image partitioning is applied to
improve synthesis performance. Experiments on the Yale B and
CMU PIE databases demonstrate that this algorithm outperforms
other methods both in image representation and in face recognition.
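Projecting an image onto a light subspace amounts to finding the least-squares combination of the subspace's basis images. Assume the images are flattened to vectors and the basis is first orthonormalized, here with Gram-Schmidt. The projection is then a sum of dot products times basis vectors. This is a generic linear-algebra sketch with hypothetical helper names, not the paper's full texture reconstruction pipeline.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis spanning `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:  # remove components along existing basis vectors
            c = dot(w, b)
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (nearly) linearly dependent
            basis.append([x / norm for x in w])
    return basis


def project(image, basis):
    """Least-squares reconstruction of `image` inside span(basis); basis must be orthonormal."""
    coeffs = [dot(image, b) for b in basis]
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(image))]
```

Projecting both the original image and the albedo image onto the same basis gives two reconstructions under that subspace's lighting. These reconstructions are the kind of texture reference images the abstract describes.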
Abstract: The problem of frequent itemset mining is considered in this paper. A new technique is proposed to generate frequent patterns in large databases without time-consuming candidate generation. The technique focuses on transactions instead of itemsets. The algorithm takes the intersection between each transaction and the other transactions and computes the maximal set of items shared between transactions, instead of creating itemsets and computing their frequencies. Applied to real-life transaction data, the technique achieves significant efficiency in mining association rules from databases.
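The abstract gives only an outline of the transaction-centred idea. The following is therefore a rough illustrative interpretation rather than the authors' algorithm. Each transaction is intersected with every other transaction, and the shared item sets are collected as candidate patterns. The support of each candidate is then counted. No candidates are generated by joining smaller itemsets: every pattern comes from an actual overlap between transactions. The function name is hypothetical.

```python
from itertools import combinations


def shared_patterns(transactions, min_support):
    """Frequent patterns found as intersections of transaction pairs (illustrative)."""
    sets = [frozenset(t) for t in transactions]
    # Candidate patterns: non-empty overlaps between pairs of transactions.
    candidates = {a & b for a, b in combinations(sets, 2) if a & b}
    result = {}
    for cand in candidates:
        support = sum(1 for s in sets if cand <= s)
        if support >= min_support:
            result[cand] = support
    return result
```

Any itemset that occurs in at least two transactions is a subset of one of these pairwise overlaps. The result therefore summarizes every frequent itemset with support of at least 2, without enumerating them one by one.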
Abstract: In this paper a new approach to face recognition is presented that achieves double dimension reduction, making the system computationally efficient with better recognition results. In pattern recognition, the discriminative information in an image increases with resolution up to a certain extent; consequently, face recognition results improve with increasing face image resolution and level off once a certain resolution is reached. In the proposed model, an image decimation algorithm is first applied to the face image to reduce its dimension to the resolution level that provides the best recognition results. Because of its computational speed and feature extraction potential, the Discrete Cosine Transform (DCT) is then applied to the face image. A subset of DCT coefficients, from low to mid frequencies, that represents the face adequately and provides the best recognition results is retained. A trade-off among the decimation factor, the number of DCT coefficients retained and the recognition rate is obtained with minimum computation. Preprocessing of the image is carried out to increase robustness against variations in pose and illumination level. The new model has been tested on different databases, including the ORL database, the Yale database and a color database. The proposed technique performed much better than other techniques. The significance of the model is twofold: (1) dimension reduction down to an effective and suitable face image resolution, and (2) retention of the appropriate DCT coefficients to achieve the best recognition results under varying pose, intensity and illumination level.
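The double reduction can be sketched in two stages. First, the image is decimated by averaging non-overlapping f x f blocks. Then a 2D DCT-II of the smaller image is computed, and the first coefficients are kept in zigzag order, which runs from low toward higher frequencies. Block-averaging decimation, orthonormal DCT scaling and zigzag selection are assumptions standing in for the paper's exact choices. The DCT is computed directly from its definition for clarity, not speed.

```python
import math


def decimate(img, f):
    """Average non-overlapping f x f blocks (image sides assumed divisible by f)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx] for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(0, w, f)] for y in range(0, h, f)]


def dct2(img):
    """Orthonormal 2D DCT-II computed directly from the definition."""
    n, m = len(img), len(img[0])

    def alpha(k, size):
        return math.sqrt(1 / size) if k == 0 else math.sqrt(2 / size)

    return [[alpha(u, n) * alpha(v, m) * sum(
                img[y][x]
                * math.cos(math.pi * (2 * y + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * x + 1) * v / (2 * m))
                for y in range(n) for x in range(m))
             for v in range(m)] for u in range(n)]


def zigzag_features(coeffs, count):
    """First `count` coefficients in JPEG-style zigzag order (low to higher frequencies)."""
    n, m = len(coeffs), len(coeffs[0])
    order = sorted(((u, v) for u in range(n) for v in range(m)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [coeffs[u][v] for u, v in order[:count]]
```

The decimation factor and the coefficient count are the two knobs whose trade-off against recognition rate the abstract describes.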
Abstract: In face recognition, feature extraction techniques
attempt to find an appropriate representation of the data. However,
when the feature dimension is larger than the sample size,
performance degrades. Hence, we propose a method called
Normalization Discriminant Independent Component Analysis
(NDICA). The input data is regularized to obtain the most
reliable features from the data and processed using Independent
Component Analysis (ICA). The proposed method is evaluated on
three face databases, Olivetti Research Ltd (ORL), Face Recognition
Technology (FERET) and Face Recognition Grand Challenge
(FRGC). NDICA showed its effectiveness compared with other
unsupervised and supervised techniques.
Abstract: Proteomics is one of the largest areas of research in
bioinformatics and medical science. An ambitious goal of proteomics
is to elucidate the structure, interactions and functions of all proteins
within cells and organisms. Predicting Protein-Protein Interaction
(PPI) is one of the crucial and decisive problems in current research.
Genomic data offer a great opportunity and at the same time a lot of
challenges for the identification of these interactions. Many methods
have already been proposed in this regard. For in-silico
identification, most of the methods require both positive and negative
examples of protein interaction, and the reliability of these examples
is crucial for the final prediction accuracy. Positive
examples are relatively easy to obtain from well-known databases, but
generating negative examples is not a trivial task. Current PPI
identification methods generate negative examples based on some
assumptions, which are likely to affect their prediction accuracy.
Hence, if more reliable negative examples are used, the PPI prediction
methods may achieve even higher accuracy. Focusing on this issue, a
graph based negative example generation method is proposed, which
is simple and more accurate than the existing approaches. An
interaction graph of the protein sequences is created. The basic
assumption is that the longer the shortest path between two
protein sequences in the interaction graph, the less likely they are to
interact. A well-established PPI detection algorithm is
employed with our negative examples, and in most cases it increases
the accuracy by more than 10% in comparison with the negative pair
selection method of that algorithm's original paper.
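The negative-example assumption can be sketched with breadth-first search. One BFS per protein gives shortest-path lengths in the unweighted interaction graph. Pairs that are far apart, or that lie in different connected components, are taken as negative examples. The distance threshold `min_dist` and the toy protein names are assumptions, because the abstract does not state how the paper chooses its cut-off.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from `source` to every reachable node of an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def negative_pairs(graph, min_dist):
    """Pairs whose shortest path is at least `min_dist` hops, or that are disconnected."""
    nodes = sorted(graph)
    negatives = []
    for i, a in enumerate(nodes):
        dist = bfs_distances(graph, a)  # one BFS per source protein
        for b in nodes[i + 1:]:
            if dist.get(b, float("inf")) >= min_dist:
                negatives.append((a, b))
    return negatives
```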
Abstract: The development of Artificial Neural Networks
(ANNs) is usually a slow process in which the human expert has to
test several architectures until finding the one that achieves the best
results for a given problem. This work presents a new
technique that uses Genetic Programming (GP) for automatically
generating ANNs. To do this, the GP algorithm had to be changed
to work with graph structures, so that ANNs can be developed. This
technique also makes it possible to obtain simplified networks that solve
the problem with a small group of neurons. In order to measure the
performance of the system and to compare the results with other
ANN development methods by means of Evolutionary Computation
(EC) techniques, several tests were performed with problems based
on some of the most widely used test databases. The results of those
comparisons show that the system achieves good results, comparable
with existing techniques and, in most cases, better than
those techniques.