Abstract: To maximize the efficiency of an information management platform and to assist in decision making, the collection, storage, and analysis of performance-relevant data have become of fundamental importance. This paper addresses the merits and drawbacks of the OLAP paradigm for efficiently navigating large volumes of performance measurement data hierarchically. System managers or database administrators navigate through adequately (re)structured measurement data to detect performance bottlenecks, identify the causes of performance problems, or assess the impact of configuration changes on the system and its representative metrics. Of particular importance is finding the root cause of an imminent problem that threatens the availability and performance of an information system. Leveraging OLAP techniques, in contrast to traditional static reporting, this can be accomplished in a moderate amount of time and with little processing complexity. It is shown how OLAP techniques can help improve the understandability and manageability of measurement data and, hence, improve the performance analysis process as a whole.
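As a rough illustration of the roll-up/drill-down navigation described above, the following minimal pandas sketch aggregates hypothetical performance measurements along a host/process/hour hierarchy; the column names and data are illustrative assumptions, not from the paper.

```python
# Minimal sketch of OLAP-style roll-up and drill-down over performance
# measurements using pandas. Column names (host, process, hour,
# response_ms) are hypothetical, not taken from the paper.
import pandas as pd

measurements = pd.DataFrame({
    "host":        ["db01", "db01", "db02", "db02"],
    "process":     ["writer", "reader", "writer", "reader"],
    "hour":        [10, 10, 10, 11],
    "response_ms": [12.0, 48.0, 15.0, 230.0],
})

# Roll-up: aggregate response time per host (coarse view).
per_host = measurements.groupby("host")["response_ms"].mean()

# Drill-down: refine the suspicious host along the next hierarchy
# level to localize the bottleneck.
suspect = per_host.idxmax()
detail = (measurements[measurements["host"] == suspect]
          .groupby(["process", "hour"])["response_ms"].mean())
print(per_host, detail, sep="\n")
```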
Abstract: This paper presents a new technique for the detection of human faces within color images. The approach relies on image segmentation based on skin color, features extracted from the two-dimensional discrete cosine transform (DCT), and self-organizing maps (SOM). After candidate skin regions are extracted, feature vectors are constructed using DCT coefficients computed from those regions. A supervised SOM training session is used to cluster the feature vectors into groups and to assign "face" or "non-face" labels to those clusters. Evaluation was performed using a new image database of 286 images containing 1027 faces. After training, our detection technique achieved a detection rate of 77.94% during subsequent tests, with a false positive rate of 5.14%. To our knowledge, the proposed technique is the first to combine DCT-based feature extraction with a SOM for detecting human faces within color images. It is also one of the few attempts to combine a feature-invariant approach, such as color-based skin segmentation, with appearance-based face detection. The main advantage of the new technique is its low computational requirements, in terms of both processing speed and memory utilization.
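A minimal sketch of the DCT feature-construction step described above, assuming a fixed block size and a low-frequency coefficient corner; both choices are illustrative, as the abstract does not give the exact layout.

```python
# Sketch of DCT-based feature extraction from a candidate skin region.
# The block size and the choice of low-frequency coefficients are
# assumptions for illustration; the paper's exact layout may differ.
import numpy as np
from scipy.fft import dctn

def dct_features(region, size=32, k=8):
    """Crop/pad the region to size x size, take its 2-D DCT, and keep
    the k x k low-frequency corner as the feature vector."""
    block = np.zeros((size, size))
    h, w = region.shape[:2]
    block[:min(h, size), :min(w, size)] = region[:size, :size]
    coeffs = dctn(block, norm="ortho")
    return coeffs[:k, :k].ravel()

# A trained SOM would then map each such vector to a cluster labeled
# "face" or "non-face" during the supervised labeling pass.
features = dct_features(np.random.rand(40, 36))
print(features.shape)  # (64,)
```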
Abstract: Bioinformatics and cheminformatics are computer-based disciplines that provide tools for the acquisition, storage, processing, analysis, and integration of biological and chemical data, and for the development of applications based on such data. A chemical database is a database designed specifically to store chemical information. NMRShiftDB is one of the main databases used to represent chemical structures in 2D or 3D. The SMILES format is one of many ways to write a chemical structure in linear form. In this study we extracted antimicrobial structures in SMILES format from NMRShiftDB and stored them, with their corresponding information, in our local data warehouse. Additionally, we developed a search tool that responds to a user's query using the JME Editor, a tool that allows the user to draw or edit molecules and converts the drawn structure into SMILES format. We applied the Quick Search algorithm to search for antimicrobial structures in our local data warehouse.
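For reference, a self-contained sketch of the Quick Search (Sunday) string-matching algorithm named above, applied to SMILES text; the example pattern and molecule are illustrative, not from the study's data.

```python
# Sketch of the Quick Search (Sunday) algorithm used for substring
# matching, here over SMILES strings. On a mismatch the window shifts
# by the bad-character distance of the text character just past it.
def quick_search(pattern, text):
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: distance from each pattern character to just past
    # the pattern's end; characters not in the pattern shift by m + 1.
    shift = {c: m - i for i, c in enumerate(pattern)}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break
        i += shift.get(text[i + m], m + 1)
    return hits

# Hypothetical usage: find a benzene-ring fragment in a stored SMILES.
print(quick_search("c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"))  # [7]
```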
Abstract: Dynamic location referencing is an important technique for shielding applications from differences between digital maps. The method references objects of the road network using a condensed selection of their real-world geographic properties stored in a digital map database, which overcomes the defects of pre-coded location referencing methods. High attribute-completeness requirements and complicated reference point selection algorithms are the main problems in recent research. Therefore, we propose a dynamic location referencing algorithm that combines intersection points compulsorily selected at the extremities with road link points selected according to a link partition principle. An experimental system based on this approach was implemented. Tests using the Beijing digital map database showed satisfactory results and thus verified the feasibility and practicality of the method.
Abstract: The prediction of transmembrane helical segments (TMHs) in membrane proteins is an important field in bioinformatics research. In this paper, a new method based on the discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. The protein with PDB code 1KQG is used as an example to illustrate the prediction of the number and location of TMHs with this method. To assess the method, 80 proteins with known 3D structure from the MPtopo database were chosen at random as test objects (containing 325 TMHs in total); 308 of these TMHs were predicted accurately, for an average prediction accuracy of 96.3%. In addition, the 80 membrane proteins were divided into 13 groups according to their function and type; the prediction results for each of the 13 groups are satisfactory.
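A minimal sketch of the kind of DWT analysis involved, assuming (as is usual for wavelet-based TMH prediction, though not stated in the abstract) that the sequence is first mapped to a numeric hydropathy signal via the Kyte-Doolittle scale.

```python
# Sketch of DWT-based analysis of a membrane protein sequence.
# Assumption (not stated in the abstract): residues are first mapped
# to a hydropathy signal (Kyte-Doolittle scale), the usual setup for
# wavelet-based TMH prediction.
import pywt

KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydropathy_approximation(sequence, wavelet="db4", level=3):
    """Decompose the hydropathy signal; the coarse approximation
    highlights sustained hydrophobic spans, i.e. candidate TMHs."""
    signal = [KD[aa] for aa in sequence]
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return coeffs[0]  # approximation coefficients at the coarsest level

print(hydropathy_approximation("MKTLLILAVLLAVALA" * 4)[:5])
```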
Abstract: In this paper, a comparative study of the application of supervised and unsupervised learning algorithms to illumination-invariant face recognition has been carried out. The supervised learning has been carried out using a bi-layered artificial neural network having one input, two hidden, and one output layer. Gradient descent backpropagation with momentum and an adaptive learning rate has been used to implement the supervised learning, in which both the inputs and the corresponding outputs are provided at the time of training the network; there is thus an inherent clustering and optimized learning of weights, which provides efficient results. The unsupervised learning has been implemented with a modified counterpropagation network, which involves clustering followed by application of the outstar rule to obtain the recognized face. The face recognition system has been developed for recognizing faces with varying illumination intensities, where the database images vary in the angle of illumination with respect to the horizontal and vertical planes. The supervised and unsupervised learning algorithms have been implemented and tested exhaustively, both with and without histogram equalization, to obtain efficient results.
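A minimal numpy sketch of the histogram equalization step that the study tests with and without; it flattens the intensity distribution to reduce illumination differences between database images.

```python
# Sketch of histogram equalization, the preprocessing step applied to
# reduce illumination differences between the face images.
import numpy as np

def equalize(img):
    """img: 2-D uint8 array; returns the equalized image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map each grey level through the normalized cumulative histogram.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return lut.clip(0, 255).astype(np.uint8)[img]

face = (np.random.rand(64, 64) * 120).astype(np.uint8)  # dim image
print(face.max(), equalize(face).max())  # range stretched toward 255
```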
Abstract: The issue of real-time and reliable report delivery is extremely important for effective decision making in a real-world, mission-critical Wireless Sensor Network (WSN) based application. Sensor data behaves differently in many ways from the data in traditional databases, and WSNs need a mechanism to register, process queries, and disseminate data. In this paper we propose an architectural framework for data placement and management. We propose a reliable, real-time approach for data placement and for achieving data integrity using self-organized sensor clusters. Instead of storing information in individual cluster heads, as suggested in some protocols, our architecture stores the information of all clusters within a cell in the corresponding base station. For data dissemination and action in the wireless sensor network we propose Action and Relay Stations (ARS). To reduce the average energy dissipation of sensor nodes, data is sent to the nearest ARS rather than to the base station. We have designed our architecture so as to achieve greater energy savings, enhanced availability, and reliability.
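A tiny sketch of the nearest-ARS dissemination rule described above; the node and station coordinates are hypothetical.

```python
# Sketch of the energy-saving dissemination rule: a sensor node sends
# its report to the nearest Action and Relay Station (ARS) instead of
# the typically farther base station. Coordinates are hypothetical.
import math

def nearest_station(node, stations):
    return min(stations, key=lambda s: math.dist(node, s))

node = (40.0, 12.0)
ars_positions = [(10.0, 10.0), (35.0, 20.0), (80.0, 5.0)]
base_station = (100.0, 100.0)

target = nearest_station(node, ars_positions)
# Radio transmit energy grows superlinearly with distance, so the
# shorter hop to the ARS dissipates far less energy than reaching the
# base station directly.
print(target, math.dist(node, target), math.dist(node, base_station))
```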
Abstract: Real-world speaker identification (SI) applications differ from ideal or laboratory conditions: perturbations cause a mismatch between the training and testing environments and degrade performance drastically. Many strategies have been adopted to cope with acoustic degradation; the wavelet-based Bayesian marginal model is one of them. However, Bayesian marginal models cannot capture the statistical dependencies between different wavelet scales: simple nonlinear estimators for wavelet-based denoising assume that the wavelet coefficients in different scales are independent, whereas the coefficients in fact exhibit significant inter-scale dependency. This paper models this inter-scale dependency with a Circularly Symmetric Probability Density Function (CS-PDF) from the family of Spherically Invariant Random Processes (SIRPs) in the Log Gabor Wavelet (LGW) domain, and the corresponding joint shrinkage estimator is derived via Maximum a Posteriori (MAP) estimation. Based on these, a framework is proposed to denoise speech signals for automatic speaker identification. The robustness of the proposed framework is tested on a text-independent speaker identification task with 100 speakers from the POLYCOST and 100 speakers from the YOHO speech databases, in three different noise environments. Experimental results show that the proposed estimator yields a greater improvement in identification accuracy than other estimators, using a popular Gaussian Mixture Model (GMM) based speaker model and Mel-Frequency Cepstral Coefficient (MFCC) features.
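The abstract does not give the CS-PDF joint estimator in closed form, so as a stand-in the sketch below implements the well-known bivariate MAP shrinkage rule of Sendur and Selesnick, which exploits the same kind of inter-scale (child/parent) wavelet dependency; it is not the paper's estimator.

```python
# Stand-in for the paper's joint shrinkage estimator: the classic
# Sendur-Selesnick bivariate MAP shrinkage rule, which also models
# inter-scale (child/parent) wavelet dependency.
import numpy as np

def bivariate_shrink(child, parent, noise_var, signal_std):
    """MAP shrinkage of `child` coefficients given their `parent`
    coefficients at the next coarser scale."""
    magnitude = np.sqrt(child**2 + parent**2)
    threshold = np.sqrt(3.0) * noise_var / signal_std
    gain = np.maximum(magnitude - threshold, 0.0) / (magnitude + 1e-12)
    return gain * child

child = np.array([0.1, -2.5, 4.0])
parent = np.array([0.2, -3.0, 0.1])
print(bivariate_shrink(child, parent, noise_var=0.5, signal_std=1.0))
```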
Abstract: A new combinatorial model for analyzing and interpreting an electrocardiogram (ECG) is presented. One application of the model is QRS peak detection. This is demonstrated with an online algorithm, which is shown to be both space and time efficient. Experimental results on the MIT-BIH Arrhythmia database show that this novel approach is promising. Further uses of the approach are discussed, such as exploiting its small memory requirements and interpreting large amounts of pre-recorded ECG data.
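The combinatorial model itself is not given in the abstract; the sketch below is only a generic constant-memory online peak detector of the kind such a space- and time-efficient QRS algorithm competes with. The threshold and refractory length are illustrative values.

```python
# Generic constant-memory online peak detector (not the paper's
# combinatorial model): O(1) state per sample, single pass.
def detect_peaks(samples, threshold=0.6, refractory=72):
    """Yield indices of local maxima above `threshold`, suppressing
    detections closer than `refractory` samples (~200 ms at 360 Hz,
    the MIT-BIH sampling rate)."""
    prev, prev_idx, last_fire = None, -1, -refractory
    rising = False
    for i, x in enumerate(samples):
        if prev is not None:
            if x > prev:
                rising = True
            elif rising and x < prev:  # local maximum at prev_idx
                if prev > threshold and prev_idx - last_fire >= refractory:
                    last_fire = prev_idx
                    yield prev_idx
                rising = False
        prev, prev_idx = x, i

ecg = [0, 0.2, 0.9, 0.3, 0, 0.1, 0.05, 0.8, 0.2]
print(list(detect_peaks(ecg, refractory=3)))  # [2, 7]
```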
Abstract: Human Resource (HR) applications can be used to provide fair and consistent decisions and to improve the effectiveness of decision making processes. Moreover, one of the challenges for HR professionals is to manage an organization's talent, especially to ensure the right person is in the right job at the right time. For that reason, in this article we describe the potential of implementing one of the talent management tasks, namely identifying existing talent by predicting its performance, as an HR application. This study suggests a potential HR system architecture for talent forecasting that uses knowledge of past experience, an approach known as Knowledge Discovery in Databases (KDD), or data mining. The article consists of three main parts: the first gives an overview of HR applications, prediction techniques and their applications, data mining in general, and the basic concepts of talent management in HRM; the second examines the use of data mining techniques to solve one of the talent management tasks; and the third proposes the potential HR system architecture for talent forecasting.
Abstract: In this paper we propose a novel approach for ascertaining human identity based on the fusion of profile face and gait biometric cues. The identification approach, based on feature learning in a PCA-LDA subspace and classification using multivariate Bayesian classifiers, allows a significant improvement in recognition accuracy for low-resolution surveillance video scenarios. The experimental evaluation of the proposed identification scheme on a publicly available database [2] showed that the fusion of face and gait cues in the joint PCA-LDA space is a powerful method for capturing the inherent multimodality in walking gait patterns while, at the same time, discriminating the person's identity.
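A minimal scikit-learn sketch of feature-level fusion in a joint PCA-LDA subspace; the feature dimensions are invented, the data is synthetic, and a Gaussian naive Bayes classifier stands in for the paper's multivariate Bayesian classifier.

```python
# Sketch of feature-level fusion of profile-face and gait features in
# a joint PCA-LDA subspace. Dimensions and data are synthetic; a
# Gaussian naive Bayes model stands in for the Bayesian classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n, ids = 60, 6
face = rng.normal(size=(n, 120))               # profile-face features
gait = rng.normal(size=(n, 80))                # gait features
labels = np.arange(n) % ids                    # person identities

fused = np.hstack([face, gait])                # feature-level fusion
z = PCA(n_components=20).fit_transform(fused)  # decorrelate, compress
w = LinearDiscriminantAnalysis(n_components=ids - 1).fit_transform(z, labels)

clf = GaussianNB().fit(w, labels)              # Bayesian classification
print(clf.score(w, labels))
```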
Abstract: The recognition of human faces, especially those with different orientations, is a challenging and important problem in image analysis and classification. This paper proposes an effective scheme for rotation-invariant face recognition using combined Log-Polar Transform and Discrete Cosine Transform features. Rotation-invariant feature extraction for a given face image involves applying the log-polar transform to eliminate the rotation effect and produce a row-shifted log-polar image. The discrete cosine transform is then applied to eliminate the row-shift effect and generate a low-dimensional feature vector. A PSO-based feature selection algorithm is used to search the feature vector space for the optimal feature subset. Evolution is driven by a fitness function defined in terms of maximizing the between-class separation (scatter index). Experimental results based on the ORL face database, using test sets of images with different orientations, show that the proposed system outperforms other face recognition methods. The overall recognition rate for the rotated test images is 97%, demonstrating that the extracted feature vector is an effective rotation-invariant feature set with a minimal number of selected features.
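A sketch of the two-step feature extraction using OpenCV and scipy: the log-polar transform turns image rotation into a row shift, and low-frequency magnitudes of the 2-D DCT are then kept. Output size and the number of retained coefficients are assumptions for illustration.

```python
# Sketch of log-polar + DCT rotation-invariant feature extraction.
# Sizes and coefficient counts are illustrative assumptions.
import cv2
import numpy as np
from scipy.fft import dctn

def rotation_invariant_features(gray, out=(64, 64), k=10):
    h, w = gray.shape
    center = (w / 2.0, h / 2.0)
    # Rows of the log-polar image correspond to angles, so rotation
    # about the center becomes a row shift.
    lp = cv2.warpPolar(gray.astype(np.float32), out, center,
                       maxRadius=min(h, w) / 2.0,
                       flags=cv2.WARP_POLAR_LOG | cv2.INTER_LINEAR)
    coeffs = np.abs(dctn(lp, norm="ortho"))
    return coeffs[:k, :k].ravel()  # low-dimensional feature vector

img = (np.random.rand(92, 92) * 255).astype(np.uint8)
rot = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
f1, f2 = rotation_invariant_features(img), rotation_invariant_features(rot)
# Relative feature distance; ideally small under pure rotation.
print(np.linalg.norm(f1 - f2) / np.linalg.norm(f1))
```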
Abstract: In illumination-variant face recognition, existing methods that extract the face albedo as a light-normalized image may lose extensive facial detail, since the lighting template is discarded. To improve on this, a novel approach for realistic facial texture reconstruction that combines the original image with the albedo image is proposed. First, light subspaces for different identities are established from the given reference face images; then, by projecting the original and albedo images into each light subspace, texture reference images with the corresponding lighting are reconstructed and two texture subspaces are formed. From the projections in the texture subspaces, facial texture with normal lighting can be synthesized. Because the original image is combined with the face albedo, facial details are preserved. In addition, image partitioning is applied to improve synthesis performance. Experiments on the Yale B and CMU PIE databases demonstrate that this algorithm outperforms the others both in image representation and in face recognition.
Abstract: The problem of frequent itemset mining is considered in this paper. A new technique is proposed to generate frequent patterns in large databases without time-consuming candidate generation. The technique is based on focusing on transactions instead of itemsets: the algorithm intersects each transaction with the other transactions and computes the maximal sets of items shared between them, instead of creating candidate itemsets and computing their frequencies. Experiments on real-life transaction data show that the technique achieves significant efficiency in generating association rules from databases.
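A minimal sketch of the transaction-intersection idea: candidate patterns are produced as intersections of transaction pairs (the maximal items two transactions share) and their support is then counted directly. This is a quadratic-time illustration of the principle, not the paper's optimized algorithm.

```python
# Sketch of intersection-based frequent pattern mining: candidates
# come from pairwise transaction intersections rather than
# Apriori-style candidate generation. O(n^2) pairs; illustration only.
from itertools import combinations

def frequent_by_intersection(transactions, min_support):
    candidates = set()
    for t1, t2 in combinations(transactions, 2):
        shared = frozenset(t1) & frozenset(t2)
        if shared:
            candidates.add(shared)
    return {tuple(sorted(c)): sup
            for c in candidates
            if (sup := sum(c <= set(t) for t in transactions)) >= min_support}

db = [{"bread", "milk"}, {"bread", "milk", "eggs"},
      {"milk", "eggs"}, {"bread", "milk", "butter"}]
print(frequent_by_intersection(db, min_support=3))
```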
Abstract: In this paper a new approach to face recognition is presented that achieves a double dimension reduction, making the system computationally efficient while yielding better recognition results. In pattern recognition, the discriminative information in an image increases with resolution up to a certain extent; consequently, face recognition results improve as face image resolution increases and level off once a certain resolution is reached. In the proposed model, an image decimation algorithm is first applied to the face image, reducing its dimension to the resolution level that provides the best recognition results. The Discrete Cosine Transform (DCT) is then applied to the face image, owing to its computational speed and feature extraction potential, and a subset of DCT coefficients from low to mid frequencies that represents the face adequately and provides the best recognition results is retained. A trade-off between the decimation factor, the number of retained DCT coefficients, and the recognition rate at minimum computation is obtained. Preprocessing of the image is carried out to increase its robustness against variations in pose and illumination level. The new model has been tested on different databases, including the ORL database, the Yale database, and a color database, and has performed much better than other techniques. The significance of the model is twofold: (1) dimension reduction to an effective and suitable face image resolution, and (2) retention of appropriate DCT coefficients to achieve the best recognition results under varying pose, intensity, and illumination level.
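A minimal sketch of the double reduction pipeline: decimate the image, then retain a low-to-mid frequency subset of its 2-D DCT coefficients. The decimation factor and coefficient count are illustrative, not the paper's tuned values, and the naive subsampling stands in for a proper decimation filter.

```python
# Sketch of the double dimension reduction: decimation followed by
# retention of a low-to-mid frequency DCT coefficient subset.
# Factor and subset size are illustrative, not the tuned values.
import numpy as np
from scipy.fft import dctn

def reduced_face_features(img, factor=2, keep=64):
    small = img[::factor, ::factor].astype(float)  # naive decimation
    coeffs = dctn(small, norm="ortho")
    # Order coefficients by Manhattan distance from DC so the
    # retained subset spans the low-to-mid frequency band.
    rows, cols = np.indices(coeffs.shape)
    order = np.argsort((rows + cols).ravel(), kind="stable")
    return coeffs.ravel()[order][:keep]

face = np.random.rand(112, 92)
print(reduced_face_features(face).shape)  # (64,)
```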
Abstract: The problem of frequent pattern discovery is defined as the process of searching for patterns, such as sets of features or items, that appear frequently in data. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a database. Most of the proposed frequent pattern mining algorithms have been implemented in imperative programming languages; such a paradigm is inefficient when the set of patterns is large and the frequent patterns are long. We apply a high-level declarative style of programming to the problem of frequent pattern discovery, considering two languages: Haskell and Prolog. Our intuition is that the problem of finding frequent patterns should be implementable efficiently and concisely in a declarative paradigm, since pattern matching is a fundamental feature of most functional languages and of Prolog. Our frequent pattern mining implementations in Haskell and Prolog confirm our hypothesis about the conciseness of the programs. Comparative studies of lines of code, speed, and memory usage for declarative versus imperative programming are reported in the paper.
Abstract: On July 1, 2007, the Taiwan Stock Exchange (TWSE) added a new "financial reference database" to its Market Observation Post System (MOPS) for investors to consult when making investment decisions. The database serves as a warning mechanism based on the publicly disclosed financial information of listed public-offering companies, and it originally comprises eight indicators. In this paper, the indicators provided by this database are applied to a corporate financial crisis early warning model to verify whether they forecast financial crises with a high accuracy rate, in line with the positive results reported by domestic and foreign scholars. A logistic regression model is used for the financial early warning model: the first model excludes the additional conditions and the second model includes them. Companies that experienced a financial crisis were taken as the research samples, and sample data from the periods T-1 and T-2 before the crisis point were used for the empirical analysis. The results show that, among the variables provided by this database, the debt ratio and net value per share are the best forecasting variables.
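A minimal sketch of a logistic regression early warning model on the two variables the study found most predictive; the numbers below are synthetic, not TWSE/MOPS data.

```python
# Sketch of a logistic regression financial early warning model on
# the two best forecasting variables found (debt ratio, net value
# per share). All values are synthetic illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: debt ratio (%), net value per share; one row per firm at T-1.
X = np.array([[85, 4.2], [78, 6.1], [92, 2.8], [40, 18.5],
              [35, 22.0], [55, 12.3], [88, 3.5], [30, 25.1]])
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # 1 = financial crisis occurred

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba([[80, 5.0]])[0, 1])  # estimated crisis probability
```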
Abstract: Database management systems that integrate user preferences promise better solutions for personalization, greater flexibility, and higher quality of query responses. This paper presents tentative work that studies and investigates approaches to expressing user preferences in queries. We sketch an extension of the capabilities of the SQLf language, which uses fuzzy set theory to define user preferences. Two essential points are considered: the first concerns the expression of user preferences in SQLf through sets of so-called commensurable fuzzy predicates; the second concerns the bipolar way in which these user preferences are expressed, as mandatory and/or optional preferences.
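To make the fuzzy-predicate idea concrete, the sketch below evaluates one fuzzy predicate and a bipolar (mandatory/optional) combination outside the database; the predicate shapes ("cheap", "recent") and the min-based combination are illustrative choices, not the paper's exact SQLf semantics.

```python
# Sketch of a fuzzy predicate and a bipolar preference evaluation,
# illustrating the SQLf idea. Membership shapes and the combination
# rule are illustrative assumptions.
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 below a, 1 on [b, c], 0 above d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def score(row):
    mandatory = trapezoid(row["price"], 0, 0, 200, 400)       # "cheap"
    optional = trapezoid(row["year"], 2015, 2019, 2025, 2026)  # "recent"
    # Bipolar reading: the mandatory condition gates the tuple; the
    # optional one only refines the ranking of tuples that pass it.
    return 0.0 if mandatory == 0 else min(mandatory, max(optional, 0.5))

print(score({"price": 250, "year": 2017}))  # 0.5
```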
Abstract: In face recognition, feature extraction techniques attempt to find an appropriate representation of the data. However, when the feature dimension is larger than the sample size, performance degrades. Hence, we propose a method called Normalization Discriminant Independent Component Analysis (NDICA). The input data are regularized to obtain the most reliable features and then processed using Independent Component Analysis (ICA). The proposed method is evaluated on three face databases: Olivetti Research Ltd (ORL), Face Recognition Technology (FERET), and Face Recognition Grand Challenge (FRGC). NDICA shows its effectiveness compared with other unsupervised and supervised techniques.
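The abstract does not specify NDICA's regularization in closed form; the sketch below only shows the general shape of such a pipeline, using standard scaling as a stand-in normalization step followed by FastICA from scikit-learn.

```python
# Pipeline sketch in the spirit of NDICA (not the paper's method):
# normalize/regularize small-sample, high-dimensional face data, then
# extract independent components with FastICA.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
faces = rng.normal(size=(40, 1024))  # 40 samples of 32x32 images:
                                     # feature dimension >> sample size
X = StandardScaler().fit_transform(faces)       # stand-in normalization
ica = FastICA(n_components=20, random_state=0)
features = ica.fit_transform(X)                 # independent components
print(features.shape)                           # (40, 20)
```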
Abstract: The classification of protein structure is commonly performed not for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step toward protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph-theoretical algorithm. We represent the protein structure as an undirected, unweighted, and unlabeled graph whose nodes correspond to the secondary structure elements of the protein; this graph is called the protein graph. The domains are then identified as partitions of the graph corresponding to vertex sets obtained by maximizing an objective function that mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not use any information besides the cycle distribution to find the partitions. If a partition is found, the algorithm is applied iteratively to each of the resulting subgraphs. As a stopping criterion, we numerically calculate a significance level that indicates the stability of the predicted partition against a random rewiring of the protein graph; hence, the algorithm terminates its iterative application automatically. We present results for one- and two-domain proteins, compare our results with the domains manually assigned in the SCOP database, and discuss the differences.
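A minimal networkx sketch of the protein-graph representation and of a cycle-length distribution for a candidate partition; the paper's actual objective over cycle distributions is not given in the abstract, so a cycle basis is used here only as a simple proxy, on a hypothetical example graph.

```python
# Sketch of the protein graph (nodes = secondary structure elements,
# edges = contacts) and a cycle-length distribution per candidate
# partition. The cycle basis is a proxy; the paper's objective over
# cycle distributions is not specified in the abstract.
from collections import Counter
import networkx as nx

# Hypothetical example graph with two triangle-rich regions.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)])

def cycle_distribution(graph, nodes):
    sub = graph.subgraph(nodes)
    return Counter(len(c) for c in nx.cycle_basis(sub))

partition = [{0, 1, 2}, {3, 4, 5}]  # candidate two-domain split
for part in partition:
    print(sorted(part), cycle_distribution(G, part))
```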