Abstract: The performance of any continuous speech recognition system depends heavily on the performance of its acoustic models. In general, developing robust spoken language technology relies on the availability of large amounts of data. A common way to cope with having little data for training each state of the Markov models is tree-based state tying, which applies contextual questions to tie states. Generating these questions manually is error-prone and time consuming, so various automatically generated questions are used to construct the decision tree. There are three approaches to generating questions for constructing decision-tree-based HMMs: one is based on misrecognized phonemes, another uses a feature table, and the third is based on the state distributions corresponding to context-independent subword units. In this paper, all of these automatic question generation methods are applied to the decision tree on the FARSDAT corpus in the Persian language, and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions for Persian.
Abstract: A prime cordial labeling of a graph G with vertex set V is a bijection f from V to {1, 2, ..., |V|} such that, when each edge uv is assigned the label 1 if gcd(f(u), f(v)) = 1 and the label 0 if gcd(f(u), f(v)) > 1, the number of edges labeled 0 and the number of edges labeled 1 differ by at most 1. In this paper we exhibit some characterization results and new constructions on prime cordial graphs.
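The definition above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function names and the small path graph are illustrative assumptions.

```python
from math import gcd

def edge_label_counts(edges, f):
    """Return (e0, e1): the number of edges labeled 0 (gcd > 1) and 1 (gcd == 1)."""
    e1 = sum(1 for u, v in edges if gcd(f[u], f[v]) == 1)
    return len(edges) - e1, e1

def is_prime_cordial(vertices, edges, f):
    """Check whether the vertex labeling f is a prime cordial labeling."""
    n = len(vertices)
    # f must be a bijection from the vertex set onto {1, ..., |V|}.
    if sorted(f[v] for v in vertices) != list(range(1, n + 1)):
        return False
    e0, e1 = edge_label_counts(edges, f)
    return abs(e0 - e1) <= 1

# Hypothetical example: a path a-b-c-d on four vertices.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
balanced = {"a": 2, "b": 4, "c": 1, "d": 3}    # edge labels 0, 1, 1
unbalanced = {"a": 1, "b": 2, "c": 3, "d": 4}  # edge labels 1, 1, 1
print(is_prime_cordial(vertices, edges, balanced))
print(is_prime_cordial(vertices, edges, unbalanced))
```

The second labeling is a valid bijection but makes every pair of adjacent labels coprime, so all three edges receive label 1 and the balance condition fails.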
Abstract: Phylogenies, the evolutionary histories of groups of
species, are among the most widely used tools throughout the life
sciences, as well as objects of research within systematics and
evolutionary biology. In every phylogenetic analysis, reconstruction
produces trees. These trees represent the evolutionary histories of
many groups of organisms. In bacteria, however, horizontal gene
transfer, and in plants, hybridization, lead to reticulate networks,
and the methods for constructing trees fail to construct such
networks. In this paper a model has been employed to reconstruct a
phylogenetic network in the honey bee. This network represents
reticulate evolution in the honey bee. The maximum parsimony
approach has been used to obtain this reticulate network.
Abstract: In this paper, we propose an adaptation of the Patricia-Tree for sparse datasets to generate non-redundant association rules. Using this adaptation, we can generate frequent closed itemsets, which are more compact than the frequent itemsets used in the Apriori approach. This adaptation has been evaluated on a set of benchmark datasets.
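To illustrate why frequent closed itemsets are more compact than frequent itemsets, here is a brute-force sketch. It is not the Patricia-Tree adaptation proposed in the paper. It simply enumerates every candidate itemset, which is only feasible for tiny inputs, and the transactions and support threshold are made-up assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Map each itemset contained in at least min_support transactions to its count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                result[itemset] = count
    return result

def closed_itemsets(frequent):
    """Keep only itemsets with no proper superset of identical support."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and c == ct for t, ct in frequent.items())
    }

# Hypothetical toy dataset.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(closed), "closed")
```

In this toy dataset {b} and {c} are frequent but not closed, because {a, b} and {a, c} have the same support. The closed sets therefore carry the same support information with fewer entries.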
Abstract: Independent spanning trees (ISTs) provide a number of advantages in data broadcasting. Examples include their use in fault-tolerant network protocols for distributed computing and bandwidth. However, the problem of constructing multiple ISTs is considered hard for arbitrary graphs. In this paper we present an efficient algorithm for constructing ISTs on hypercubes that requires minimal resources to run.
Abstract: Generator of hypotheses is a new method for data mining. It makes it possible to classify the source data automatically and produces a particular enumeration of patterns. A pattern is an expression (in a certain language) describing facts in a subset of facts. The goal is to describe the source data via patterns and/or IF...THEN rules. The evaluation criteria used are deterministic (not probabilistic). The search results are trees, a form that is easy to comprehend and interpret. Generator of hypotheses uses a very effective algorithm based on the theory of monotone systems (MS), named MONSA (MONotone System Algorithm).
Abstract: Biochemical and molecular analysis of some
antioxidant enzyme genes revealed different levels of gene expression
in oilseed (Brassica napus). For molecular and biochemical
analysis, leaf tissues were harvested from plants at eight different
developmental stages, from young to senescence. The levels of total
protein and chlorophyll increased during the maturity stages of the
plant and decreased during the last stages of plant
growth. Structural analysis (nucleotide and deduced amino acid
sequence, and phylogenetic tree) of a complementary DNA revealed a
high level of similarity for a family of Catalase genes. The
expression of the genes encoding different Catalase isoforms was
assessed during different plant growth phases. No significant
difference between samples was observed when Catalase activity
was statistically analyzed at different developmental stages. EST
analysis showed different transcript levels for a number of other
relevant antioxidant genes (different isoforms of SOD and
glutathione). The high level of transcription of these genes at
senescence stages indicated that they are senescence-induced
genes.
Abstract: A landslide susceptibility map delineates the potential
zones for landslide occurrence. Previous works have applied
multivariate methods and neural networks for mapping landslide
susceptibility. This study proposes a new approach that integrates
a decision tree model and a spatial cluster statistic for assessing
landslide susceptibility spatially. A total of 2057 landslide cells
were digitized for developing the landslide decision tree model. The
relationships between landslides and instability factors were
explicitly represented by using tree graphs in the model. The local
Getis-Ord statistics were used to cluster cells with high landslide
probability. The analytic result from the local Getis-Ord
statistics was classified to create a map of landslide
susceptibility zones. The map was validated using new landslide data
with 482 cells. Results of validation show an accuracy rate of 86.1% in
predicting new landslide occurrence. This indicates that the proposed
approach is useful for improving landslide susceptibility mapping.
Abstract: A spatial classification technique incorporating a state-of-the-art feature extraction algorithm is proposed in this paper for classifying the heterogeneous classes present in hyperspectral images. The classification accuracy can be improved if and only if both the feature extraction and the classifier selection are proper. As the classes in the hyperspectral images are assumed to have different textures, textural classification is adopted. Run Length feature extraction is used along with the Principal Components and Independent Components. A hyperspectral image of the Indiana site taken by AVIRIS is used for the experiment. Among the original 220 bands, a subset of 120 bands is selected. The Gray Level Run Length Matrix (GLRLM) is calculated for forty of the selected bands. From the GLRLMs, the Run Length features for individual pixels are calculated. The Principal Components are calculated for another forty bands, and the Independent Components for the remaining forty bands. As Principal and Independent Components have the ability to represent the textural content of pixels, they are treated as features. The summation of Run Length features, Principal Components, and Independent Components forms the Combined Features, which are used for classification. An SVM with a Binary Hierarchical Tree is used to classify the hyperspectral image. Results are validated against ground truth and accuracies are calculated.
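As a rough illustration of the run-length step, the sketch below builds a Gray Level Run Length Matrix for one small patch and derives one common run-length feature, short run emphasis. The abstract does not state the scan direction, the window used around each pixel, or which run-length features were computed. The horizontal direction, the 3x3 patch, and the choice of short run emphasis are all assumptions made here.

```python
def glrlm_horizontal(patch, levels):
    """matrix[g][r - 1] = number of horizontal runs of gray level g with length r."""
    max_run = max(len(row) for row in patch)
    matrix = [[0] * max_run for _ in range(levels)]
    for row in patch:
        run_value, run_length = row[0], 1
        for pixel in row[1:]:
            if pixel == run_value:
                run_length += 1
            else:
                matrix[run_value][run_length - 1] += 1
                run_value, run_length = pixel, 1
        matrix[run_value][run_length - 1] += 1  # close the last run of the row
    return matrix

def short_run_emphasis(matrix):
    """Weight each run by 1 / length**2 and normalize by the total number of runs."""
    total_runs = sum(sum(row) for row in matrix)
    weighted = sum(
        count / (j + 1) ** 2
        for row in matrix
        for j, count in enumerate(row)
    )
    return weighted / total_runs

# Hypothetical 3x3 patch quantized to three gray levels (0, 1, 2).
patch = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 0, 0],
]
matrix = glrlm_horizontal(patch, levels=3)
print(matrix)
print(short_run_emphasis(matrix))
```

A per-pixel feature, as described in the abstract, could be obtained by evaluating such a patch centered on each pixel of a band.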
Abstract: Natural outdoor scene classification is an active and
promising research area around the globe. In this study, the
classification is carried out in two phases. In the first phase,
features are extracted from the images by the wavelet decomposition
method and stored in a database as feature vectors. In the second
phase, neural classifiers, namely the back-propagation neural network
(BPNN) and the resilient back-propagation neural network (RPNN), are
employed for the classification of scenes. Four hundred color images
from the MIT database, in two classes (forest and street), are
considered. A comparative study has been carried out on the
performance of the two neural classifiers, BPNN and RPNN, as the
number of test samples increases. RPNN showed better classification
results than BPNN on the large test samples.
Abstract: Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences, in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data, better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy-based attribute set and a number of other, more standard similarity attribute sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy-based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.
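The compression-based approximation mentioned above, and its weakness on short inputs, can be sketched as follows. This shows the standard normalized compression distance with zlib as the compressor. It is not the new conditional-entropy approach of the paper, and the sequences are synthetic assumptions.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Computable stand-in for Kolmogorov complexity: length after zlib compression."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance; smaller values suggest more shared structure."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical sequences: a highly regular one and a pseudo-random one.
rng = random.Random(0)
repeat = b"ACGT" * 200
shuffled = bytes(rng.choice(b"ACGT") for _ in range(800))

print(ncd(repeat, repeat), ncd(repeat, shuffled))

# On very short inputs the fixed overhead of the compressor dominates,
# so the compressed size says little about the sequence itself.
print(len(b"ACGT"), compressed_size(b"ACGT"))
```

The last line illustrates the short-sequence problem: the compressed form of a four-letter sequence is longer than the sequence, so distances built from such sizes are unreliable.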
Abstract: Acid rain changes the pH level of soil, which
directly influences root and leaf growth. Crop yield is
reduced when the soil is more acidic. Acid rain seeps into the earth and
poisons plants and trees by dissolving toxic substances in the soil,
such as aluminum, which are absorbed by the roots. In the present
investigation, the effect of acid rain on the crop Vigna radiata was
studied. The effect of acid rain on soil fertility was examined: the
pH of the control sample was 6.5, and the pH of the 1% H2SO4 and 1%
HNO3 samples was 3.5. Nitrate nitrogen in soil was high in the 1%
HNO3 treated soil and the control sample. Ammonium nitrogen in soil
was low in the 1% HNO3 and H2SO4 treated soil, and medium in the
control and other samples. For the effect of acid rain on seed
germination, on the 3rd day of germination growth in the control
sample was 6.1 cm with plumule, while growth in the 0.001% HNO3 and
0.001% H2SO4 samples was 5.5 cm with plumule and 8 cm with plumule,
respectively. On the 10th day, fungal growth was
observed at the 1% and 0.1% H2SO4 concentrations, when all plants were
dead. The effect of acid rain on crop productivity was also
investigated. On the 3rd day, roots had developed in the plants. On
the 12th day, Vigna radiata showed more growth in the 0.1% HNO3 and
0.1% H2SO4 treated plants than in the control plants. On the 20th
day, discoloration of plant pigments was observed on the leaves of
acid-treated plants. On the 34th day, Vigna radiata showed flowers in
the 0.1% HNO3, 0.01% HNO3 and 0.01% H2SO4 treated plants, and no
flowers were observed on the control plants. On the 42nd day, the
0.1% HNO3, 0.01% HNO3 and 0.01% H2SO4 treated Vigna radiata plants
and the control plants showed seeds. The 0.1% HNO3, 0.01% HNO3 and
0.01% H2SO4 treated Vigna radiata plants were dead on the 46th day,
and fungal growth was observed. A toxicological study was carried out
on Vigna radiata plants: cells exposed to 1% HNO3 were damaged more
than those exposed to 1% H2SO4. Leaf sections exposed to 0.001% HNO3
and H2SO4 showed less cell damage, and pigmentation was observed
across the entire slide, when compared with the control plant.
Abstract: This paper proposes to use ETM+ multispectral data
and panchromatic band as well as texture features derived from the
panchromatic band for land cover classification. Four texture features
including one 'internal texture' and three GLCM-based textures,
namely correlation, entropy, and inverse difference moment, were used
in combination with ETM+ multispectral data. Two data sets
involving combination of multispectral, panchromatic band and its
texture were used and results were compared with those obtained by
using multispectral data alone. A decision tree classifier with and
without boosting was used to classify the different datasets. Results
from this study suggest that the dataset consisting of panchromatic
band, four of its texture features and multispectral data was able to
increase the classification accuracy by about 2%. In comparison, a
boosted decision tree was able to increase the classification accuracy
by about 3% with the same dataset.
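For readers unfamiliar with GLCM-based textures, the sketch below computes a gray level co-occurrence matrix and two of the three named features, entropy and inverse difference moment. The abstract does not give the pixel offset, the quantization, or the logarithm base. The one-pixel horizontal offset, the non-symmetric matrix, and base-2 logarithm are assumptions, and correlation is omitted for brevity.

```python
from math import log2

def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs separated by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]

def glcm_entropy(p):
    """Shannon entropy of the co-occurrence probabilities (base 2)."""
    return -sum(v * log2(v) for row in p for v in row if v > 0)

def inverse_difference_moment(p):
    """Large when co-occurring gray levels are close to each other."""
    return sum(
        v / (1 + (i - j) ** 2)
        for i, row in enumerate(p)
        for j, v in enumerate(row)
    )

# Hypothetical 2x2 panchromatic patch quantized to two gray levels.
patch = [
    [0, 0],
    [1, 1],
]
p = glcm(patch, levels=2)
print(p, glcm_entropy(p), inverse_difference_moment(p))
```

Features of this kind, computed per window over the panchromatic band, would be stacked with the multispectral bands before classification.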
Abstract: In this study, a high accuracy protein-protein interaction
prediction method is developed. The importance of the proposed
method is that it only uses sequence information of proteins while
predicting interaction. The method extracts phylogenetic profiles of
proteins by using their sequence information. Combining the phylogenetic
profiles of two proteins by checking existence of homologs
in different species and fitting this combined profile into a statistical
model, it is possible to make predictions about the interaction status
of two proteins.
For this purpose, we apply a collection of pattern recognition
techniques to the dataset of combined phylogenetic profiles of protein
pairs. Support Vector Machines, Feature Extraction using ReliefF,
Naive Bayes Classification, K-Nearest Neighbor Classification,
Decision Trees, and Random Forest Classification are the methods
we applied to find the classification method that best predicts
the interaction status of protein pairs. Random Forest Classification
outperformed all other methods, with a prediction accuracy of 76.93%.
Abstract: In this paper, two versions of an iterative loopless
algorithm for the classical Towers of Hanoi problem with O(1) storage complexity and O(2^n) time complexity are presented. Based
on this algorithm, the number of different moves for each of the pegs, together with their direction, is formulated.
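The abstract does not give the algorithm itself. As a point of comparison, the sketch below uses a well-known bitwise formulation that derives each move directly from its index, so generating the moves needs only a counter. It is not claimed to be either of the paper's two versions. The peg numbering (0, 1, 2) and the `simulate` helper, which checks legality and does use O(n) storage, are assumptions.

```python
def hanoi_moves(n):
    """Yield (source, target) peg indices for each of the 2**n - 1 moves."""
    for m in range(1, 1 << n):
        yield (m & (m - 1)) % 3, ((m | (m - 1)) + 1) % 3

def simulate(n):
    """Replay the moves on explicit pegs, asserting that every move is legal."""
    pegs = [list(range(n, 0, -1)), [], []]
    count = 0
    for source, target in hanoi_moves(n):
        disk = pegs[source].pop()
        assert not pegs[target] or pegs[target][-1] > disk, "illegal move"
        pegs[target].append(disk)
        count += 1
    return count, pegs

count, pegs = simulate(3)
print(count, pegs)
```

With this numbering the tower ends on peg 2 when n is odd and on peg 1 when n is even, and the move count 2**n - 1 matches the exponential time bound.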
Abstract: The Generalized Center String (GCS) problem
generalizes the Common Approximate Substring and Common
Substring problems. GCS is known to be NP-hard; the
difficulty lies in the explosion of potential candidates.
It involves finding the longest center string without
knowing in advance, for any particular biological gene
process, which sequences may not contain any motifs. GCS
can be solved by frequent pattern mining techniques and is
known to be fixed-parameter tractable for fixed input
sequence length and symbol set size. Efficient methods
known as Bpriori algorithms can solve GCS with reasonable
time/space complexities. The Bpriori 2 and Bpriori 3-2
algorithms have been proposed to find center strings of any
length and the positions of all their instances in the
input sequences. In this paper, we reduce the time/space
complexity of the Bpriori algorithm with the Constraint
Based Frequent Pattern (CBFP) mining technique, which
integrates the ideas of Constraint Based Mining and FP-tree
mining. The CBFP mining technique solves the GCS problem
not only for center strings of any length, but also for the
positions of all their mutated copies in the input
sequences. It constructs a trie-like FP-tree to represent
the mutated copies of center strings of any length, along
with constraints to restrain the growth of the consensus
tree. The complexity of the CBFP mining technique and of
the Bpriori algorithm is analyzed for the worst case and
the average case. The correctness of the algorithm,
compared with the Bpriori algorithm, is shown using
artificial data.
Abstract: Segmentation of a color image composed of different
kinds of regions can be a hard problem, notably when computing
exact texture fields and deciding the optimum number of
segmentation areas in an image that contains similar and/or
non-stationary texture fields. A novel neighborhood-based segmentation
approach is proposed. A genetic algorithm is used in the proposed
segment-pass optimization process. In this pass, an energy function,
which is defined based on Markov Random Fields, is minimized. In
this paper we use an adaptive threshold estimation method for image
thresholding in the wavelet domain based on generalized
Gaussian distribution (GGD) modeling of subband coefficients. This
method, called Normal Shrink, is computationally more efficient and
adaptive because the parameters required for estimating the threshold
depend on the subband data energy, which is used in the pre-stage of
segmentation. A quad tree is employed to implement the
multiresolution framework, which enables the use of different
strategies at different resolution levels, and hence the computation
can be accelerated. The experimental results using the proposed
segmentation approach are very encouraging.
Abstract: Modern spatial database management systems require a unique Spatial Access Method (SAM) in order to solve complex spatial queries efficiently. In this case the spatial data structure takes a prominent place in the SAM. An inadequate data structure leads to poor algorithmic choices and a deficient understanding of algorithm behavior on the spatial database. A key step in developing a better semantic spatial object data structure is to quantify the performance effects of semantic and outlier detections that are not reflected in the previous tree structures (R-Tree and its variants). This paper explores a novel SSRO-Tree on SAM for the Topo-Semantic approach. The paper shows how to identify and handle semantic spatial objects together with outlier objects during page overflow/underflow, using gain/loss metrics. We introduce a new SSRO-Tree algorithm that achieves better performance in practice than the strongest algorithms for the R*-Tree and RO-Tree when considering selection queries.
Abstract: The paper proposes a unified model for multimedia data retrieval which includes data representatives, content representatives, index structure, and search algorithms. The multimedia data are defined as k-dimensional signals indexed in a multidimensional k-tree structure. The benefits of using the k-tree unified model were demonstrated by running the data retrieval application on a test bed cluster of six networked nodes. The tests were performed with two retrieval algorithms: one that allows parallel searching using a single feature, and a second that performs a weighted cascade search for queries on multiple features. The experiments show a significant reduction of retrieval time while maintaining the quality of results.
Abstract: Automatic reusability appraisal could be helpful in
evaluating the quality of developed or developing reusable software
components and in identification of reusable components from
existing legacy systems, which can save the cost of developing the
software from scratch. But the issue of how to identify reusable
components from existing systems has remained relatively unexplored.
In this paper, we present a two-tier approach that studies the
structural attributes as well as the usability or relevancy of the
component to a particular domain. Latent semantic analysis is used
for the feature vector representation of various software domains. It
exploits the fact that FeatureVector codes can be seen as documents
containing terms (the identifiers present in the components), and so
text modeling methods that capture co-occurrence information in
low-dimensional spaces can be used. Further, we devised a
Neuro-Fuzzy hybrid inference system, which takes structural metric
values as input and calculates the reusability of the software
component. A decision tree algorithm is used to decide the initial
set of fuzzy rules for
the Neuro-fuzzy system. The results obtained are convincing enough
to propose the system for economical identification and retrieval of
reusable software components.