Abstract: This paper presents a machine learning methodology for a
short-term rainfall forecasting system. Decision
Tree, Artificial Neural Network (ANN), and Support Vector Machine
(SVM) were applied to develop classification and prediction models
for rainfall forecasts. The goals of this paper are to
demonstrate (1) how feature selection can be used to identify the
relationships between rainfall occurrences and other weather
conditions and (2) which models can be developed and deployed to
predict accurate rainfall estimates that support decisions to
launch cloud seeding operations in the northeastern part of
Thailand. Datasets were collected during 2004-2006 from the
Chalermprakiat Royal Rain Making Research Center at Hua Hin,
Prachuap Khiri Khan, the Chalermprakiat Royal Rain Making
Research Center at Pimai, Nakhon Ratchasima, and the Thai
Meteorological Department (TMD). A total of 179 records with 57
features were merged and matched by unique date. There are three
main parts in this work. First, a decision tree induction algorithm
(C4.5) was used to classify the rain status as either rain or no-rain.
The classification tree achieves an overall accuracy of 94.41% under
five-fold cross-validation. The C4.5 algorithm was also used to
classify the rain amount into three classes, no-rain (0-0.1 mm),
few-rain (0.1-10 mm), and moderate-rain (>10 mm), with an overall
accuracy of 62.57%. Second, an ANN was applied to predict the
rainfall amount, and the root mean square error (RMSE) was used to
measure the training and testing errors of the ANN. The ANN yields
a lower RMSE (0.171) for daily rainfall estimates than for next-day
and next-2-day estimates. Third, the ANN and SVM techniques were
also used to classify the rain amount into the same three classes
(no-rain, few-rain, and moderate-rain). The ANN and SVM models
achieved overall same-day prediction accuracies of 68.15% and
69.10%, respectively. These results allow the predictive power of
the different methods for rainfall estimation to be compared.
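As a minimal sketch of the two quantities this abstract relies on, the Python below maps a rainfall amount to the three rain classes and computes the RMSE between observed and predicted amounts. Because the stated ranges share their endpoints (0.1 mm and 10 mm), assigning each boundary value to the lower class is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rain_class(mm: float) -> str:
    """Map a rainfall amount (mm) to the three classes used in the abstract.

    Assumption: a boundary value (0.1 mm or 10 mm) belongs to the lower class,
    because the published ranges overlap at their endpoints.
    """
    if mm <= 0.1:
        return "no-rain"
    if mm <= 10.0:
        return "few-rain"
    return "moderate-rain"


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted rainfall."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    print([rain_class(a) for a in (0.0, 0.1, 3.5, 10.0, 25.2)])
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```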
Abstract: Renewable and non-renewable resource constraints have been studied extensively in the theory of project scheduling problems. However, although cumulative resources are widespread in practice, the literature on project scheduling problems subject to these resources is scant. To study this type of resource further, in this paper we use the framework of the resource constrained project scheduling problem (RCPSP) with finish-start precedence relations between activities, subject to cumulative resources in addition to renewable resources. We develop a branch and bound algorithm for this problem by customizing the precedence tree algorithm of the RCPSP. We perform an extensive experimental analysis to evaluate the algorithm's effectiveness and performance in solving different instances of the problem.
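The branching scheme that a precedence tree search builds on can be sketched in Python: each root-to-leaf path of the tree is an activity order in which every activity comes after all of its finish-start predecessors. This is only the enumeration step under that reading; the bounds, schedule generation, and renewable or cumulative resource checks of the authors' branch and bound are not reproduced, and the four-activity network is hypothetical.

```python
from typing import Dict, Iterator, List, Set


def precedence_feasible_orders(preds: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield every activity order that respects the finish-start predecessors.

    Each yielded list corresponds to one root-to-leaf path of a precedence tree.
    """
    activities = set(preds)

    def extend(order: List[str], done: Set[str]) -> Iterator[List[str]]:
        if len(order) == len(activities):
            yield list(order)  # a leaf: every activity has been sequenced
            return
        # Branch on each unscheduled activity whose predecessors are all done.
        eligible = sorted(a for a in activities - done if preds[a] <= done)
        for a in eligible:
            order.append(a)
            done.add(a)
            yield from extend(order, done)
            order.pop()  # backtrack before exploring the next branch
            done.remove(a)

    yield from extend([], set())


if __name__ == "__main__":
    network = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    for order in precedence_feasible_orders(network):
        print(order)
```

In a full branch and bound, a lower bound would be evaluated at each partial order so that whole subtrees can be pruned instead of enumerated.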
Abstract: The purpose of determining impact significance is to
place value on impacts. Environmental impact assessment review is a
process that judges whether impact significance is acceptable or not in
accordance with the scientific facts regarding environmental,
ecological and socio-economical impacts described in environmental
impact statements (EIS) or environmental impact assessment reports
(EIAR). The first aim of this paper is to summarize the criteria for
significance evaluation from past review results and accordingly to
use fuzzy logic to incorporate these criteria into the scientific facts.
The second aim is to employ data mining techniques to construct a
prediction model for EIS or EIAR review results, which can help
developers prepare and revise better environmental management
plans in advance. The previous prediction model proposed by the
authors in 2009 had a validity of 92.7%; the enhanced model in this
study attains a validity of 100.0%.
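The abstract does not give the membership functions used to fuzzify the significance criteria. As an illustration only, the sketch below uses triangular membership functions over a hypothetical 0-10 significance score with three hypothetical labels, which is one common way to turn a crisp score into degrees of membership.

```python
from typing import Dict, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for an impact-significance score on a 0-10 scale.
# The outer sets extend past the scale so that 0 and 10 get full membership.
SIGNIFICANCE_SETS: Dict[str, Tuple[float, float, float]] = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(score: float) -> Dict[str, float]:
    """Degree to which a crisp score belongs to each linguistic label."""
    return {label: triangular(score, *abc) for label, abc in SIGNIFICANCE_SETS.items()}


if __name__ == "__main__":
    print(fuzzify(7.5))
```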
Abstract: Many supervised induction algorithms require discrete
data, even though real data often come in both discrete
and continuous formats. High-quality discretization of continuous
attributes is an important problem that affects the speed,
accuracy and understandability of the induction models. Usually,
discretization and other types of statistical processes are applied
to subsets of the population, as the entire population is practically
inaccessible. For this reason, we argue that a discretization
performed on a sample of the population is only an estimate of
the discretization of the entire population. Most of the existing
discretization methods partition the attribute range into two or
several intervals using a single cut point or a set of cut points.
In this paper, we introduce a technique that uses resampling (such
as the bootstrap) to generate a set of candidate discretization
points, thus improving the discretization quality by providing a
better estimate for the entire population. The goal of this paper
is to determine whether the resampling technique can lead to better
discretization points, which opens up a new paradigm for the
construction of soft decision trees.
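As a hedged sketch of the resampling idea, the Python below draws bootstrap samples, finds in each one the single cut point that minimizes the size-weighted class entropy, and returns the collection of candidate cut points. Using an entropy-based binary split as the per-sample discretizer is an assumption; the abstract does not say which base method is resampled or how the candidates are combined.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs: Sequence[float], ys: Sequence[str]) -> Optional[float]:
    """Midpoint cut that minimises the size-weighted class entropy of the two sides."""
    pairs = sorted(zip(xs, ys))
    best: Optional[float] = None
    best_score = float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best, best_score = (lo + hi) / 2, score
    return best


def bootstrap_cuts(xs: Sequence[float], ys: Sequence[str],
                   n_boot: int = 50, seed: int = 0) -> List[float]:
    """Candidate cut points, at most one per bootstrap resample of the (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    cuts: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        cut = best_cut([xs[i] for i in idx], [ys[i] for i in idx])
        if cut is not None:  # a resample holding a single distinct value has no cut
            cuts.append(cut)
    return cuts
```

The spread of the returned candidates gives a rough picture of how stable a single-sample cut point is.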
Abstract: In this paper a combined feature selection method is
proposed that takes advantage of sample domain filtering,
resampling and feature subset evaluation methods to reduce the
dimensions of huge datasets and select reliable features. This method
uses both the feature space and the sample domain to improve the
process of feature selection, and it combines the Chi-squared and
Consistency attribute evaluation methods to seek reliable features.
The method consists of two phases. The first phase filters and
resamples the sample domain, and the second phase adopts a hybrid
procedure to find the optimal feature space by applying Chi-squared
and Consistency subset evaluation methods with genetic search.
Experiments on various sized datasets from UCI Repository of
Machine Learning databases show that the performance of five
classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First
Decision Tree and JRIP) improves simultaneously and the
classification error for these classifiers decreases considerably. The
experiments also show that this method outperforms other feature
selection methods.
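The Chi-squared part of the ranking can be sketched as follows: for a nominal feature, compute Pearson's chi-squared statistic between feature values and class labels, then rank features by it. This is a minimal sketch only; the Consistency evaluation, the sample-domain filtering and the genetic search of the proposed method are not shown, and the toy columns are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def chi_squared(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic of a nominal feature against the class."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    class_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in class_totals.items():
            expected = f_count * c_count / n  # count expected under independence
            observed = joint.get((f, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(columns: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[str]:
    """Feature names ordered from most to least class-dependent."""
    return sorted(columns, key=lambda name: chi_squared(columns[name], labels), reverse=True)


if __name__ == "__main__":
    cols = {"x1": ["a", "a", "b", "b"], "x2": ["a", "b", "a", "b"]}
    print(rank_features(cols, ["p", "p", "q", "q"]))
```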
Abstract: Recommender systems are usually regarded as an
important marketing tool in e-commerce. They use important
information about users to facilitate accurate recommendation. The
information includes user context such as location, time and interest
for personalization of mobile users. We can easily collect information
about location and time because mobile devices communicate with the
base station of the service provider. However, information about user
interest cannot be easily collected because user interest cannot be
captured automatically without the user's approval process. User interest
is usually represented as a need. In this study, we classify needs into two
types according to prior research. This study investigates the
usefulness of data mining techniques for classifying user need type for
recommendation systems. We employ several data mining techniques
including artificial neural networks, decision trees, case-based
reasoning, and multivariate discriminant analysis. Experimental
results show that the CHAID algorithm outperforms the other models for
classifying user need type. This study performs a McNemar test to
examine the statistical significance of the differences between the
classification results. The results of the McNemar test also show that
CHAID performs better than the other models with statistical significance.
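A minimal sketch of McNemar's test for two classifiers evaluated on the same cases, using the continuity-corrected statistic (|b - c| - 1)^2 / (b + c), where b counts cases only classifier A gets right and c counts cases only classifier B gets right, compared against a chi-squared distribution with one degree of freedom. The abstract does not say whether a continuity correction was applied, so that default is an assumption.

```python
import math
from typing import Sequence, Tuple


def mcnemar(y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int],
            correction: bool = True) -> Tuple[float, float]:
    """Return (statistic, p-value) for McNemar's test on two classifiers."""
    rows = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in rows if a == t and p != t)  # only A correct
    c = sum(1 for t, a, p in rows if a != t and p == t)  # only B correct
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)  # clamp so b == c gives 0, not 1
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Only the discordant cases (b and c) enter the statistic, which is why the test suits paired comparisons of classifiers on one test set.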
Abstract: This work proposes an accurate crosstalk noise estimation method in the presence of multiple RLC lines for use in design automation tools. The method correctly models the loading effects of non-switching aggressors and aggressor tree branches using the resistive shielding effect and realistic exponential input waveforms. Expressions for the noise peak and width have been derived. The results are in good agreement with SPICE results: the average error is 4.7% for the noise peak and 6.15% for the noise width, while the analysis remains very fast.
Abstract: Existing work in temporal logic on representing the
execution of infinitely many transactions uses linear-time temporal
logic (LTL) and only models two-step transactions. In this paper,
we use the comparatively efficient branching-time computational tree
logic CTL and extend the transaction model to a class of multistep
transactions, by introducing distinguished propositional variables
to represent the read and write steps of n multi-step transactions
accessing m data items infinitely many times. We prove that the
well-known correspondence between acyclicity of conflict graphs
and serializability for finite schedules extends to infinite schedules.
Furthermore, in the case of transactions accessing the same set of
data items in (possibly) different orders, serializability corresponds
to the absence of cycles of length two. This result is used to give an
efficient encoding of the serializability condition into CTL.
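For finite schedules, the correspondence that the paper extends can be checked directly: build the conflict graph (an edge Ti -> Tj when an operation of Ti precedes a conflicting operation of Tj on the same data item, and at least one of the two is a write) and test it for cycles. The Python below is a sketch of that finite case only; encoding a schedule as (transaction, "r" or "w", item) triples is a hypothetical choice, and nothing here models the CTL encoding or infinite schedules.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", data item)


def conflict_graph(schedule: List[Op]) -> Dict[str, Set[str]]:
    """Edge Ti -> Tj for each pair of conflicting operations with Ti's first."""
    graph: Dict[str, Set[str]] = {}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        graph.setdefault(ti, set())
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                graph[ti].add(tj)
    return graph


def is_conflict_serializable(schedule: List[Op]) -> bool:
    """True iff the conflict graph is acyclic (checked with Kahn's algorithm)."""
    graph = conflict_graph(schedule)
    indegree = {v: 0 for v in graph}
    for targets in graph.values():
        for w in targets:
            indegree[w] += 1
    ready = [v for v, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in graph[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(graph)  # every node removed means no cycle
```

The first test schedule below produces a cycle of length two between T1 and T2, the kind of cycle the abstract singles out for transactions that access the same data items.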
Abstract: Computing and maintaining network structures for efficient
data aggregation incurs high overhead for dynamic events
where the set of nodes sensing an event changes with time. Moreover,
structured approaches are sensitive to the waiting time that is used
by nodes to wait for packets from their children before forwarding
the packet to the sink. An optimal routing and data aggregation
scheme for wireless sensor networks is proposed in this paper. We
propose Tree on DAG (ToD), a semistructured approach that uses
Dynamic Forwarding on an implicitly constructed structure composed
of multiple shortest path trees to support network scalability. The key
principle behind ToD is that adjacent nodes in a graph will have
low stretch in one of these trees in ToD, thus resulting in early
aggregation of packets. Based on simulations on a 2,000-node Mica2-
based network, we conclude that efficient aggregation in large-scale
networks can be achieved by our semistructured approach.
Abstract: A phylogenetic tree is a graphical representation of the
evolutionary relationships among three or more genes or organisms.
These trees show the relatedness of data sets, species or genes,
their divergence times and the nature of their common ancestors.
Assessing the quality of a phylogenetic tree requires a parsimony
criterion. Various approaches have been proposed for constructing
the most parsimonious trees. This paper is concerned with small
parsimony algorithms, which calculate and minimize the number of
state changes needed on a given tree. This paper proposes an
enhanced small parsimony algorithm that gives a better score, based
on the number of evolutionary changes needed to produce the observed
sequences on the tree, and that also gives the ancestors of the
given input.
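The classic Fitch algorithm is the standard small parsimony baseline; below is a minimal Python sketch of its bottom-up pass on a rooted binary tree, summed over alignment columns. It is not the paper's enhanced algorithm, the nested-tuple tree format is a hypothetical choice, and the top-down pass that assigns ancestral states is omitted.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf label or a pair of subtrees (hypothetical format).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass for one site: (candidate states at root, changes)."""
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: take the union and count one state change.
    return left_states | right_states, left_cost + right_cost + 1


def column(tree: Tree, i: int) -> Tree:
    """Same tree shape with each leaf sequence replaced by its i-th character."""
    if isinstance(tree, str):
        return tree[i]
    return (column(tree[0], i), column(tree[1], i))


def parsimony_score(tree: Tree) -> int:
    """Total Fitch changes over all sites; leaves are equal-length sequences."""
    first = tree
    while not isinstance(first, str):
        first = first[0]
    return sum(fitch(column(tree, i))[1] for i in range(len(first)))


if __name__ == "__main__":
    print(parsimony_score((("AC", "AG"), ("CC", "CG"))))
```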
Abstract: In this paper we propose a novel dynamic least-cost multicast routing protocol for IP networks that uses a hybrid genetic algorithm. The protocol finds the multicast tree with minimum cost subject to delay, degree, and bandwidth constraints. The proposed protocol has the following features: i. a heuristic local search function has been devised and embedded in the normal genetic operations to speed up the search and obtain an optimized tree; ii. it efficiently handles the dynamic situations that arise from changes in multicast group membership or from node/link failures; iii. two different crossover and mutation probabilities are used to maintain solution diversity and achieve quick convergence. The simulation results show that the proposed protocol generates dynamic multicast trees with lower cost. The results also show that the proposed algorithm has a better convergence rate, a better dynamic request success rate and a shorter execution time than other existing algorithms. The effects of the degree and delay constraints on the multicast tree have also been analyzed in terms of search success rate.
Abstract: This paper investigates the issue of building decision
trees from data with imprecise class values where imprecision is
encoded in the form of possibility distributions. The Information
Affinity similarity measure is introduced into the well-known gain
ratio criterion in order to assess the homogeneity of a set of
possibility distributions representing the instances' classes belonging to
a given training partition. For the experimental study, we propose an
information affinity based performance criterion, which we use to
show the performance of the approach on well-known benchmarks.
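A minimal sketch of an Information Affinity computation between two possibility distributions over the same classes, assuming the commonly cited form InfoAff = 1 - (d + Inc) / 2, where d is the mean absolute difference between the distributions and Inc = 1 - max over classes of min(pi1, pi2) is their inconsistency degree. The equal weighting of the two terms is an assumption, and how the measure enters the gain ratio criterion is not shown.

```python
from typing import Sequence


def info_affinity(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two possibility distributions (assumed form)."""
    if len(pi1) != len(pi2) or not pi1:
        raise ValueError("distributions must be non-empty and of equal length")
    n = len(pi1)
    # Normalised Manhattan distance between the two distributions.
    distance = sum(abs(a - b) for a, b in zip(pi1, pi2)) / n
    # Inconsistency: 1 minus the height of the pointwise minimum.
    inconsistency = 1.0 - max(min(a, b) for a, b in zip(pi1, pi2))
    return 1.0 - (distance + inconsistency) / 2.0
```

Under this form, two identical normalized distributions score 1, and two fully certain but different class assignments score 0.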
Abstract: In this paper, we propose a Perceptually Optimized Foveation based Embedded ZeroTree Image Coder (POEFIC) that applies a perceptual weighting to the wavelet coefficients before SPIHT encoding in order to reach a target bit rate with improved perceptual quality, for a given bit rate and a fixation point that determines the region of interest (ROI). The paper also introduces a new objective quality metric based on a psychovisual model that integrates properties of the human visual system (HVS); this metric plays an important role in our POEFIC quality assessment. Our POEFIC coder is based on a vision model that incorporates various masking effects of HVS perception. Thus, our coder weights the wavelet coefficients based on that model and attempts to increase the perceptual quality for a given bit rate and observation distance. The perceptual weights for all wavelet subbands are computed based on 1) foveation masking, to remove or considerably reduce high frequencies in peripheral regions, 2) luminance and contrast masking, and 3) the contrast sensitivity function (CSF), to achieve the perceptual decomposition weighting. The new perceptually optimized codec has the same complexity as the original SPIHT technique. Experimental results show that our coder performs very well in terms of quality measurement.
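The abstract does not say which contrast sensitivity function the coder uses. As one illustration, the sketch below evaluates the widely used Mannos-Sakrison CSF, A(f) = 2.6 (0.0192 + 0.114 f) exp(-(0.114 f)^1.1) with f in cycles per degree, and turns it into per-subband weights. Choosing this CSF is an assumption, and the subband centre frequencies are hypothetical inputs that in practice depend on viewing distance and display resolution.

```python
import math
from typing import List, Sequence


def csf_mannos_sakrison(f: float) -> float:
    """Mannos-Sakrison contrast sensitivity at spatial frequency f (cycles/degree)."""
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))


def normalized_weights(centre_freqs: Sequence[float]) -> List[float]:
    """Hypothetical per-subband weights: CSF at each subband's centre frequency,
    scaled so that the most visible subband gets weight 1."""
    values = [csf_mannos_sakrison(f) for f in centre_freqs]
    peak = max(values)
    return [v / peak for v in values]


if __name__ == "__main__":
    print(normalized_weights([1.0, 4.0, 8.0, 16.0, 30.0]))
```

Because this CSF peaks at mid frequencies (around 8 cycles/degree), subbands far above the peak receive small weights, which is the intuition behind spending fewer bits on them.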
Abstract: Network layer multicast, i.e. IP multicast, even after
many years of research, development and standardization, is not
deployed on a large scale due to both technical (e.g. upgrading of
routers) and political (e.g. policy making and negotiation) issues.
Researchers have looked for alternatives and proposed application/overlay
multicast, where multicast functions are handled by end hosts, not
network layer routers. Member hosts wishing to receive multicast
data form a multicast delivery tree. The intermediate hosts in the tree
also act as routers, i.e. they forward data to the lower hosts in the
tree. Unlike IP multicast, where a router cannot leave the tree until all
members below it leave, in overlay multicast any member can leave
the tree at any time, thus disconnecting the tree and disrupting the data
dissemination. All the disrupted hosts have to rejoin the tree. This
characteristic of overlay multicast makes the multicast tree unstable
and causes data loss and rejoin overhead. In this paper, we propose that each node
sets its leaving time from the tree and sends join requests to a number
of nodes in the tree. A node in the tree rejects the request if its
leaving time is earlier than that of the requesting node; otherwise it
accepts the request. The requesting node can join at one of the
accepting nodes. This makes the tree more stable, as nodes join the
tree according to their leaving time, with the earliest-leaving nodes
at the leaves of the tree. Some intermediate nodes may not respect
their leaving time and may leave earlier, thus disrupting the tree.
To handle this, we propose a proactive recovery mechanism so that
disrupted nodes can rejoin the tree at predetermined nodes immediately.
We have shown by simulation that there is less overhead when joining
the multicast tree and that the recovery time of the disrupted nodes is
much shorter than in previous works.
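The join rule described above can be sketched in a few lines of Python: a tree node accepts a join request only if it will not leave before the requester, and the requester attaches to one of the accepting nodes. The Member class is hypothetical, as is the policy of picking the acceptor that leaves last (the abstract only says the node joins one of the accepting nodes); degree limits and the proactive recovery mechanism are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Member:
    name: str
    leaving_time: float  # time at which this node has declared it will leave
    children: List["Member"] = field(default_factory=list)


def accepts(member: Member, requester: Member) -> bool:
    """A tree node rejects a requester only if the node itself leaves earlier."""
    return member.leaving_time >= requester.leaving_time


def join(candidates: List[Member], requester: Member) -> Optional[Member]:
    """Send the join request to every candidate and attach to one acceptor.

    Hypothetical policy: among the acceptors, pick the one that leaves last.
    Returns None when every candidate rejects the request.
    """
    acceptors = [m for m in candidates if accepts(m, requester)]
    if not acceptors:
        return None
    parent = max(acceptors, key=lambda m: m.leaving_time)
    parent.children.append(requester)
    return parent
```

Because every child leaves no later than its parent, a node that honours its declared leaving time never strands descendants, which is the source of the stability the abstract claims.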
Abstract: The major objective of this paper is to introduce a new method to select genes from DNA microarray data. As the gene selection criterion, we propose measuring the local changes in the correlation graph of each gene and selecting the genes whose local changes are largest. More precisely, we calculate correlation networks from DNA microarray data of cervical cancer, where each network represents a tissue of a certain tumor stage and each node in the network represents a gene. From these networks, we extract one tree for each gene by a local decomposition of the correlation network. A tree represents the n-nearest neighbor genes on its n-th level, measured by the Dijkstra distance, and hence gives the local embedding of a gene within the correlation network. For the obtained trees, we measure the pairwise similarity between trees rooted at the same gene from normal to cancerous tissues. This evaluates the modification of the tree topology due to tumor progression. Finally, we rank the obtained similarity values from all tissue comparisons and select the top-ranked genes. For these genes, the local neighborhood in the correlation networks changes most between normal and cancerous tissues. As a result, we find that the top-ranked genes are candidates suspected to be involved in tumor growth. This indicates that our method captures essential information from the underlying DNA microarray data of cervical cancer.
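The first two steps, building a correlation network and reading off the genes on each level below a root gene, can be sketched in Python. Connecting two genes when the absolute Pearson correlation of their expression profiles reaches a threshold tau is an assumption (the abstract does not say how edges are formed), and on an unweighted network the Dijkstra distance equals the breadth-first hop count, so BFS is used. The tree similarity measure and the ranking step are not shown.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(expr: Dict[str, Sequence[float]], tau: float) -> Dict[str, Set[str]]:
    """Hypothetical construction: connect two genes when |r| >= tau."""
    genes = sorted(expr)
    graph: Dict[str, Set[str]] = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= tau:
                graph[g].add(h)
                graph[h].add(g)
    return graph


def levels_from(graph: Dict[str, Set[str]], root: str, max_level: int) -> Dict[int, List[str]]:
    """Group genes by hop distance from root, up to max_level.

    On an unweighted graph, Dijkstra distances equal BFS hop counts.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == max_level:
            continue  # do not expand beyond the requested depth
        for w in sorted(graph.get(v, ())):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    levels: Dict[int, List[str]] = {}
    for gene, d in dist.items():
        levels.setdefault(d, []).append(gene)
    return {d: sorted(levels[d]) for d in sorted(levels)}
```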
Abstract: The influence of octane and benzene on plant cell
ultrastructure and on enzymes of basic metabolism, such as nitrogen
assimilation and energy generation, has been studied. Different
plants, namely perennial ryegrass (Lolium perenne) and alfalfa
(Medicago sativa); the crops maize (Zea mays L.) and bean (Phaseolus
vulgaris); the shrubs privet (Ligustrum sempervirens) and trifoliate
orange (Poncirus trifoliata); and the trees poplar (Populus deltoides)
and white mulberry (Morus alba L.), were exposed to hydrocarbons at
different concentrations (1, 10 and 100 mM). Destructive changes in
the ultrastructure of bean and maize leaf cells under the influence of
benzene vapour were revealed at the level of the photosynthetic and
energy-generating subcellular organelles. Different deviations in the
structure and distribution of subcellular organelles were observed in
alfalfa and ryegrass root cells under the influence of benzene and
octane absorbed through the roots. The level of destructive changes is
concentration dependent. Benzene at the low concentrations of 1 and
10 mM increased glutamate dehydrogenase (GDH) activity in maize roots
and leaves and in poplar and mulberry shoots, though to a higher
extent at the lower, 1 mM concentration. The induction was more
intensive in plant roots. The highest tested concentration of
benzene, 100 mM, inhibited the enzyme in all plants. Octane induced
GDH in all grassy plants at all tested concentrations; however, the
rate of induction decreased as the hydrocarbon concentration
increased. Octane at 1 mM induced GDH in privet, trifoliate orange
and white mulberry shoots. The highest octane concentration, 100 mM,
inhibited GDH activity in all plants. Octane had an inductive effect
on malate dehydrogenase in almost all plants and at almost all tested
concentrations, indicating intensification of the tricarboxylic acid
cycle. The data could be used to develop criteria for selecting
plants for the phytoremediation of soils contaminated with oil
hydrocarbons.
Abstract: This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the training dictionary and that can be applied to out-of-vocabulary words. The proposed approach improves upon existing rule-tree-based techniques in that it makes use of graphemes, rather than letters, as elementary orthographic units. A new linear algorithm for the segmentation of a word into graphemes is introduced to enable out-of-vocabulary grapheme-based phonetic transcription. Exhaustive rule trees provide a canonical representation of the pronunciation rules of a language that can be used not only to pronounce out-of-vocabulary words, but also to analyze and compare the pronunciation rules inferred from different dictionaries. The proposed approach has been implemented in C and tested on Oxford British English and Basic English. Experimental results show that grapheme-based rule trees represent phonetically sound rules and provide better performance than letter-based rule trees.
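The abstract does not describe the paper's own linear segmentation algorithm. As a stand-in, the sketch below shows one simple approach that runs in time linear in the word length (for a bounded maximum grapheme length): greedy longest match against a grapheme inventory. Both the greedy strategy and the toy inventory are assumptions, and greedy matching can mis-segment words where a shorter grapheme is the correct choice.

```python
from typing import Iterable, List


def segment(word: str, graphemes: Iterable[str]) -> List[str]:
    """Greedy longest-match segmentation of a word into graphemes.

    Unknown single letters are emitted as one-letter graphemes.
    """
    inventory = set(graphemes)
    max_len = max((len(g) for g in inventory), default=1)
    out: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, falling back to one letter.
        for k in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + k]
            if piece in inventory or k == 1:
                out.append(piece)
                i += k
                break
    return out


if __name__ == "__main__":
    toy = ["th", "ough", "sh", "ee", "t", "h", "o", "u", "g", "s", "e", "p"]
    print(segment("though", toy), segment("sheep", toy))
```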
Abstract: The evaluation of the effects of non-conventional water
resources on seed germination and seedling growth performance at
early growth stages is still in progress, especially in forage crops.
This study was designed to test the effect of four water qualities
(treated wastewater (TWW), industrial water (IW), grey water (GW),
and distilled water (DW)) on germination and early seedling vigor of
Leucaena leucocephala. The results showed that the germination
was not significantly affected by the different water qualities. Seed
germination reached its maximum after 17, 14, 14, and 21 days under
the GW, IW, TWW, and DW treatments, respectively. The highest mean
shoot length was recorded under the GW treatment, and the highest
mean root length was recorded under DW, which was not significantly
different from the GW treatment. Mean shoot fresh weight was highest
under TWW. Mean root fresh weights did not differ significantly
among treatments. Growth continued with no mortality during the 21
days. Thus, ranked by cleanness, nutrients, and toxicity, the best
non-conventional water alternatives are GW, TWW and IW, in that
order.
Abstract: The article deals with the development, design and
implementation of a mathematical model of the human respiratory
system. The model is designed to simulate the distribution of
important intrapulmonary parameters along the bronchial tree, such as
pressure amplitude and tidal volume, and the effect of regional
mechanical lung properties on the efficiency of various ventilatory
techniques. Therefore, exact agreement of the model structure with
the anatomical structure of the lung is required. The model is based
on lung morphology, and an electro-acoustic analogy is used to design
the model.
Abstract: A spanning tree of a connected graph is a tree that
consists of all the vertices and some or perhaps all of the edges of
the connected graph. In this paper, a model for the spanning tree
transformation of connected graphs into single-row networks, namely
Spanning Tree of Connected Graph Modeling (STCGM), is introduced.
The model contains a Path-Growing Tree-Forming algorithm, applied
with Vertex-Prioritized, that produces the spanning tree from the
connected graph. Paths are produced by Path-Growing, and they are
combined into a spanning tree by Tree-Forming. The
spanning tree that is produced from the connected graph is then
transformed into single-row network using Tree Sequence Modeling
(TSM). Finally, the single-row routing problem is solved using a
method called Enhanced Simulated Annealing for Single-Row
Routing (ESSR).
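For readers unfamiliar with the first step, the sketch below builds a spanning tree of a connected graph with a generic breadth-first search. It is not the STCGM Path-Growing Tree-Forming algorithm; starting from the highest-degree vertex is only a hypothetical nod to vertex prioritization, and vertices are assumed to be integers.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def spanning_tree(graph: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Return the edges of a BFS spanning tree of a connected undirected graph."""
    if not graph:
        return []
    # Hypothetical prioritisation: start from the highest-degree vertex,
    # breaking ties towards the smallest vertex number.
    root = max(graph, key=lambda v: (len(graph[v]), -v))
    visited = {root}
    edges: List[Tuple[int, int]] = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in sorted(graph[v]):
            if w not in visited:
                visited.add(w)
                edges.append((v, w))  # first discovery edge joins w to the tree
                queue.append(w)
    if len(visited) != len(graph):
        raise ValueError("graph is not connected")
    return edges


if __name__ == "__main__":
    g = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
    print(spanning_tree(g))
```

Any spanning tree of a connected graph with n vertices has exactly n - 1 edges, which is a quick sanity check on the output.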