Abstract: Traditionally in sensor networks, and recently in the
Internet of Things, numerous heterogeneous sensors are deployed
in a distributed manner to monitor a phenomenon that can often be
modeled by an underlying stochastic process. The big time-series
data collected by the sensors must be analyzed to detect changes
in the stochastic process as quickly as possible with a tolerable
false-alarm rate. However, sensors may differ in accuracy
and sensitivity range, and they degrade over time. As a result,
the big time-series data collected by the sensors contain
uncertainties and are sometimes conflicting. In this study, we
present a framework that exploits the capabilities of Evidence Theory
(a.k.a. the Dempster-Shafer and Dezert-Smarandache Theories) for
representing and managing uncertainty and conflict, enabling fast change
detection and effective handling of complementary hypotheses.
Specifically, Kullback-Leibler divergence is used as the similarity
metric to compute the distances between the estimated current
distribution and the pre- and post-change distributions. Mass
functions are then calculated, and the corresponding combination rules
are applied to fuse the mass values across all sensors. Furthermore,
we apply the method to estimate the minimum number of sensors that
need to be combined, improving computational efficiency. A cumulative
sum (CUSUM) test is then applied to the ratio of pignistic probabilities
to detect and declare the change for decision-making purposes. Simulation
results using both synthetic data and real data from an experimental
setup demonstrate the effectiveness of the presented schemes.
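One stage of this pipeline can be sketched as follows. The Gaussian distributions, the exponential mass-assignment rule, and the two-hypothesis frame {pre, post} are all illustrative assumptions for this sketch, not the authors' exact formulation:

```python
import math

def kl_gaussian(mu0, s0, mu1, s1):
    # KL divergence KL(N(mu0, s0^2) || N(mu1, s1^2)) between univariate Gaussians.
    return math.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

def masses_from_kl(d_pre, d_post, k=1.0):
    # Illustrative mass assignment: a small KL distance to a hypothesis yields
    # a high mass; any leftover mass goes to the ignorance set {pre, post}.
    m_pre, m_post = math.exp(-k * d_pre), math.exp(-k * d_post)
    m_theta = max(0.0, 1.0 - m_pre - m_post)
    z = m_pre + m_post + m_theta
    return {"pre": m_pre / z, "post": m_post / z, "theta": m_theta / z}

def dempster_combine(m1, m2):
    # Dempster's rule of combination over the frame {pre, post};
    # "theta" denotes the full set {pre, post}.
    combined = {"pre": 0.0, "post": 0.0, "theta": 0.0}
    conflict = 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            if a == "theta":
                combined[b] += va * vb
            elif b == "theta" or a == b:
                combined[a] += va * vb
            else:
                conflict += va * vb  # {pre} and {post} are disjoint
    if conflict >= 1.0:
        raise ValueError("total conflict between sources")
    return {key: v / (1.0 - conflict) for key, v in combined.items()}

def pignistic(m):
    # Pignistic transform: split the ignorance mass evenly over the singletons.
    return {"pre": m["pre"] + m["theta"] / 2, "post": m["post"] + m["theta"] / 2}

def cusum_step(S_prev, bet, eps=1e-12):
    # One CUSUM update on the log-ratio of pignistic probabilities;
    # S crossing a threshold would declare the change.
    return max(0.0, S_prev + math.log((bet["post"] + eps) / (bet["pre"] + eps)))
```

For example, two sensors whose current distributions are both close to the pre-change distribution (small `d_pre`, large `d_post`) combine into a pignistic probability concentrated on "pre", and the CUSUM statistic stays at zero.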
Abstract: Development of a method to estimate gene functions is
an important task in bioinformatics. One of the approaches for the
annotation is the identification of the metabolic pathway that genes are
involved in. Since gene expression data reflect various intracellular
phenomena, these data are considered to be related to gene
function. However, it has been difficult to estimate gene function
with high accuracy. This low accuracy is thought to be caused by the
difficulty of measuring gene expression precisely: even when measured
under the same conditions, expression levels usually vary. In this
study, we propose a feature extraction method that focuses on the
variability of gene expression to estimate genes' metabolic pathways
accurately. First,
we estimated the distribution of each gene's expression from replicate
data. Next, we calculated the similarity between all gene pairs using
Kullback-Leibler (KL) divergence, a measure of the discrepancy between
distributions. Finally, we used the similarity vectors as feature
vectors and trained a multiclass SVM to identify the genes'
metabolic pathways. To evaluate the method, we applied it to budding
yeast, training the multiclass SVM to identify seven metabolic
pathways. The resulting accuracy was higher than that obtained from
the raw gene expression data. Thus, our method combined with KL
divergence is useful for identifying genes' metabolic pathways.
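The feature extraction step can be sketched as below. The Gaussian fit to the replicates and the symmetrized KL divergence are assumptions of this sketch; the abstract does not specify the distribution family or whether the divergence is symmetrized:

```python
import math

def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    # KL(P || Q) for univariate Gaussians fitted from replicate measurements.
    return math.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5

def fit_gaussian(replicates):
    # Estimate mean and sample standard deviation from a gene's replicate data.
    n = len(replicates)
    mu = sum(replicates) / n
    var = sum((x - mu) ** 2 for x in replicates) / (n - 1)
    return mu, math.sqrt(var)

def similarity_features(replicate_sets):
    # One feature vector per gene: the symmetrized KL divergence to every gene
    # (including itself, which contributes 0). These vectors would then be fed
    # to a multiclass SVM classifier, e.g. sklearn.svm.SVC.
    params = [fit_gaussian(r) for r in replicate_sets]
    return [
        [kl_gaussian(mu_i, sd_i, mu_j, sd_j) + kl_gaussian(mu_j, sd_j, mu_i, sd_i)
         for (mu_j, sd_j) in params]
        for (mu_i, sd_i) in params
    ]
```

Genes with similar expression distributions get small divergences, so genes in the same pathway are expected to have similar rows of this matrix, which is what the SVM exploits.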
Abstract: The Bayesian Optimization Algorithm (BOA) is an algorithm based on the estimation of distributions. It uses techniques for modeling data with Bayesian networks to estimate the joint distribution of promising solutions. Different search algorithms can be used to obtain the structure of the Bayesian network. The key point that BOA addresses is whether the constructed Bayesian network can generate new and useful solutions (strings) that lead the algorithm in the right direction to solve the problem. Undoubtedly, this ability is a crucial factor in the efficiency of BOA. Various search algorithms can be used in BOA, but their performance differs. To choose among them, a suitable method for quantifying this difference is needed. In this paper, a greedy search algorithm and a stochastic search algorithm are used in BOA to solve a given optimization problem, and a method using Kullback-Leibler (KL) Divergence to reflect their difference is described.
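One way such a KL-based comparison could work is sketched below: estimate the KL divergence between the empirical distribution of promising solutions and the distribution of strings sampled from the model each search algorithm builds. The union-support smoothing and the direction KL(target || model) are assumptions of this sketch, not necessarily the paper's construction:

```python
import math
from collections import Counter

def kl_empirical(target_samples, model_samples, alpha=0.5):
    # Smoothed empirical KL(target || model) over the union support.
    # A small value suggests the learned model (e.g. the Bayesian network
    # built by one search algorithm) reproduces the distribution of
    # promising solutions well; a large value suggests it does not.
    support = set(target_samples) | set(model_samples)
    tc, mc = Counter(target_samples), Counter(model_samples)
    nt = len(target_samples) + alpha * len(support)
    nm = len(model_samples) + alpha * len(support)
    kl = 0.0
    for x in support:
        p = (tc[x] + alpha) / nt   # smoothed target probability
        q = (mc[x] + alpha) / nm   # smoothed model probability
        kl += p * math.log(p / q)
    return kl
```

Running the two search algorithms, sampling strings from each learned network, and comparing the resulting KL values against the promising-solution set would then rank the algorithms.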