Abstract: An algebraic framework for processing graph signals
axiomatically designates the graph adjacency matrix as the shift
operator. In this setup, we often encounter a problem wherein we
know the filtered output and the filter coefficients, and need to
find out the input graph signal. Solution to this problem using
direct approach requires O(N3) operations, where N is the number
of vertices in graph. In this paper, we adapt the spectral graph
partitioning method for partitioning of graphs and use it to reduce
the computational cost of the filtering problem. We use the example
of denoising of the temperature data to illustrate the efficacy of the
approach.
Abstract: The classification of the protein structure is commonly
not performed for the whole protein but for structural domains, i.e.,
compact functional units preserved during evolution. Hence, a first
step to a protein structure classification is the separation of the
protein into its domains. We approach the problem of protein domain
identification by proposing a novel graph theoretical algorithm. We
represent the protein structure as an undirected, unweighted and
unlabeled graph which nodes correspond the secondary structure
elements of the protein. This graph is call the protein graph. The
domains are then identified as partitions of the graph corresponding
to vertices sets obtained by the maximization of an objective function,
which mutually maximizes the cycle distributions found in the
partitions of the graph. Our algorithm does not utilize any other kind
of information besides the cycle-distribution to find the partitions. If
a partition is found, the algorithm is iteratively applied to each of
the resulting subgraphs. As stop criterion, we calculate numerically
a significance level which indicates the stability of the predicted
partition against a random rewiring of the protein graph. Hence,
our algorithm terminates automatically its iterative application. We
present results for one and two domain proteins and compare our
results with the manually assigned domains by the SCOP database
and differences are discussed.
Abstract: Graph partitioning is a NP-hard problem with multiple
conflicting objectives. The graph partitioning should minimize the
inter-partition relationship while maximizing the intra-partition
relationship. Furthermore, the partition load should be evenly
distributed over the respective partitions. Therefore this is a multiobjective
optimization problem (MOO). One of the approaches to
MOO is Pareto optimization which has been used in this paper. The
proposed methods of this paper used to improve the performance are
injecting best solutions of previous runs into the first generation of
next runs and also storing the non-dominated set of previous
generations to combine with later generation's non-dominated set.
These improvements prevent the GA from getting stuck in the local
optima and increase the probability of finding more optimal
solutions. Finally, a simulation research is carried out to investigate
the effectiveness of the proposed algorithm. The simulation results
confirm the effectiveness of the proposed method.
Abstract: Nowadays, Gene Ontology has been used widely by many researchers for biological data mining and information retrieval, integration of biological databases, finding genes, and incorporating knowledge in the Gene Ontology for gene clustering. However, the increase in size of the Gene Ontology has caused problems in maintaining and processing them. One way to obtain their accessibility is by clustering them into fragmented groups. Clustering the Gene Ontology is a difficult combinatorial problem and can be modeled as a graph partitioning problem. Additionally, deciding the number k of clusters to use is not easily perceived and is a hard algorithmic problem. Therefore, an approach for solving the automatic clustering of the Gene Ontology is proposed by incorporating cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
Abstract: The clustering ensembles combine multiple partitions
generated by different clustering algorithms into a single clustering
solution. Clustering ensembles have emerged as a prominent method
for improving robustness, stability and accuracy of unsupervised
classification solutions. So far, many contributions have been done to
find consensus clustering. One of the major problems in clustering
ensembles is the consensus function. In this paper, firstly, we
introduce clustering ensembles, representation of multiple partitions,
its challenges and present taxonomy of combination algorithms.
Secondly, we describe consensus functions in clustering ensembles
including Hypergraph partitioning, Voting approach, Mutual
information, Co-association based functions and Finite mixture
model, and next explain their advantages, disadvantages and
computational complexity. Finally, we compare the characteristics of
clustering ensembles algorithms such as computational complexity,
robustness, simplicity and accuracy on different datasets in previous
techniques.