Abstract: In this work, we use machine learning and data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis are collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedure and laboratory tests is being analyzed. A temporal pattern mining technic called Frequent Subgraph Mining (FSM) is being used. Model for End-stage liver disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement of the area under the curve (AUC). The FSM technic itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. Our work applies modern machine learning algorithms and data analysis methods on predicting one-year mortality of cirrhotic patients and builds a model that predicts one-year mortality significantly more accurate than the MELD score. We have also tested the potential of FSM and provided a new perspective of the importance of clinical features.
Abstract: Essential genes play an important role in the survival of an organism. It has been shown that cancer-associated essential genes are genes necessary for cancer cell proliferation, where these genes are potential therapeutic targets. Also, it was demonstrated that mutations of the cancer-associated essential genes give rise to the resistance of immunotherapy for patients with tumors. In the present study, we focus on studying the biological effects of the essential genes from a network perspective. We hypothesize that one can analyze a biological molecular network by decomposing it into both three-node and four-node digraphs (subgraphs). These network subgraphs encode the regulatory interaction information among the network’s genetic elements. In this study, the frequency of occurrence of the subgraph-associated essential genes in a molecular network was quantified by using the statistical parameter, odds ratio. Biological effects of subgraph-associated essential genes are discussed. In summary, the subgraph approach provides a systematic method for analyzing molecular networks and it can capture useful biological information for biomedical research.
Abstract: This paper presents general results on the Java source
code snippet detection problem. We propose the tool which uses
graph and subgraph isomorphism detection. A number of solutions
for all of these tasks have been proposed in the literature. However,
although that all these solutions are really fast, they compare just the
constant static trees. Our solution offers to enter an input sample
dynamically with the Scripthon language while preserving an
acceptable speed. We used several optimizations to achieve very low
number of comparisons during the matching algorithm.
Abstract: Let maxζG(m) denote the maximum number of edges in a subgraph of graph G induced by m nodes. The n-dimensional augmented cube, denoted as AQn, a variation of the hypercube, possesses some properties superior to those of the hypercube. We study the cases when G is the augmented cube AQn.
Abstract: The nullity η(G) of a graph is the occurrence of zero as an eigenvalue in its spectra. A zero-sum weighting of a graph G is real valued function, say f from vertices of G to the set of real numbers, provided that for each vertex of G the summation of the weights f(w) over all neighborhood w of v is zero for each v in G.A high zero-sum weighting of G is one that uses maximum number of non-zero independent variables. If G is graph with an end vertex, and if H is an induced subgraph of G obtained by deleting this vertex together with the vertex adjacent to it, then, η(G)= η(H). In this paper, a high zero-sum weighting technique and the endvertex procedure are applied to evaluate the nullity of t-tupple and generalized t-tupple graphs are derived and determined for some special types of graphs,
Also, we introduce and prove some important results about the t-tupple coalescence, Cartesian and Kronecker products of nut graphs.
Abstract: Given bipartite graphs H1 and H2, the bipartite Ramsey number b(H1;H2) is the smallest integer b such that any subgraph G of the complete bipartite graph Kb,b, either G contains a copy of H1 or its complement relative to Kb,b contains a copy of H2. It is known that b(K2,2;K2,2) = 5, b(K2,3;K2,3) = 9, b(K2,4;K2,4) = 14 and b(K3,3;K3,3) = 17. In this paper we study the case that both H1 and H2 are even cycles, prove that b(C2m;C2n) ≥ m + n - 1 for m = n, and b(C2m;C6) = m + 2 for m ≥ 4.
Abstract: In this paper a deterministic polynomial-time
algorithm is presented for the Clique problem. The case is considered
as the problem of omitting the minimum number of vertices from the
input graph so that none of the zeroes on the graph-s adjacency
matrix (except the main diagonal entries) would remain on the
adjacency matrix of the resulting subgraph. The existence of a
deterministic polynomial-time algorithm for the Clique problem, as
an NP-complete problem will prove the equality of P and NP
complexity classes.
Abstract: The classification of the protein structure is commonly
not performed for the whole protein but for structural domains, i.e.,
compact functional units preserved during evolution. Hence, a first
step to a protein structure classification is the separation of the
protein into its domains. We approach the problem of protein domain
identification by proposing a novel graph theoretical algorithm. We
represent the protein structure as an undirected, unweighted and
unlabeled graph which nodes correspond the secondary structure
elements of the protein. This graph is call the protein graph. The
domains are then identified as partitions of the graph corresponding
to vertices sets obtained by the maximization of an objective function,
which mutually maximizes the cycle distributions found in the
partitions of the graph. Our algorithm does not utilize any other kind
of information besides the cycle-distribution to find the partitions. If
a partition is found, the algorithm is iteratively applied to each of
the resulting subgraphs. As stop criterion, we calculate numerically
a significance level which indicates the stability of the predicted
partition against a random rewiring of the protein graph. Hence,
our algorithm terminates automatically its iterative application. We
present results for one and two domain proteins and compare our
results with the manually assigned domains by the SCOP database
and differences are discussed.
Abstract: New graph similarity methods have been proposed in this work with the aim to refining the chemical information extracted from molecules matching. For this purpose, data fusion of the isomorphic and nonisomorphic subgraphs into a new similarity measure, the Approximate Similarity, was carried out by several approaches. The application of the proposed method to the development of quantitative structure-activity relationships (QSAR) has provided reliable tools for predicting several pharmacological parameters: binding of steroids to the globulin-corticosteroid receptor, the activity of benzodiazepine receptor compounds, and the blood brain barrier permeability. Acceptable results were obtained for the models presented here.
Abstract: Graph has become increasingly important in modeling
complicated structures and schemaless data such as proteins, chemical
compounds, and XML documents. Given a graph query, it is desirable
to retrieve graphs quickly from a large database via graph-based
indices. Different from the existing methods, our approach, called
VFM (Vertex to Frequent Feature Mapping), makes use of vertices
and decision features as the basic indexing feature. VFM constructs
two mappings between vertices and frequent features to answer graph
queries. The VFM approach not only provides an elegant solution to
the graph indexing problem, but also demonstrates how database
indexing and query processing can benefit from data mining,
especially frequent pattern mining. The results show that the proposed
method not only avoids the enumeration method of getting subgraphs
of query graph, but also effectively reduces the subgraph isomorphism
tests between the query graph and graphs in candidate answer set in
verification stage.
Abstract: The similarity comparison of RNA secondary
structures is important in studying the functions of RNAs. In recent
years, most existing tools represent the secondary structures by
tree-based presentation and calculate the similarity by tree alignment
distance. Different to previous approaches, we propose a new method
based on maximum clique detection algorithm to extract the maximum
common structural elements in compared RNA secondary structures.
A new graph-based similarity measurement and maximum common
subgraph detection procedures for comparing purely RNA secondary
structures is introduced. Given two RNA secondary structures, the
proposed algorithm consists of a process to determine the score of the
structural similarity, followed by comparing vertices labelling, the
labelled edges and the exact degree of each vertex. The proposed
algorithm also consists of a process to extract the common structural
elements between compared secondary structures based on a proposed
maximum clique detection of the problem. This graph-based model
also can work with NC-IUB code to perform the pattern-based
searching. Therefore, it can be used to identify functional RNA motifs
from database or to extract common substructures between complex
RNA secondary structures. We have proved the performance of this
proposed algorithm by experimental results. It provides a new idea of
comparing RNA secondary structures. This tool is helpful to those
who are interested in structural bioinformatics.
Abstract: In this paper, we investigate the appearance of the giant component in random subgraphs G(p) of a given large finite graph family Gn = (Vn, En) in which each edge is present independently with probability p. We show that if the graph Gn satisfies a weak isoperimetric inequality and has bounded degree, then the probability p under which G(p) has a giant component of linear order with some constant probability is bounded away from zero and one. In addition, we prove the probability of abnormally large order of the giant component decays exponentially. When a contact graph is modeled as Gn, our result is of special interest in the study of the spread of infectious diseases or the identification of community in various social networks.
Abstract: For positive integer s and t, the Ramsey number R(s, t)
is the least positive integer n such that for every graph G of order n, either G contains Ks as a subgraph or G contains Kt as a subgraph.
We construct the circulant graphs and use them to obtain lower bounds of some small Ramsey numbers.
Abstract: In many applications, data is in graph structure, which
can be naturally represented as graph-structured XML. Existing
queries defined on tree-structured and graph-structured XML data
mainly focus on subgraph matching, which can not cover all the
requirements of querying on graph. In this paper, a new kind of
queries, topological query on graph-structured XML is presented.
This kind of queries consider not only the structure of subgraph but
also the topological relationship between subgraphs. With existing
subgraph query processing algorithms, efficient algorithms for topological
query processing are designed. Experimental results show the
efficiency of implementation algorithms.
Abstract: Abstract–Let k ≥ 3 be an integer, and let G be a graph of order n with n ≥ 9k +3- 42(k - 1)2 + 2. Then a spanning subgraph F of G is called a k-factor if dF (x) = k for each x ∈ V (G). A fractional k-factor is a way of assigning weights to the edges of a graph G (with all weights between 0 and 1) such that for each vertex the sum of the weights of the edges incident with that vertex is k. A graph G is a fractional k-deleted graph if there exists a fractional k-factor after deleting any edge of G. In this paper, it is proved that G is a fractional k-deleted graph if G satisfies δ(G) ≥ k + 1 and |NG(x) ∪ NG(y)| ≥ 1 2 (n + k - 2) for each pair of nonadjacent vertices x, y of G.
Abstract: Let G be a graph of order n, and let a, b and m be positive integers with 1 ≤ a n + a + b − 2 √bn+ 1, then for any subgraph H of G with m edges, G has an [a, b]-factor F such that E(H)∩ E(F) = ∅. This result is an extension of thatof Egawa [2].
Abstract: Let G be a graph of order n, and let k 2 and m 0 be two integers. Let h : E(G) [0, 1] be a function. If e∋x h(e) = k holds for each x V (G), then we call G[Fh] a fractional k-factor of G with indicator function h where Fh = {e E(G) : h(e) > 0}. A graph G is called a fractional (k,m)-deleted graph if there exists a fractional k-factor G[Fh] of G with indicator function h such that h(e) = 0 for any e E(H), where H is any subgraph of G with m edges. In this paper, it is proved that G is a fractional (k,m)-deleted graph if (G) k + m + m k+1 , n 4k2 + 2k − 6 + (4k 2 +6k−2)m−2 k−1 and max{dG(x), dG(y)} n 2 for any vertices x and y of G with dG(x, y) = 2. Furthermore, it is shown that the result in this paper is best possible in some sense.
Abstract: A decomposition of a graph G is a collection ψ of subgraphs H1,H2, . . . , Hr of G such that every edge of G belongs to exactly one Hi. If each Hi is either an induced path or an induced cycle in G, then ψ is called an induced path decomposition of G. The minimum cardinality of an induced path decomposition of G is called the induced path decomposition number of G and is denoted by πi(G). In this paper we initiate a study of this parameter.
Abstract: Graph decompositions are vital in the study of combinatorial design theory. Given two graphs G and H, an H-decomposition of G is a partition of the edge set of G into disjoint isomorphic copies of H. An n-sun is a cycle Cn with an edge terminating in a vertex of degree one attached to each vertex. In this paper we have proved that the complete graph of order 2n, K2n can be decomposed into n-2 n-suns, a Hamilton cycle and a perfect matching, when n is even and for odd case, the decomposition is n-1 n-suns and a perfect matching. For an odd order complete graph K2n+1, delete the star subgraph K1, 2n and the resultant graph K2n is decomposed as in the case of even order. The method of building n-suns uses Walecki's construction for the Hamilton decomposition of complete graphs. A spanning tree decomposition of even order complete graphs is also discussed using the labeling scheme of n-sun decomposition. A complete bipartite graph Kn, n can be decomposed into n/2 n-suns when n/2 is even. When n/2 is odd, Kn, n can be decomposed into (n-2)/2 n-suns and a Hamilton cycle.
Abstract: This paper presents a unified approach based graph
theory and system theory postulates for the modeling and analysis
of Simple open cycle Gas turbine system. In the present paper, the
simple open cycle gas turbine system has been modeled up to its subsystem
level and system variables have been identified to develop the
process subgraphs. The theorems and algorithms of the graph theory
have been used to represent behavioural properties of the system like
rate of heat and work transfers rates, pressure drops and temperature
drops in the involved processes of the system. The processes have
been represented as edges of the process subgraphs and their limits
as the vertices of the process subgraphs. The system across variables
and through variables has been used to develop terminal equations of
the process subgraphs of the system. The set of equations developed
for vertices and edges of network graph are used to solve the system
for its process variables.