Cirrhosis Mortality Prediction as Classification Using Frequent Subgraph Mining

In this work, we use machine learning and data analysis techniques to predict the one-year mortality of cirrhotic patients. Data from 2,322 patients with liver cirrhosis are collected at a single medical center. Different machine learning models are applied to predict one-year mortality. A comprehensive feature space including demographic information, comorbidity, clinical procedure and laboratory tests is being analyzed. A temporal pattern mining technic called Frequent Subgraph Mining (FSM) is being used. Model for End-stage liver disease (MELD) prediction of mortality is used as a comparator. All of our models statistically significantly outperform the MELD-score model and show an average 10% improvement of the area under the curve (AUC). The FSM technic itself does not improve the model significantly, but FSM, together with a machine learning technique called an ensemble, further improves the model performance. With the abundance of data available in healthcare through electronic health records (EHR), existing predictive models can be refined to identify and treat patients at risk for higher mortality. However, due to the sparsity of the temporal information needed by FSM, the FSM model does not yield significant improvements. Our work applies modern machine learning algorithms and data analysis methods on predicting one-year mortality of cirrhotic patients and builds a model that predicts one-year mortality significantly more accurate than the MELD score. We have also tested the potential of FSM and provided a new perspective of the importance of clinical features.

Identifying Network Subgraph-Associated Essential Genes in Molecular Networks

Essential genes play an important role in the survival of an organism. It has been shown that cancer-associated essential genes are genes necessary for cancer cell proliferation, where these genes are potential therapeutic targets. Also, it was demonstrated that mutations of the cancer-associated essential genes give rise to the resistance of immunotherapy for patients with tumors. In the present study, we focus on studying the biological effects of the essential genes from a network perspective. We hypothesize that one can analyze a biological molecular network by decomposing it into both three-node and four-node digraphs (subgraphs). These network subgraphs encode the regulatory interaction information among the network’s genetic elements. In this study, the frequency of occurrence of the subgraph-associated essential genes in a molecular network was quantified by using the statistical parameter, odds ratio. Biological effects of subgraph-associated essential genes are discussed. In summary, the subgraph approach provides a systematic method for analyzing molecular networks and it can capture useful biological information for biomedical research.

Tool for Fast Detection of Java Code Snippets

This paper presents general results on the Java source code snippet detection problem. We propose the tool which uses graph and subgraph isomorphism detection. A number of solutions for all of these tasks have been proposed in the literature. However, although that all these solutions are really fast, they compare just the constant static trees. Our solution offers to enter an input sample dynamically with the Scripthon language while preserving an acceptable speed. We used several optimizations to achieve very low number of comparisons during the matching algorithm.

Maximum Induced Subgraph of an Augmented Cube

Let maxζG(m) denote the maximum number of edges in a subgraph of graph G induced by m nodes. The n-dimensional augmented cube, denoted as AQn, a variation of the hypercube, possesses some properties superior to those of the hypercube. We study the cases when G is the augmented cube AQn.

Nullity of t-Tupple Graphs

The nullity η(G) of a graph is the occurrence of zero as an eigenvalue in its spectra. A zero-sum weighting of a graph G is real valued function, say f from vertices of G to the set of real numbers, provided that for each vertex of G the summation of the weights f(w) over all neighborhood w of v is zero for each v in G.A high zero-sum weighting of G is one that uses maximum number of non-zero independent variables. If G is graph with an end vertex, and if H is an induced subgraph of G obtained by deleting this vertex together with the vertex adjacent to it, then, η(G)= η(H). In this paper, a high zero-sum weighting technique and the endvertex procedure are applied to evaluate the nullity of t-tupple and generalized t-tupple graphs are derived  and determined for some special types of graphs,  Also, we introduce and prove some important results about the t-tupple coalescence, Cartesian and Kronecker products of nut graphs.

The Bipartite Ramsey Numbers b(C2m; C2n)

Given bipartite graphs H1 and H2, the bipartite Ramsey number b(H1;H2) is the smallest integer b such that any subgraph G of the complete bipartite graph Kb,b, either G contains a copy of H1 or its complement relative to Kb,b contains a copy of H2. It is known that b(K2,2;K2,2) = 5, b(K2,3;K2,3) = 9, b(K2,4;K2,4) = 14 and b(K3,3;K3,3) = 17. In this paper we study the case that both H1 and H2 are even cycles, prove that b(C2m;C2n) ≥ m + n - 1 for m = n, and b(C2m;C6) = m + 2 for m ≥ 4.

A Deterministic Polynomial-time Algorithm for the Clique Problem and the Equality of P and NP Complexity Classes

In this paper a deterministic polynomial-time algorithm is presented for the Clique problem. The case is considered as the problem of omitting the minimum number of vertices from the input graph so that none of the zeroes on the graph-s adjacency matrix (except the main diagonal entries) would remain on the adjacency matrix of the resulting subgraph. The existence of a deterministic polynomial-time algorithm for the Clique problem, as an NP-complete problem will prove the equality of P and NP complexity classes.

Protein Graph Partitioning by Mutually Maximization of cycle-distributions

The classification of the protein structure is commonly not performed for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step to a protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph which nodes correspond the secondary structure elements of the protein. This graph is call the protein graph. The domains are then identified as partitions of the graph corresponding to vertices sets obtained by the maximization of an objective function, which mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not utilize any other kind of information besides the cycle-distribution to find the partitions. If a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. As stop criterion, we calculate numerically a significance level which indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates automatically its iterative application. We present results for one and two domain proteins and compare our results with the manually assigned domains by the SCOP database and differences are discussed.

New Graph Similarity Measurements based on Isomorphic and Nonisomorphic Data Fusion and their Use in the Prediction of the Pharmacological Behavior of Drugs

New graph similarity methods have been proposed in this work with the aim to refining the chemical information extracted from molecules matching. For this purpose, data fusion of the isomorphic and nonisomorphic subgraphs into a new similarity measure, the Approximate Similarity, was carried out by several approaches. The application of the proposed method to the development of quantitative structure-activity relationships (QSAR) has provided reliable tools for predicting several pharmacological parameters: binding of steroids to the globulin-corticosteroid receptor, the activity of benzodiazepine receptor compounds, and the blood brain barrier permeability. Acceptable results were obtained for the models presented here.

An Efficient Graph Query Algorithm Based on Important Vertices and Decision Features

Graph has become increasingly important in modeling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. Different from the existing methods, our approach, called VFM (Vertex to Frequent Feature Mapping), makes use of vertices and decision features as the basic indexing feature. VFM constructs two mappings between vertices and frequent features to answer graph queries. The VFM approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. The results show that the proposed method not only avoids the enumeration method of getting subgraphs of query graph, but also effectively reduces the subgraph isomorphism tests between the query graph and graphs in candidate answer set in verification stage.

Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach

The similarity comparison of RNA secondary structures is important in studying the functions of RNAs. In recent years, most existing tools represent the secondary structures by tree-based presentation and calculate the similarity by tree alignment distance. Different to previous approaches, we propose a new method based on maximum clique detection algorithm to extract the maximum common structural elements in compared RNA secondary structures. A new graph-based similarity measurement and maximum common subgraph detection procedures for comparing purely RNA secondary structures is introduced. Given two RNA secondary structures, the proposed algorithm consists of a process to determine the score of the structural similarity, followed by comparing vertices labelling, the labelled edges and the exact degree of each vertex. The proposed algorithm also consists of a process to extract the common structural elements between compared secondary structures based on a proposed maximum clique detection of the problem. This graph-based model also can work with NC-IUB code to perform the pattern-based searching. Therefore, it can be used to identify functional RNA motifs from database or to extract common substructures between complex RNA secondary structures. We have proved the performance of this proposed algorithm by experimental results. It provides a new idea of comparing RNA secondary structures. This tool is helpful to those who are interested in structural bioinformatics.

The Giant Component in a Random Subgraph of a Weak Expander

In this paper, we investigate the appearance of the giant component in random subgraphs G(p) of a given large finite graph family Gn = (Vn, En) in which each edge is present independently with probability p. We show that if the graph Gn satisfies a weak isoperimetric inequality and has bounded degree, then the probability p under which G(p) has a giant component of linear order with some constant probability is bounded away from zero and one. In addition, we prove the probability of abnormally large order of the giant component decays exponentially. When a contact graph is modeled as Gn, our result is of special interest in the study of the spread of infectious diseases or the identification of community in various social networks.

Topological Queries on Graph-structured XML Data: Models and Implementations

In many applications, data is in graph structure, which can be naturally represented as graph-structured XML. Existing queries defined on tree-structured and graph-structured XML data mainly focus on subgraph matching, which can not cover all the requirements of querying on graph. In this paper, a new kind of queries, topological query on graph-structured XML is presented. This kind of queries consider not only the structure of subgraph but also the topological relationship between subgraphs. With existing subgraph query processing algorithms, efficient algorithms for topological query processing are designed. Experimental results show the efficiency of implementation algorithms.

A Neighborhood Condition for Fractional k-deleted Graphs

Abstract–Let k ≥ 3 be an integer, and let G be a graph of order n with n ≥ 9k +3- 42(k - 1)2 + 2. Then a spanning subgraph F of G is called a k-factor if dF (x) = k for each x ∈ V (G). A fractional k-factor is a way of assigning weights to the edges of a graph G (with all weights between 0 and 1) such that for each vertex the sum of the weights of the edges incident with that vertex is k. A graph G is a fractional k-deleted graph if there exists a fractional k-factor after deleting any edge of G. In this paper, it is proved that G is a fractional k-deleted graph if G satisfies δ(G) ≥ k + 1 and |NG(x) ∪ NG(y)| ≥ 1 2 (n + k - 2) for each pair of nonadjacent vertices x, y of G.

On Fractional (k,m)-Deleted Graphs with Constrains Conditions

Let G be a graph of order n, and let k  2 and m  0 be two integers. Let h : E(G)  [0, 1] be a function. If e∋x h(e) = k holds for each x  V (G), then we call G[Fh] a fractional k-factor of G with indicator function h where Fh = {e  E(G) : h(e) > 0}. A graph G is called a fractional (k,m)-deleted graph if there exists a fractional k-factor G[Fh] of G with indicator function h such that h(e) = 0 for any e  E(H), where H is any subgraph of G with m edges. In this paper, it is proved that G is a fractional (k,m)-deleted graph if (G)  k + m + m k+1 , n  4k2 + 2k − 6 + (4k 2 +6k−2)m−2 k−1 and max{dG(x), dG(y)}  n 2 for any vertices x and y of G with dG(x, y) = 2. Furthermore, it is shown that the result in this paper is best possible in some sense.

Decomposition of Graphs into Induced Paths and Cycles

A decomposition of a graph G is a collection ψ of subgraphs H1,H2, . . . , Hr of G such that every edge of G belongs to exactly one Hi. If each Hi is either an induced path or an induced cycle in G, then ψ is called an induced path decomposition of G. The minimum cardinality of an induced path decomposition of G is called the induced path decomposition number of G and is denoted by πi(G). In this paper we initiate a study of this parameter.

N-Sun Decomposition of Complete Graphs and Complete Bipartite Graphs

Graph decompositions are vital in the study of combinatorial design theory. Given two graphs G and H, an H-decomposition of G is a partition of the edge set of G into disjoint isomorphic copies of H. An n-sun is a cycle Cn with an edge terminating in a vertex of degree one attached to each vertex. In this paper we have proved that the complete graph of order 2n, K2n can be decomposed into n-2 n-suns, a Hamilton cycle and a perfect matching, when n is even and for odd case, the decomposition is n-1 n-suns and a perfect matching. For an odd order complete graph K2n+1, delete the star subgraph K1, 2n and the resultant graph K2n is decomposed as in the case of even order. The method of building n-suns uses Walecki's construction for the Hamilton decomposition of complete graphs. A spanning tree decomposition of even order complete graphs is also discussed using the labeling scheme of n-sun decomposition. A complete bipartite graph Kn, n can be decomposed into n/2 n-suns when n/2 is even. When n/2 is odd, Kn, n can be decomposed into (n-2)/2 n-suns and a Hamilton cycle.

Modeling And Analysis of Simple Open Cycle Gas Turbine Using Graph Networks

This paper presents a unified approach based graph theory and system theory postulates for the modeling and analysis of Simple open cycle Gas turbine system. In the present paper, the simple open cycle gas turbine system has been modeled up to its subsystem level and system variables have been identified to develop the process subgraphs. The theorems and algorithms of the graph theory have been used to represent behavioural properties of the system like rate of heat and work transfers rates, pressure drops and temperature drops in the involved processes of the system. The processes have been represented as edges of the process subgraphs and their limits as the vertices of the process subgraphs. The system across variables and through variables has been used to develop terminal equations of the process subgraphs of the system. The set of equations developed for vertices and edges of network graph are used to solve the system for its process variables.