Abstract: This paper introduces topological order in the description of social systems, starting from the original concept of autopoiesis developed by biologists and scientists and including the modification of general systems based on socialized medicine. Topological order is important in describing physical systems, in exploiting optical systems, and in improving photonic devices. States with topological order have interesting properties, such as topological degeneracy and fractional statistics, that reveal the entanglement origin of topological order. Topological ideas in photonics draw on exciting developments in solid-state materials, namely materials that are insulating in the bulk yet conduct electricity on their surface without dissipation or back-scattering, even in the presence of large impurities. A specific type of autopoietic system is interrelated with the main categories among existing groups of ecological phenomena at the interaction of the social and medical sciences. Under this hypothesis, the system nevertheless has a nonlinear interaction with its natural environment, an ‘interactional cycle’ that exchanges photon energy with molecules without changes in topology (i.e., chemical transformation into products does not propagate any change or variation in the network topology of the physical configuration). The engineering topology of a biosensor is based on the excitation boundary of surface electromagnetic waves in photonic band gap multilayer films. The device operates like a surface plasmon biosensor, except that a photonic band gap film replaces the metal film as the medium in which surface electromagnetic waves are excited. The use of a photonic band gap film offers a sharper surface wave resonance, with the potential for greatly enhanced sensitivity. The properties of the photonic band gap material can therefore be engineered to operate a sensor at any wavelength and to support a surface wave resonance that ranges up to 470 nm, a wavelength not generally accessible with surface plasmon sensing.
Lastly, photonic band gap films are mechanically robust and offer new substrates for surface chemistry, both to understand molecular design structure and to create sensing-chip surfaces exposed to different concentrations of DNA sequences in solution, so that the surface mode resonance can be observed and tracked under the influence of processes taking place in the spectroscopic environment. These processes have led to the development of several advanced analytical technologies that are automated, real-time, reliable, reproducible and cost-effective. The result is faster and more accurate monitoring and detection of biomolecules through refractive index sensing, for example of antibody–antigen reactions and of DNA or protein binding. Ultimately, the controversial aspects of molecular frictional properties are adjusted to one another to form the unique spatial structure and dynamics of biological molecules, providing a mutual contribution with the environment to the investigation of changes due to the pathogenic archival architecture of cell clusters.
Abstract: Analysis of the human microbiome using metagenomic
sequencing data has demonstrated high ability in discriminating
various human diseases. Raw metagenomic sequencing data require
multiple complex and computationally heavy bioinformatics steps
prior to data analysis. Such data contain millions of short sequences
read from the fragmented DNA sequences and stored as fastq files.
Conventional processing pipelines consist of multiple steps, including
quality control, filtering, alignment of sequences against genomic
catalogs (genes, species, taxonomic levels, functional pathways,
etc.). These pipelines are complex to use, time-consuming, and
rely on a large number of parameters that often introduce variability
and affect the estimation of the microbiome elements. Training
Deep Neural Networks directly from raw sequencing data is a
promising approach to bypass some of the challenges associated with
mainstream bioinformatics pipelines. Most of these methods use the
concept of word and sentence embeddings that create a meaningful
and numerical representation of DNA sequences, while extracting
features and reducing the dimensionality of the data. In this paper
we present an end-to-end approach that classifies patients into disease
groups directly from raw metagenomic reads: metagenome2vec. This
approach is composed of four steps: (i) generating a vocabulary of
k-mers and learning their numerical embeddings; (ii) learning DNA
sequence (read) embeddings; (iii) identifying the genome from which
the sequence is most likely to come; and (iv) training a multiple
instance learning classifier which predicts the phenotype based on
the vector representation of the raw data. An attention mechanism
is applied in the network so that the model can be interpreted,
assigning a weight to the influence of each genome on the prediction.
Using two public real-life data sets as well as a simulated one, we
demonstrate that this original approach reaches high performance,
comparable with state-of-the-art methods applied directly to
data processed through mainstream bioinformatics workflows. These
results are encouraging for this proof-of-concept work. We believe
that with further dedication, the DNN models have the potential to
surpass mainstream bioinformatics workflows in disease classification
tasks.
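As a rough illustration of step (i) above, the following sketch builds a k-mer vocabulary from raw reads and encodes a read as a list of token indices, which is the kind of input an embedding model would consume. The function names (`kmers`, `build_vocabulary`, `encode`), the `min_count` parameter, and the choice to skip k-mers containing ambiguous bases are assumptions made for this example; they are not taken from metagenome2vec itself.

```python
from collections import Counter

def kmers(read, k):
    """Return the overlapping k-mers of a read, skipping any k-mer
    that contains a character other than A, C, G or T."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)
            if set(read[i:i + k]) <= set("ACGT")]

def build_vocabulary(reads, k, min_count=1):
    """Map every k-mer seen at least `min_count` times to an integer index.
    Indices follow the alphabetical order of the retained k-mers."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = sorted(km for km, c in counts.items() if c >= min_count)
    return {km: idx for idx, km in enumerate(kept)}

def encode(read, k, vocab):
    """Turn a read into a list of vocabulary indices (unknown k-mers dropped)."""
    return [vocab[km] for km in kmers(read, k) if km in vocab]

if __name__ == "__main__":
    reads = ["ACGT", "CGTA"]          # toy reads, not real sequencing data
    vocab = build_vocabulary(reads, k=3)
    print(vocab)
    print(encode("ACGTA", 3, vocab))
```

The index sequences produced by `encode` play the role of "sentences" of k-mer "words"; learning the embeddings themselves would require a separate model and is outside this sketch.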
Abstract: Water lily (Nymphaea L.) is the largest genus of Nymphaeaceae. This family is composed of six genera (Nuphar, Ondinea, Euryale, Victoria, Barclaya, Nymphaea). Its members are distributed nearly worldwide in tropical and temperate regions. The classification of some species in Nymphaea is ambiguous due to high variation in leaf and flower parts such as the leaf margin and stamen appendage. Therefore, phylogenetic relationships based on 18S rDNA were constructed to delimit this genus. DNA from 52 specimens belonging to the water lily family was extracted using a modified conventional method with cetyltrimethylammonium bromide (CTAB). The amplified fragment was about 1600 base pairs in size. After analysis, the aligned sequences showed 9.36% variable characters, comprising 2.66% parsimony-informative sites and 6.70% singleton sites. Moreover, there were 6 insertion/deletion regions of 1-2 base(s). The phylogenetic trees based on maximum parsimony and maximum likelihood, with high bootstrap support, indicated that the genus Nymphaea is a paraphyletic group because Ondinea, Victoria and Euryale are nested within it. Within the genus Nymphaea, subgenus Nymphaea is a basal lineage that grouped with Euryale and Victoria. The other four subgenera, namely Lotos, Hydrocallis, Brachyceras and Anecphya, were included in the same large clade, in which Ondinea was placed within the Anecphya clade owing to shared geography.
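The site statistics quoted above add up (2.66% + 6.70% = 9.36%), which matches the common convention that every variable site is either parsimony-informative or a singleton. A minimal sketch of that classification follows, assuming the usual definition (a site is parsimony-informative when at least two character states each occur in at least two sequences) and assuming that gaps and ambiguous characters are simply ignored. The function name and the toy alignment are illustrative; they are not the authors' data or software.

```python
from collections import Counter

def classify_sites(alignment):
    """Count variable, parsimony-informative and singleton columns in a
    list of equal-length aligned sequences. Characters other than
    A, C, G, T (gaps, N, ...) are ignored in each column."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    variable = informative = singleton = 0
    for column in zip(*alignment):
        counts = Counter(c for c in (ch.upper() for ch in column) if c in "ACGT")
        if len(counts) < 2:
            continue                      # constant (or empty) column
        variable += 1
        # informative: at least two states, each seen in two or more sequences
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
        else:
            singleton += 1
    return {"columns": n_cols, "variable": variable,
            "informative": informative, "singleton": singleton}

if __name__ == "__main__":
    toy = ["ACGTA", "ACGTC", "ATGAC", "ATGAG"]   # hypothetical alignment
    stats = classify_sites(toy)
    print(stats)
    print("variable %:", 100 * stats["variable"] / stats["columns"])
```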
Abstract: Noninvasive diagnostics of diseases via breath
analysis has attracted considerable scientific and clinical interest for
many years and has become increasingly promising with the rapid
advancements in nanotechnology and biotechnology. The volatile
organic compounds (VOCs) in exhaled breath, which are mainly
blood borne, particularly provide highly valuable information about
individuals’ physiological and pathophysiological conditions.
Additionally, breath analysis is noninvasive, real-time, painless, and
agreeable to patients. We have developed a wireless sensor array
based on single-stranded DNA (ssDNA)-functionalized single-walled
carbon nanotubes (SWNT) for the detection of a number of
physiological indicators in breath. Seven DNA sequences were used
to functionalize SWNT sensors to detect trace amounts of methanol,
benzene, dimethyl sulfide, hydrogen sulfide, acetone, and ethanol,
which are indicators of heavy smoking, excessive drinking, and
diseases such as lung cancer, breast cancer, and diabetes. Our test
results indicated that DNA functionalized SWNT sensors exhibit
great selectivity, sensitivity, and repeatability; and different
molecules can be distinguished through pattern recognition enabled
by this sensor array. Furthermore, the experimental sensing results
are consistent with the rankings of ssDNA-molecular target
interactions obtained from molecular dynamics simulations. Thus, the DNA-SWNT sensor
array has great potential to be applied in chemical or biomolecular
detection for the noninvasive diagnostics of diseases and personal
health monitoring.
Abstract: Pattern matching is one of the fundamental applications in molecular biology. Searching DNA-related data is a common activity for molecular biologists. In this paper we explore the applicability of a new pattern matching technique, the Index based Forward Backward Multiple Pattern Matching algorithm (IFBMPM), to DNA sequences. Because our approach avoids unnecessary comparisons in the DNA sequence, the proposed algorithm performs far fewer comparisons than other popular existing methods. Execution time decreases accordingly, giving better performance.
Abstract: A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, G, A, U, C}, where the letter D represents one or more hypothetical bases with unspecific pairing. We hypothesize that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primitive DNA repair system, could have made possible the transition from the ancient to the modern genetic code. Our results suggest that Watson-Crick base pairing and the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences.
Abstract: The evolutionary tree is an important topic in bioinformatics. In 2006, Chen and Lindsay proposed a new method to build the mixture tree from DNA sequences. The mixture tree is a new type of evolutionary tree that carries two pieces of information beyond those of an ordinary evolutionary tree: a time parameter and the set of mutated sites. In 2008, Lin and Juan proposed an algorithm to compute the distance between two mixture trees that considers only the time parameter. In this paper, we propose a method to measure the similarity of two mixture trees that takes the set of mutated sites into account, and develop two algorithms to compute the distance between two mixture trees. The time complexities of these two algorithms are O(n^2 × max{h(T1), h(T2)}) and O(n^2), respectively.
Abstract: Identifying protein coding regions in DNA sequences is a basic step in locating genes. Several approaches based on signal processing tools have been applied to this problem in an attempt to achieve more accurate predictions. This paper presents a new predictor that improves the efficacy of three techniques that use the Fourier transform to predict coding regions, and that can be computed using an algorithm that reduces the computational load. Some ideas about combining the predictor with other methods are discussed. ROC curves are used to demonstrate the efficacy of the proposed predictor, based on the computation of 25 DNA sequences from three different organisms.
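For readers unfamiliar with Fourier-based coding-region prediction, the sketch below shows the classical spectral content measure that such techniques typically start from: each base is turned into a binary indicator sequence, and the squared magnitude of the discrete Fourier transform at frequency 1/3 is summed over the four bases within a sliding window. This is a generic baseline written for illustration, not the predictor proposed in the paper; the window length, step size and function names are assumptions.

```python
import cmath

def period3_power(seq):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of the base's
    binary indicator sequence. Large values suggest period-3 structure."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, c in enumerate(seq) if c == base)
        total += abs(coeff) ** 2
    return total

def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for each window along the sequence.
    window and step are illustrative defaults, chosen as multiples of 3."""
    return [(start, period3_power(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]

if __name__ == "__main__":
    periodic = "ATG" * 10        # strongly period-3 toy sequence
    flat = "A" * 30              # no period-3 component
    print(period3_power(periodic), period3_power(flat))
    print(sliding_period3(periodic, window=9, step=3)[:3])
```

Peaks of the windowed power along a genome are then compared with a threshold, and varying that threshold is what would trace out a ROC curve of the kind mentioned in the abstract.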
Abstract: In this paper, we propose an efficient hierarchical DNA
sequence search method that improves search speed while keeping
accuracy constant. For a given query DNA sequence,
a fast local search method using histogram features is first used as a
filtering mechanism before scanning the sequences in the database.
Overlapping processing is newly added to improve the robustness
of the algorithm. A large number of DNA sequences with low
similarity are excluded from the later search. The Smith-Waterman
algorithm is then applied to each remaining sequence. Experimental
results using GenBank sequence data show that the proposed method,
which combines histogram information with the Smith-Waterman algorithm, is
more efficient for DNA sequence search.
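The two-stage idea described above can be sketched as follows: a cheap histogram comparison over overlapping windows discards dissimilar database sequences, and the Smith-Waterman local alignment score is computed only for the survivors. The base-composition histogram, the L1 distance, the half-window overlap, the threshold `max_dist` and the scoring parameters are all assumptions chosen for this example; the abstract does not specify the authors' actual features or settings.

```python
from collections import Counter

def histogram_distance(a, b):
    """L1 distance between the normalized base-composition histograms
    of two non-empty sequences."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[x] / len(a) - cb[x] / len(b)) for x in "ACGT")

def passes_filter(query, target, max_dist):
    """True if some query-sized window of the target, taken with 50% overlap,
    has a histogram close to the query's."""
    w = len(query)
    if len(target) <= w:
        return histogram_distance(query, target) <= max_dist
    step = max(1, w // 2)                       # overlapping windows
    starts = list(range(0, len(target) - w + 1, step))
    if starts[-1] != len(target) - w:
        starts.append(len(target) - w)          # make sure the tail is covered
    return any(histogram_distance(query, target[s:s + w]) <= max_dist
               for s in starts)

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (two-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def hierarchical_search(query, database, max_dist=0.3):
    """Filter by histogram, then rank the remaining sequences by
    Smith-Waterman score (highest first)."""
    survivors = {name: seq for name, seq in database.items()
                 if passes_filter(query, seq, max_dist)}
    scored = [(name, smith_waterman(query, seq)) for name, seq in survivors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    db = {"hit": "TTTTACGTACGTTTTT", "miss": "TTTTTTTTTTTTTTTT"}  # toy database
    print(hierarchical_search("ACGTACGT", db))
```

Because the filter is a heuristic, a loose threshold trades speed for safety: the tighter `max_dist` is, the more sequences are skipped, but the greater the risk of discarding a true match.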
Abstract: The small interfering RNA (siRNA) alters the
regulatory role of mRNA during gene expression by translational
inhibition. Recent studies show that upregulation of mRNA causes
serious diseases like cancer. Designing effective siRNA with good
knockdown effects therefore plays an important role in gene silencing. Various
siRNA design tools have been developed previously. In this work, we
analyze the existing high-scoring second-generation siRNA
prediction tools and optimize the efficiency of siRNA prediction
by designing a computational model using an Artificial Neural Network
and whole stacking energy (ΔG), which may help in gene silencing
and drug design in cancer therapy. Our model is trained and tested
against a large data set of siRNA sequences. We validate our results
by computing the correlation coefficient between experimental and
observed inhibition efficacy of siRNA. We achieved a correlation
coefficient of 0.727 in our previous computational model and
could improve the correlation coefficient up to 0.753 when the
threshold of whole stacking energy is greater than or equal to -32.5
kcal/mol.
Abstract: Many digital signal processing techniques have been used to automatically distinguish protein coding regions (exons) from non-coding regions (introns) in DNA sequences. In this work, we characterize these sequences according to their nonlinear dynamical features, such as moment invariants, correlation dimension, and largest Lyapunov exponent estimates. We applied our model to a number of real sequences encoded into time series using EIIP sequence indicators. To discriminate between coding and non-coding DNA regions, the phase space trajectory was first reconstructed for each type of region. Nonlinear dynamical features were then extracted from those regions and used to investigate the differences between them. Our results indicate that the nonlinear dynamical characteristics differ significantly between coding regions (CR) and non-coding regions (NCR) in DNA sequences. Finally, the classifier is tested on real genes whose coding and non-coding regions are well known.
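The first two stages described above, encoding a sequence as an EIIP time series and reconstructing a phase space trajectory, can be sketched as below. The EIIP values are the commonly tabulated electron-ion interaction potentials for the four nucleotides; the time-delay embedding with dimension `dim` and delay `tau` is the standard reconstruction technique and is assumed here, since the abstract does not state which embedding parameters the authors used.

```python
# Commonly tabulated EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_series(seq):
    """Encode a DNA string as a numeric time series (unknown symbols skipped)."""
    return [EIIP[c] for c in seq.upper() if c in EIIP]

def delay_embed(series, dim=3, tau=1):
    """Time-delay embedding: point i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau]).
    dim and tau are illustrative defaults, not values from the paper."""
    n_points = len(series) - (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim))
            for i in range(max(n_points, 0))]

if __name__ == "__main__":
    series = eiip_series("ATGGCGTACC")        # toy sequence
    trajectory = delay_embed(series, dim=3, tau=1)
    print(series)
    print(trajectory[:2])
```

Features such as the correlation dimension or the largest Lyapunov exponent would then be estimated from the resulting trajectory; those estimators are beyond the scope of this sketch.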
Abstract: In biological and biomedical research, motif finding tools are important in locating regulatory elements in DNA sequences. Many such tools are available, and they often yield position weight matrices and significance indicators. These indicators, p-values and E-values, describe the likelihood that a motif alignment is generated by the background process, and the expected number of occurrences of the motif in the data set, respectively. The various tools often estimate these indicators differently, making them not directly comparable. One approach for comparing motifs from different tools is computing the E-value as the product of the p-value and the number of possible alignments in the data set. In this paper we explore the combinatorics of the motif alignment models OOPS, ZOOPS, and ANR, and propose a generic algorithm for computing the number of possible combinations accurately. We also show that using the wrong alignment model can give E-values that significantly diverge from their true values.
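To make the E-value relation above concrete, here is a small sketch that counts alignments under two of the models and multiplies by a p-value. The counting rules are simplifying assumptions made for this example: under OOPS each sequence contributes exactly one site, and under ZOOPS each sequence contributes zero or one site with at least one site overall. The ANR model and the paper's generic algorithm are not reproduced, and the paper's own counts may differ.

```python
from math import prod

def positions(length, width):
    """Number of start positions for a motif of `width` in a sequence."""
    return max(length - width + 1, 0)

def oops_alignments(seq_lengths, width):
    """Assumed OOPS count: exactly one site in every sequence."""
    return prod(positions(L, width) for L in seq_lengths)

def zoops_alignments(seq_lengths, width):
    """Assumed ZOOPS count: zero or one site per sequence,
    excluding the empty alignment with no sites at all."""
    return prod(positions(L, width) + 1 for L in seq_lengths) - 1

def e_value(p_value, n_alignments):
    """E-value as p-value times the number of possible alignments."""
    return p_value * n_alignments

if __name__ == "__main__":
    lengths, width, p = [10, 12], 8, 1e-3      # hypothetical data set
    print(e_value(p, oops_alignments(lengths, width)))
    print(e_value(p, zoops_alignments(lengths, width)))
```

Even in this toy case the two models give different counts for the same p-value, which illustrates how the choice of alignment model feeds directly into the reported E-value.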
Abstract: A number of competing methodologies have been developed
to identify genes and classify DNA sequences into coding
and non-coding sequences. This classification process is fundamental
in gene finding and gene annotation tools and is one of the most
challenging tasks in bioinformatics and computational biology. An
information theory measure based on mutual information has shown
good accuracy in classifying DNA sequences into coding and non-coding.
In this paper we describe a species-independent iterative
approach that distinguishes coding from non-coding sequences using
the mutual information measure (MIM). A set of sixty prokaryotes is
used to extract universal training data. To facilitate comparisons with
the published results of other researchers, a test set of 51 bacterial
and archaeal genomes was used to evaluate MIM. The results
demonstrate that MIM produces superior results while remaining
species-independent.
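As background for the mutual information measure, the sketch below estimates the mutual information (in bits) between nucleotides separated by a fixed distance k, using observed pair frequencies. This is the textbook quantity and is offered only as an assumption about the kind of statistic involved; the abstract does not give the exact definition of MIM or the details of the iterative procedure.

```python
from collections import Counter
from math import log2

def mutual_information(seq, k):
    """Mutual information in bits between positions i and i+k of a sequence,
    estimated from the observed pair frequencies."""
    seq = seq.upper()
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)     # marginal of the first symbol
    right = Counter(b for _, b in pairs)    # marginal of the second symbol
    return sum((c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
               for (a, b), c in joint.items())

if __name__ == "__main__":
    print(mutual_information("ACGT" * 25, 1))   # next base fully determined
    print(mutual_information("A" * 50, 1))      # no information
```

A classifier built on this idea would compare such values across several distances k for a candidate region against those seen in training data.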
Abstract: SeqWord Gene Island Sniffer, a new program for the identification of mobile genetic elements in sequences of bacterial chromosomes, is presented. This program is based on the analysis of oligonucleotide usage variations in DNA sequences. 3,518 mobile genetic elements were identified in 637 bacterial genomes and further analyzed by sequence similarity and the functionality of encoded proteins. The results of this study are stored in an open database (http://anjie.bi.up.ac.za/geidb/geidbhome.php). The developed computer program and the database provide information valuable for further investigation of the distribution of mobile genetic elements and virulence factors among bacteria. The program is available for download at www.bi.up.ac.za/SeqWord/sniffer/index.html.
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and changes continuously. Until recently, business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With this advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding the cellular functions of DNA sequences. DNA chips are now being used to perform experiments, and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping similar entities together. There are many clustering algorithms, such as hierarchical clustering, self-organizing maps and K-means clustering. In this paper, we propose a clustering algorithm that imitates the ecosystem, taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. It processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm on test data of 100 to 1000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.
Abstract: Approximate tandem repeats in a genomic sequence are
two or more contiguous, similar copies of a pattern of nucleotides.
They are used in DNA mapping, studying molecular evolution
mechanisms, forensic analysis and research in diagnosis of inherited
diseases. Their functions are still under investigation and not well
defined, but growing biological databases, together with tools for
identification of these repeats may lead to discovery of their specific
role or correlation with particular features. This paper presents a new
approach for finding approximate tandem repeats in a given sequence,
where the similarity between consecutive repeats is measured using
the Hamming distance. It is an enhancement of a method for finding
exact tandem repeats in DNA sequences based on the Burrows-
Wheeler transform.
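To illustrate what is being searched for, the sketch below finds approximate tandem repeats of a fixed period by direct scanning: consecutive blocks of length `period` belong to the same repeat when their Hamming distance is at most `max_mismatches`. This brute-force scan is a stand-in for illustration only and does not use the Burrows-Wheeler transform, so it is not the method proposed in the paper. The function names and the reporting convention (start position, number of copies) are assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def approximate_tandem_repeats(seq, period, max_mismatches, min_copies=2):
    """Return (start, copies) for runs of consecutive blocks of length
    `period` in which each block is within `max_mismatches` of the one
    before it. Phase-shifted runs of the same repeat are reported separately."""
    hits = []
    for start in range(len(seq) - 2 * period + 1):
        prev = start - period
        if prev >= 0 and hamming(seq[prev:start],
                                 seq[start:start + period]) <= max_mismatches:
            continue          # continuation of a run that started earlier
        copies, pos = 1, start
        while (pos + 2 * period <= len(seq)
               and hamming(seq[pos:pos + period],
                           seq[pos + period:pos + 2 * period]) <= max_mismatches):
            copies += 1
            pos += period
        if copies >= min_copies:
            hits.append((start, copies))
    return hits

if __name__ == "__main__":
    # Toy sequence with copies ACGT, ACGA, ACGT starting at position 2.
    print(approximate_tandem_repeats("GGACGTACGAACGTGG", period=4, max_mismatches=1))
```

The scan costs time proportional to the sequence length times the repeat length for each period tried, which is the kind of overhead an index-based approach aims to avoid.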
Abstract: In this article we establish a moment inequality for
dependent random variables, and furthermore some theorems on the strong law
of large numbers and complete convergence for sequences of dependent
random variables. In particular, the Marcinkiewicz law of large numbers
for independent and identically distributed random variables is generalized to
the case of m0-dependent sequences.