Abstract: MicroRNAs (miRNAs) are a class of non-coding RNAs approximately 22 nucleotides long that play critical roles in many biological processes. The mature miRNA is usually 19–27 nucleotides long and is derived from a larger precursor that folds into an imperfect stem-loop structure. Mature miRNAs regulate gene expression and are thereby involved in many cellular processes, including development, proliferation, stress response, apoptosis, and fat metabolism. Recent findings reveal that certain viruses encode their own miRNAs, which are processed by the cellular RNAi machinery. Recent research also indicates that cellular miRNAs can target the genetic material of invading viruses. Cellular miRNAs can act in the virus life cycle to either up-regulate or down-regulate viral gene expression. Computational tools for miRNA target prediction have changed drastically in recent years. Many of the methods are available on the web and can be used by experimental researchers without expert knowledge of bioinformatics. With the development and ease of use of genomic technologies and computational tools, the field of miRNA biology has advanced tremendously over the past decade. This review gives an overview of the genome-wide approaches that have allowed the discovery of new miRNAs and the development of new miRNA target prediction tools and databases.
Abstract: The low expression level of foreign genes in transgenic plants is a key factor limiting plant genetic engineering. Because promoters play a critical regulatory role in gene transcription, they are studied extensively to improve the efficiency of plant transgenic systems. Strong constitutive promoters, such as the CaMV 35S promoter and the maize Ubiquitin 1 promoter, are commonly used in plant biotechnology research. However, expression of the foreign gene in all tissues is often undesirable. Using a strong seed-specific promoter to limit gene expression to the seed solves this problem. The purpose of this study was to isolate one of the seed-specific promoters of Hordeum vulgare. A common Iranian variety of Hordeum vulgare was selected, its genomic DNA was extracted, and the D-Hordein promoter was amplified using specifically designed primers. The amplified fragment was then cloned into an appropriate vector and transformed into E. coli. Finally, the cloned fragments were sent for sequencing to confirm their accuracy. Sequence analysis showed that the cloned fragment, DHP, contained motifs such as the TATA box, CAAT-box, CCGTCC-box, AMYBOX1 and E-box, which underlie seed-specific promoter activity. The results were compared with sequences in data banks. The D-Hordein promoter of Alger showed 99% similarity at 100% coverage. The results also showed that the D-Hordein promoter of barley and the HMW promoter of wheat are very similar.
Abstract: Triticale is a man-made hybrid of wheat and rye that carries the A and B genomes of durum wheat and the R genome of rye. No information was found in the scientific literature on the physico-chemical composition of organic and conventional triticale grain harvested in Latvia. Therefore, the main purpose of the current research was to investigate the physico-chemical parameters of organic and conventional triticale grains harvested in Latvia. The research was carried out on organic and conventional triticale grains harvested in 2012 at the State Priekuli Plant Breeding Institute (Latvia): “Dinaro”, “9403-97”, “9405-23” and “9402-3”. Significant differences in chemical composition were found between organic and conventional triticale grains harvested in Latvia. Higher 1000-grain weight, bulk density and gluten index were obtained for the conventional and organic triticale grain variety “9403-97”, whereas higher falling number, gluten content and protein content were obtained for the variety “9405-23”.
Abstract: DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across related samples. A better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. This is handled by clustering, which reveals natural structures and identifies interesting patterns in the underlying data. In the proposed work, gene expression data are clustered using an Advanced Nelder Mead (ANM) algorithm. The Nelder Mead (NM) method is designed for optimization. In the NM method, the vertices of a triangle are considered as candidate solutions, and a series of operations is performed on this triangle to obtain a better result. In the proposed work, the reflection and expansion operations are eliminated and a new operation called spread-out is introduced. The spread-out operation enlarges the global search area and thus provides a better optimization result. It generates three points, and the best of these three is used to replace the worst point. The experimental results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM on both benchmarks.
Abstract: Mammalian genomes contain a large number of
retroelements (SINEs, LINEs and LTRs) which could affect the
expression of protein-coding genes through associated transcription
factor binding sites (TFBS). The activity of retroelement-associated
TFBS in many genes has been confirmed experimentally, but their global
functional impact remains unclear. Human SINEs (Alu repeats) and
mouse SINEs (B1 and B2 repeats) are known to be clustered in GC-rich,
gene-rich genome segments, consistent with the view that they
can contribute to regulation of gene expression. We have shown
earlier that Alu are involved in formation of cis-regulatory modules
(clusters of TFBS) in human promoters, and other authors reported
that Alu located near promoter CpG islands have an increased
frequency of CpG dinucleotides suggesting that these Alu are
undermethylated. Human Alu and mouse B1/B2 elements have an
internal bipartite promoter for RNA polymerase III containing
a conserved sequence motif called the B-box, which can bind the basal
transcription complex TFIIIC. It has recently been shown that TFIIIC
binding to the B-box leads to formation of a boundary which limits the
spread of repressive chromatin modifications in S. pombe. SINE-associated
B-boxes may have a similar function, but conservation of
TFIIIC binding sites in SINEs located near mammalian promoters
has not been studied previously. Here we analysed the abundance and
distribution of retroelements (SINEs, LINEs and LTRs) in annotated
sequences of the Database of mammalian transcription start sites
(DBTSS). Fractions of SINEs in human and mouse promoters are
slightly lower than in the whole genome, but >40% of human and mouse
promoters contain Alu or B1/B2 elements within the -1000 to +200 bp
interval relative to the transcription start site (TSS). Most of these SINEs
are associated with distal segments of promoters (-1000 to -200 bp
relative to the TSS), indicating that their insertion at distances >200 bp
upstream of the TSS is tolerated during evolution. The distribution of SINEs
in promoters correlates negatively with the distribution of CpG
sequences. Analysis of the abundance of 12-mer motifs from the
B1 and Alu consensus sequences in the genome and in DBTSS
confirmed that some subsegments of Alu and B1 elements are poorly
conserved, which depends in part on the presence of CpG
dinucleotides. One of these CpG-containing subsegments in B1
elements overlaps with the SINE-associated B-box, and it shows better
conservation in DBTSS than in genomic sequences. We
also studied the conservation, in DBTSS and in the genome, of the
B-box-containing segments of old (AluJ, AluS) and young (AluY) Alu
repeats and found that the CpG sequence of the B-box of old Alu is
better conserved in DBTSS than in the genome. This indicates that
B-box-associated CpGs in promoters are better protected from
methylation and mutation than B-box-associated CpGs in genomic
SINEs. These results are consistent with the view that potential
TFIIIC binding motifs in SINEs associated with human and mouse
promoters may be functionally important. These motifs may protect
promoters from repressive histone modifications which spread from
adjacent sequences. This can potentially explain the well-known
clustering of SINEs in GC-rich, gene-rich genome compartments and
the existence of unmethylated CpG islands.
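The comparison of 12-mer motif abundance between a consensus sequence and a target sequence set can be illustrated with a minimal Python sketch. It assumes plain uppercase DNA strings as input. The function names (`kmer_counts`, `motif_abundance`), the per-kilobase normalization and the toy sequences are hypothetical and are not taken from the study, which may have counted and normalized motifs differently.

```python
from collections import Counter


def kmer_counts(sequence, k=12):
    """Count every overlapping k-mer in an uppercase DNA string."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def motif_abundance(consensus, target, k=12):
    """For each k-mer of the consensus, occurrences per 1000 windows of target."""
    counts = kmer_counts(target, k)
    windows = max(len(target) - k + 1, 1)
    return {
        consensus[i:i + k]: 1000.0 * counts[consensus[i:i + k]] / windows
        for i in range(len(consensus) - k + 1)
    }


# Toy example with made-up sequences; k is shortened to 4 for readability.
consensus = "GGCCGGGCG"
promoter_like = "TTGGCCGGGCGTTGGCCAA"
abundance = motif_abundance(consensus, promoter_like, k=4)
```

Running the same function on a promoter collection and on genomic sequence, and comparing the two abundance values for each consensus 12-mer, would give a rough picture of which subsegments are relatively better conserved in promoters.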
Abstract: Integrins are a large family of multidomain α/β cell
signaling receptors. Some integrins contain an additional inserted I
domain, whose earliest expression appears to be with the chordates,
since they are observed in the urochordates Ciona intestinalis (vase
tunicate) and Halocynthia roretzi (sea pineapple), but not in integrins
of earlier diverging species. The domain's presence is viewed as a
hallmark of integrins of higher metazoans; however, in vertebrates
there are clearly three structurally different classes: integrins without
I domains, and two groups of integrins with I domains but separable
by the presence or absence of an additional αC helix. For example,
the αI domains in collagen-binding integrins from Osteichthyes
(bony fish) and all higher vertebrates contain the specific αC helix,
whereas the αI domains in non-collagen binding integrins from
vertebrates and the αI domains from earlier diverging urochordate
integrins, i.e. tunicates, do not. Unfortunately, within the early
chordates, there is an evolutionary gap due to extinctions between the
tunicates and cartilaginous fish. This, coupled with a knowledge gap
due to the lack of complete genomic data from surviving species,
means that the origin of collagen-binding αC-containing αI domains
remains unknown. Here, we analyzed two available genomes, from
Callorhinchus milii (ghost shark/elephant shark; Chondrichthyes –
cartilaginous fish) and Petromyzon marinus (sea lamprey;
Agnathostomata), and several available Expressed Sequence Tags
from two Chondrichthyes species, Raja erinacea (little skate) and
Squalus acanthias (dogfish shark), and from Eptatretus burgeri (inshore
hagfish; Agnathostomata), which evolutionarily reside between the
urochordates and Osteichthyes. In P. marinus, we observed several
fragments coding for the αC-containing αI domain, allowing us to
shed more light on the evolution of the collagen-binding integrins.
Abstract: The HIV-1 genome is highly heterogeneous. Due to this
variation, the features of the HIV-1 genome span a wide range. For this
reason, the infectivity of the virus depends on which chemokine
receptors it uses. R5 HIV viruses use the CCR5 coreceptor, X4 viruses
use CXCR4, and R5X4 viruses can utilize both coreceptors. Recently,
bioinformatics studies have sought to classify R5X4 viruses using
experimental data on the HIV-1 genome.
In this study, R5X4-type HIV viruses were classified using an
Auto-Regressive (AR) model together with Artificial Neural Networks
(ANNs). The statistical data of R5X4, R5 and X4 viruses were
analyzed using signal processing methods and ANNs. Accessible
residues of these virus sequences were obtained and modeled with an AR
model, since the residue data are high-dimensional and differ in
length between sequences. Finally, the pre-processed data were used to develop various
ANN structures for identifying R5X4 viruses. Furthermore, ROC
analysis was applied to the ANNs to assess their actual performance. The
results indicate that R5X4 viruses were successfully classified, with high
sensitivity and specificity in both training and testing ROC analyses
for the RBF network, which gave the best performance among the ANN structures.
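The role of the AR model as a preprocessing step can be sketched as follows: sequences of different lengths are each reduced to the same small number of AR coefficients, which can then serve as a fixed-length input vector for a classifier. The sketch below is an assumption-laden illustration, not the authors' pipeline. It assumes each sequence has already been converted to a numeric, roughly zero-mean profile (for example an accessibility value per residue), and it fits the coefficients with the standard Levinson-Durbin recursion. The names `autocorrelation` and `ar_coefficients` are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of a numeric sequence."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def ar_coefficients(signal, order):
    """Fit x[t] ~ sum(a[i] * x[t-1-i]) using the Levinson-Durbin recursion."""
    r = autocorrelation(signal, order)
    coeffs = []
    error = r[0]  # prediction error power of the order-0 model
    for m in range(1, order + 1):
        acc = r[m] - sum(coeffs[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / error  # reflection coefficient for order m
        coeffs = [coeffs[j] - k * coeffs[m - 2 - j] for j in range(m - 1)] + [k]
        error *= 1.0 - k * k
    return coeffs


# Two made-up profiles of different lengths map to vectors of equal length.
short_profile = [1.0, 2.0, 1.5, 0.5, 1.0, 2.5, 1.0]
long_profile = [0.5 ** t for t in range(60)]
features_short = ar_coefficients(short_profile, 3)
features_long = ar_coefficients(long_profile, 3)
```

The model order (3 here) is an arbitrary choice for illustration; the abstract does not state the order used.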
Abstract: A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, G, A, U, C}, where the letter D represents one or more hypothetical bases with unspecific pairing. We hypothesized that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primitive DNA repair system, could have made possible the transition from the ancient to the modern genetic code. Our results suggest that Watson-Crick base pairing and the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences.
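The period-3 property mentioned here is commonly measured as the Fourier power of a sequence at frequency 1/3. A minimal sketch follows, assuming the usual indicator-sequence representation (one binary sequence per symbol) and summing the power over symbols. This is a generic illustration rather than the authors' procedure. The function name `period3_power` is hypothetical, and the `alphabet` argument is included only so that an extended alphabet such as "DGAUC" could be passed.

```python
import cmath


def period3_power(sequence, alphabet="ACGT"):
    """Total Fourier power at frequency 1/3 over per-symbol indicator sequences."""
    total = 0.0
    for base in alphabet:
        # Fourier coefficient of the indicator sequence of `base` at frequency 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, symbol in enumerate(sequence)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


# Made-up inputs: a strictly codon-periodic string and a homopolymer.
periodic = period3_power("ACG" * 20)
flat = period3_power("A" * 60)
```

A sequence with strong codon-position bias gives a large value, while a sequence without three-base periodicity gives a value near the background level.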
Abstract: Genome profiling (GP), a genotype-based technology that exploits random PCR and temperature gradient gel electrophoresis, has been successful in the identification/classification of organisms. In this technology, spiddos (species identification dots) and PaSS (pattern similarity score) are employed to measure the closeness (or distance) between genomes. Based on this closeness (PaSS), we can build phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This was confirmed quantitatively in this study by computer simulation, which also established the limit of reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for the identification/classification of organisms.
Abstract: Segmentation, filtering out of measurement errors and
identification of breakpoints are integral parts of any analysis of
microarray data for the detection of copy number variation (CNV).
Existing algorithms designed for these tasks have had some successes
in the past, but they tend to be O(N^2) in either computation time or
memory requirement, or both, and the rapid advance of microarray
resolution has practically rendered such algorithms useless. Here we
propose an algorithm, SAD, that is much faster and much less thirsty
for memory – O(N) in both computation time and memory requirement
-- and offers higher accuracy. The two key ingredients of SAD are the
fundamental assumption in statistics that measurement errors are
normally distributed and the mathematical relation that the product of
two Gaussians is another Gaussian (function). We have produced a
computer program for analyzing CNV based on SAD. In addition to
being fast and small it offers two important features: quantitative
statistics for predictions and, with only two user-decided parameters,
ease of use. Its speed shows little dependence on genomic profile.
Running on an average modern computer, it completes CNV analyses
for a 262 thousand-probe array in ~1 second and a 1.8 million-probe
array in 9 seconds.
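The mathematical relation named in this abstract, that the product of two Gaussians is again a Gaussian (up to a constant factor), can be written out in a few lines. The sketch below only illustrates that identity and how it allows measurements to be pooled in a single O(N) pass. It is not the SAD algorithm, whose details the abstract does not give, and the function names are hypothetical.

```python
def combine_gaussians(mean1, var1, mean2, var2):
    """Mean and variance of the Gaussian proportional to N(mean1,var1)*N(mean2,var2)."""
    precision = 1.0 / var1 + 1.0 / var2  # precisions (inverse variances) add
    var = 1.0 / precision
    mean = var * (mean1 / var1 + mean2 / var2)  # precision-weighted mean
    return mean, var


def pool_measurements(values, variance):
    """Fold equally noisy measurements into one estimate in a single pass."""
    mean, var = values[0], variance
    for value in values[1:]:
        mean, var = combine_gaussians(mean, var, value, variance)
    return mean, var


# Made-up probe intensities, each assumed to carry the same error variance.
pooled_mean, pooled_var = pool_measurements([1.0, 2.0, 3.0], 1.0)
```

Because each update uses only the running mean and variance, both time and memory grow linearly (memory is in fact constant) with the number of measurements.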
Abstract: There is strong evidence that water channel proteins
'aquaporins (AQPs)' are central components in plant-water relations
as well as a number of other physiological parameters. We had
previously reported the isolation of 24 plasma membrane intrinsic
protein (PIP) type AQPs. However, the gene numbers in rice and the
polyploid nature of bread wheat indicated a high probability of
further genes in the latter. The present work focused on identification
of further AQP isoforms in bread wheat. Using an altered
primer design, we identified five homologous genes, designated
PIP1;5b, PIP2;9b, TaPIP2;2, TaPIP2;2a and TaPIP2;2b. Sequence
alignments indicate that PIP1;5b and PIP2;9b are likely to be homeologs of
two previously reported genes, while the other three are new genes
and could be homeologs of each other. The results indicate further
AQP diversity in wheat, and the sequence data will enable physical
mapping of these genes to identify their genomes, as well as genetic mapping to
determine their association with any quantitative trait loci (QTLs)
associated with plant-water relations such as salinity or drought
tolerance.
Abstract: The human genome is not only the evolutionary
summation of all advantageous events, but also carries the footprints
of deleterious lesions. A single gene mutation may sometimes
have multiple consequences in numerous tissues, and a linear
relationship between genotype and phenotype may often be obscure.
β-Thalassemia minor, a transfusion-independent mild anaemia,
coupled with environmental and other factors, may manifest as
phenotypic pleiotropy with hypocholesterolemia, vitamin D
deficiency, tissue hypoxia, hyperparathyroidism and psychological
alterations. The occurrence of pancreatic insufficiency, resultant
steatorrhoea, vitamin D (25-OH) deficiency (13.86 ng/ml) and
hypocholesterolemia (85 mg/dl) in a 30-year-old male β-thalassemia minor
patient (hemoglobin 11 g/dl with fetal hemoglobin 2.10%, Hb A2
4.60%, adult Hb 84.80% and an altered hemogram) with increased
parathyroid hormone (62 pg/ml) and moderate serum Ca2+
(9.5 mg/dl) points towards a cascade of phenotypic pleiotropy
in which the β-thalassemia mutation, whether at the 5' cap site of the
mRNA, in differential splicing or elsewhere, affects several
metabolic pathways in the heterozygous state. Compensatory extramedullary
hematopoiesis may not have coped well with the stressful lifestyle of
the young individual, and increased erythropoietic stress with a high
demand for cholesterol for RBC membrane synthesis may have
resulted in hypocholesterolemia. Oxidative stress and tissue hypoxia
may have caused the pancreatic insufficiency, leading to vitamin D
deficiency. This may in turn have caused secondary
hyperparathyroidism to sustain the serum calcium level. The irritability and
stress intolerance of the patient were a cumulative effect of the vicious
cycle of metabolic compromises. From these findings we propose
that the metabolic deficiencies associated with β-thalassemia mutations may
be considered a phenotypic display of pleiotropy that helps explain
the genetic epidemiology.
According to the recommendations of the NIH Workshop on
Gene-Environment Interplay in Common Complex Diseases: Forging
an Integrative Model, the design of observational studies should be
informed by gene-environment hypotheses, and the results of a study
(of genetic diseases) should be published to inform future hypotheses.
A variety of approaches is needed to capture data on all possible
aspects, each of which is likely to contribute to the etiology of
disease. Speakers also agreed that there is a need to develop
new statistical methods and measurement tools to appraise
information that may be missed by conventional methods, in which a
large sample size is needed to detect a considerable effect.
A future meta-analytic cohort study may bring significant
insight into the proposition made in the title.
Abstract: Brassinosteroids (BRs) regulate cell elongation,
vascular differentiation, senescence, and stress responses. BRs signal
through the BES1/BZR1 family of transcription factors, which
regulate hundreds of target genes involved in this pathway. In this
research, a comprehensive genome-wide analysis of the
BES1/BZR1 gene family was carried out in Arabidopsis thaliana, Cucumis sativus,
Vitis vinifera, Glycine max and Brachypodium distachyon.
Sequence characteristics, dot plots and hydropathy plots
were analyzed for the protein and genome sequences of the five plant
species. The longest protein sequence was
Brdic3g, with 374 aa, and the shortest
was Gm7g, with 163 aa. The highest
instability index, 79.99, was found for protein sequence AT1G19350,
and the lowest, 33.22, for
protein sequence Gm5g. The aliphatic index of these
protein sequences ranged from 47.82 to 78.79 in Arabidopsis
thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycine
max, 54.09 to 54.28 in Brachypodium distachyon and 55.36 to 56.83 in
Cucumis sativus. Overall, the data obtained from our investigation
contribute to a better understanding of the complexity of the
BES1/BZR1 gene family and provide a first step towards directing
future experimental designs for systematic analysis of the
functions of the BES1/BZR1 gene family.
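For readers unfamiliar with the aliphatic index, a minimal sketch of its calculation is given below. It assumes the widely used definition of Ikai, in which the index is a weighted sum of the mole percentages of alanine, valine, isoleucine and leucine. The abstract does not state which definition or tool was used, so this is an assumption, and the function name and test sequences are hypothetical.

```python
def aliphatic_index(protein):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    length = len(protein)

    def mole_percent(residue):
        # Share of this residue among all residues, as a percentage.
        return 100.0 * protein.count(residue) / length

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Made-up peptides in one-letter code, not sequences from the study.
rich = aliphatic_index("AVIL")
poor = aliphatic_index("GGGG")
```

Higher values indicate a larger relative volume of aliphatic side chains, which is generally interpreted as a sign of greater thermostability in globular proteins.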
Abstract: In this contribution, the use of a new genetic operator is proposed. The main advantage of this operator is that it helps the evolution procedure converge faster towards the optimal solution of a problem. This new genetic operator is called the "intuition" operator. Generally speaking, it is a way to include any heuristic or other local knowledge about the problem that cannot be embedded in the fitness function. Simulation results show that using this operator significantly increases the performance of the classic Genetic Algorithm by increasing the convergence speed of its population.
Abstract: A number of competing methodologies have been developed
to identify genes and classify DNA sequences into coding
and non-coding sequences. This classification process is fundamental
in gene finding and gene annotation tools and is one of the most
challenging tasks in bioinformatics and computational biology. An
information theory measure based on mutual information has shown
good accuracy in classifying DNA sequences into coding and non-coding.
In this paper we describe a species-independent iterative
approach that distinguishes coding from non-coding sequences using
the mutual information measure (MIM). A set of sixty prokaryotes is
used to extract universal training data. To facilitate comparisons with
the published results of other researchers, a test set of 51 bacterial
and archaeal genomes was used to evaluate MIM. These results
demonstrate that MIM produces superior results while remaining
species-independent.
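The quantity underlying a mutual information measure can be sketched as the mutual information between nucleotides separated by a fixed distance. The code below is a generic illustration under that assumption. The abstract does not specify how MIM combines distances, reading frames or training data, so nothing here should be read as the authors' exact measure, and the function name `mutual_information` is hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(sequence, lag):
    """Mutual information (bits) between symbols `lag` positions apart."""
    pairs = [(sequence[i], sequence[i + lag]) for i in range(len(sequence) - lag)]
    total = len(pairs)
    joint = Counter(pairs)
    first = Counter(x for x, _ in pairs)
    second = Counter(y for _, y in pairs)
    # Sum of p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (count / total) * log2(count * total / (first[x] * second[y]))
        for (x, y), count in joint.items()
    )


# Made-up sequences: a strictly periodic string and a homopolymer.
structured = mutual_information("ACGT" * 50, 1)
uniform = mutual_information("A" * 100, 1)
```

Coding regions tend to show higher mutual information at distances that are multiples of three, which is one reason such measures can separate coding from non-coding sequence without species-specific training.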
Abstract: Phylogenetic analysis using the most conserved
portions of the 18S rRNA gene revealed the phylogenetic relationship
between the two populations. DNA divergence analysis showed that the
nucleotide diversity values were -0.00838 for the Tanjung Dawai,
Kedah population and -0.00708 for the Cherating, Pahang population.
The net nucleotide divergence between populations (Da)
was -0.0073, indicating low polymorphism between the populations
studied. The total number of mutations in the Tanjung Dawai, Kedah
samples (73) was higher than in the Cherating, Pahang samples
(59), while 8 mutations were shared across the populations,
revealing evolutionary change in the genome of Malaysian T. gigas. The
tree topology of both populations, inferred using the Neighbour-joining
method by comparing 1791 bp of partial 18S rRNA sequence, revealed
that T. gigas haplotypes clustered into seven clades, suggesting
that the populations, although derived from a common ancestor, are
genetically diverse.
Abstract: Dilated cardiomyopathy (DCM) is a severe
cardiovascular disorder characterized by progressive systolic
dysfunction due to cardiac chamber dilatation and inefficient
myocardial contractility often leading to chronic heart failure.
Recently, genome-wide association studies (GWASs) on DCM
have indicated that the ZBTB17 gene rs10927875 single nucleotide
polymorphism is associated with DCM. The aim of the study was to
determine the distribution of the ZBTB17 gene rs10927875 polymorphism
in 50 Slovak patients with DCM and 80 healthy control subjects
using Custom TaqMan® SNP Genotyping assays. Risk factors
detected at baseline in each group included age, sex, body mass
index, smoking status, diabetes and blood pressure. The mean age of
patients with DCM was 52.9±6.3 years; the mean age of individuals
in control group was 50.3±8.9 years. The distribution of investigated
genotypes of rs10927875 polymorphism within ZBTB17 gene in the
cohort of Slovak patients with DCM was as follows: CC (38.8%), CT
(55.1%), TT (6.1%), in controls: CC (43.8%), CT (51.2%), TT
(5.0%). The risk allele T was more common among the patients with
dilated cardiomyopathy than in normal controls (33.7% versus
30.6%). The differences in genotype or allele frequencies of ZBTB17
gene rs10927875 polymorphism were not statistically significant
(p=0.6908; p=0.6098). The results of this study suggest that ZBTB17
gene rs10927875 polymorphism may be a risk factor for
susceptibility to DCM in Slovak patients. Studies of
larger cohorts and additional functional investigations are needed to
fully understand the roles of genetic associations.
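The allele frequencies quoted in this abstract follow directly from the genotype distribution: each heterozygote contributes one T allele and each TT homozygote contributes two. A minimal sketch of that arithmetic is shown below, using the genotype percentages reported in the abstract. The function name `allele_frequencies` is hypothetical.

```python
def allele_frequencies(cc, ct, tt):
    """Return (C, T) allele frequencies from genotype counts or percentages."""
    total = cc + ct + tt
    # Each TT carries two T alleles, each CT carries one; divide by 2 * total.
    t_freq = (2.0 * tt + ct) / (2.0 * total)
    return 1.0 - t_freq, t_freq


# Genotype percentages (CC, CT, TT) as reported in the abstract.
patients_c, patients_t = allele_frequencies(38.8, 55.1, 6.1)
controls_c, controls_t = allele_frequencies(43.8, 51.2, 5.0)
```

With these inputs the T allele frequency comes out at about 33.65% for patients and 30.6% for controls, consistent with the 33.7% and 30.6% reported.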
Abstract: SeqWord Gene Island Sniffer, a new program for
the identification of mobile genetic elements in sequences of bacterial chromosomes, is presented. The program is based on the
analysis of oligonucleotide usage variations in DNA sequences. In total, 3,518 mobile genetic elements were identified in 637 bacterial
genomes and further analyzed by sequence similarity and the
functionality of the encoded proteins. The results of this study are stored in an open database (http://anjie.bi.up.ac.za/geidb/geidbhome.php).
The program and the database provide information valuable for further investigation of the
distribution of mobile genetic elements and virulence factors among bacteria. The program is available for download at www.bi.up.ac.za/SeqWord/sniffer/index.html.
Abstract: Microarray data profile gene expression on a whole-genome
scale and therefore provide a good way to study associations
between gene expression and the occurrence or progression of cancer.
More and more researchers have realized that microarray data are helpful
for predicting cancer samples. However, the dimension of the gene
expression data is much larger than the sample size, which makes this
task very difficult. Identifying the significant genes
that cause cancer has therefore become an urgent, popular and difficult research
topic. Many feature selection algorithms proposed in
the past focus on improving cancer prediction accuracy at the
expense of ignoring the correlations between features. In this
work, a novel framework (named SGS) is presented for stable gene
selection and efficient cancer prediction. The proposed framework
first applies a clustering algorithm to find gene groups in which
genes have high correlation coefficients, then
selects the significant genes in each group with the Bayesian Lasso and
the important gene groups with the group Lasso, and finally builds a prediction
model on the reduced gene space with an efficient classification
algorithm (such as SVM, 1NN or regression). Experimental
results on real-world data show that the proposed framework often
outperforms existing feature selection and prediction methods
such as SAM, IG and Lasso-type prediction models.
Abstract: The features of the HIV genome span a wide range because
it is highly heterogeneous. Hence, the infectivity of the virus depends on which chemokine receptors it uses.
R5 and X4 HIV viruses use the CCR5 and CXCR4 coreceptors, respectively, while R5X4 viruses can utilize both coreceptors. Recently, bioinformatics studies have sought to
classify R5X4 viruses using the coreceptors of the HIV genome.
The aim of this study is to develop the optimal Multilayer
Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this, the number of units in the hidden layer
was incremented one by one, from one to a particular number. The statistical data of R5X4, R5 and X4 viruses were preprocessed by
signal processing methods. Accessible residues of these virus sequences were extracted and modeled with an Auto-Regressive (AR) model,
because the residue data are high-dimensional and differ in length between sequences. Finally, the pre-processed dataset was used to develop MLPs with various numbers of hidden units to identify R5X4
viruses. Furthermore, ROC analysis was used to determine the optimal MLP structure.