Abstract: In this paper we propose a modification of the kinetic Smoluchowski equation for binary aggregation, applicable to systems with first- and second-order chemical reactions in which the main product is insoluble. The goal of this work is to create a theoretical foundation and engineering procedures for designing chemical apparatus under conditions where chemical reactions proceed jointly with the aggregation of insoluble dispersed phases formed in the working zones of the reactor.
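For reference, the classical (unmodified) discrete Smoluchowski equation for binary aggregation is usually written as follows, where n_k(t) denotes the concentration of aggregates made of k primary particles and K_{ij} is the aggregation kernel. The abstract does not state the modified form, so no chemical-reaction source term is shown; the symbols are the conventional ones and are an assumption about the authors' notation.

```latex
\frac{\mathrm{d}n_k}{\mathrm{d}t}
  = \frac{1}{2}\sum_{i+j=k} K_{ij}\, n_i n_j
  \;-\; n_k \sum_{j=1}^{\infty} K_{kj}\, n_j ,
  \qquad k = 1, 2, \dots
```

The first term counts the formation of k-aggregates from smaller pairs, and the second counts their loss through further aggregation.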
Abstract: Various cis-regulatory module (CRM) predictors have been proposed in the last decade. Several well-established CRM predictors adopt different categories of prediction strategy, including window clustering, probabilistic modeling and phylogenetic footprinting. Integrating them appropriately has the potential to achieve high-quality CRM prediction. This study analyzed four existing CRM predictors (ClusterBuster, MSCAN, CisModule and MultiModule) to seek a predictor combination that delivers higher accuracy than the individual CRM predictors. A total of 465 CRMs across 140 Drosophila melanogaster genes from the REDfly database were used to evaluate the integrated CRM predictor proposed in this study. The results show that four predictor combinations outperformed the best individual CRM predictor.
Abstract: We consider n individuals described by p standardized variables, represented by points on the surface of the unit hypersphere S^{n-1}. For a given choice of n individuals, we suppose that the set of observable variables comes from a mixture of bipolar Watson distributions defined on the hypersphere. EM and Dynamic Clusters algorithms are used to identify this mixture. We obtain estimates of the parameters of each Watson component and then a partition of the set of variables into homogeneous groups. Additionally, we present a factor analysis model in which the unobservable factors are simply the maximum likelihood estimators of the Watson directional parameters, which are exactly the first principal components of the data matrices associated with each previously identified group. This alternative model yields directly interpretable solutions (simple structure), avoiding factor rotations.
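The link between the Watson directional parameter and the first principal component can be illustrated with a small sketch. For a single bipolar Watson component, the maximum likelihood direction is the dominant eigenvector of the scatter matrix of the unit vectors. The code below finds it by power iteration on toy data. The function names and the data are illustrative assumptions, and the mixture and EM steps from the abstract are not reproduced.

```python
import math

def scatter_matrix(points):
    """Return the scatter matrix (1/m) * sum(x x^T) of m unit vectors."""
    m, dim = len(points), len(points[0])
    return [[sum(x[i] * x[j] for x in points) / m for j in range(dim)]
            for i in range(dim)]

def principal_axis(matrix, iterations=200):
    """Power iteration: dominant eigenvector of a symmetric PSD matrix."""
    dim = len(matrix)
    v = [1.0 / math.sqrt(dim)] * dim  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy axial data: unit vectors scattered around the axis +/-(1, 0).
angles = [0.1, -0.1, math.pi + 0.1, math.pi - 0.1]
points = [(math.cos(a), math.sin(a)) for a in angles]
axis = principal_axis(scatter_matrix(points))
```

The sign of the returned axis is arbitrary, which matches the axial symmetry of the Watson distribution.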
Abstract: This paper considers the attainment of productive-level parallel programming skills, based on data from graduation studies at the Polytechnic University of Japan. The data show that most students can acquire parallel programming skills within the graduation study alone (about 600 to 700 hours) if the programming environment is limited to GPGPUs. However, the data also show that attaining productive-level parallel programming skills within the graduation study alone is a very demanding task for a student. In addition, the data suggest that parallel programming environments for GPGPUs, such as CUDA and OpenCL, may be more suitable for parallel computing education than other environments such as MPI on a cluster system and Cell/B.E. These results should be useful not only for software development but also for hardware product development using computer technologies.
Abstract: Support vector clustering (SVC) is an important kernel-based clustering algorithm used in many applications. It has two main bottlenecks: high computational cost and the cluster-labeling step. In this paper, we present a modified SVC method, named Grid–SVC, that improves the original algorithm computationally. We first normalize the data and then partition the interval on which SVC operates, using a novel grid-based clustering algorithm. The algorithm partitions the intervals based on the density function of the data set and then applies the Cartesian product to form multi-dimensional grids. After eliminating many outliers and much of the noise in preprocessing, we apply an improved SVC method to each grid cell in parallel. The experimental results show improvements in both time complexity and accuracy.
Abstract: Breast cancer is the most common malignancy in women and the second leading cause of death for women all over the world. The earlier the cancer is detected, the better the treatment. The diagnosis and treatment of the cancer rely on segmentation of sonoelastographic images. Texture features have not previously been considered for sonoelastographic segmentation. Sonoelastographic images of 15 patients containing both benign and malignant tumors are considered for experimentation. The images are enhanced to remove noise in order to improve contrast and emphasize the tumor boundary. Each image is then decomposed into sub-bands using single-level Daubechies wavelets with one to six coefficients. Grey Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) features are extracted from each sub-band and then ranked and selected using the Sequential Floating Forward Selection (SFFS) technique. The resultant images undergo K-means clustering and then a few post-processing steps to remove false spots. The tumor boundary is detected from the segmented image. It is proposed that LBP features from the vertical coefficients of the Daubechies wavelet with two coefficients are best suited for segmentation of sonoelastographic breast images among the wavelet members using one to six coefficients for decomposition. The results are also quantified with the help of an expert radiologist. The proposed work can be used in further diagnostic processing to decide whether the segmented tumor is benign or malignant.
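As a minimal sketch of the texture feature named above, the basic 8-neighbour LBP operator thresholds each neighbour against the centre pixel and packs the results into one byte. The neighbour ordering, the `>=` convention and the toy patch are assumptions made for illustration. The abstract applies LBP to wavelet sub-bands, which is not reproduced here.

```python
def lbp_code(image, row, col):
    """Basic 8-neighbour LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    # Clockwise neighbour offsets, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

# Hypothetical 3x3 grey-level patch.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
center_code = lbp_code(patch, 1, 1)
```

A histogram of such codes over a region is what is typically used as the texture descriptor.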
Abstract: Educational data mining is a specific data mining field applied to data originating from educational environments; it relies on different approaches to discover hidden knowledge in the available data. Among these approaches are machine learning techniques, which are used to build systems that learn from previous data. Machine learning can be applied to solve various regression, classification, clustering and optimization problems.
In our research, we propose a “Student Advisory Framework” that utilizes classification and clustering to build an intelligent system. This system can be used to advise a first-year university student on pursuing an education track in which he/she is likely to succeed, with the aim of decreasing the high rate of academic failure among these students. A real case study at the Cairo Higher Institute for Engineering, Computer Science and Management is presented using a real dataset collected from 2000 to 2012. The dataset has two main components: a pre-higher-education dataset and a dataset of first-year course results. The results demonstrate the efficiency of the suggested framework.
Abstract: Reverse engineering of a genetic regulatory network involves modeling the given gene expression data in the form of a network. Relationships between genes, the so-called gene regulatory networks (GRNs), can be inferred computationally and can help in finding genomics- and proteomics-based diagnostic approaches for a disease. In this paper, a clustering-based method is used to reconstruct a genetic regulatory network from time-series gene expression data. A supercoiled data set from Escherichia coli is used to demonstrate the proposed method.
Abstract: The current Hadoop block placement policy does not distribute replicas of blocks written to datanodes in a Hadoop cluster fairly and evenly.
This paper presents a new solution that helps keep the cluster in a balanced state while an HDFS client is writing data to a file in a Hadoop cluster. The solution has been implemented, and tests have been conducted to evaluate its contribution to the Hadoop distributed file system.
The solution was found to lower the global execution time taken by the Hadoop balancer to 22 percent. The Hadoop balancer was also found to over-replicate 1.75 and 3.3 percent of all redistributed blocks in the modified and original Hadoop clusters, respectively.
The feature that keeps the cluster in a balanced state works as a core part of the Hadoop system and not just as a utility like the traditional balancer. This is one of the significant achievements and distinguishing features of the solution developed during the course of this research work.
Abstract: In order to integrate knowledge in heterogeneous
case-based reasoning (CBR) systems, ontology-based CBR systems
have become a hot topic. To solve the problems facing
ontology-based CBR systems (for example, nonstandard
architecture, deficient reuse of knowledge in legacy CBR systems, and
difficult ontology construction), we propose a novel approach for
semi-automatically constructing an ontology-based CBR system whose
architecture is based on a two-layer ontology. Domain knowledge
implied in legacy case bases can be mapped automatically from the
relational database schema and knowledge items to a relevant OWL
local ontology by a mapping algorithm with low time complexity. Through
concept clustering based on formal concept analysis, and by computing
a concept equation measure and a concept inclusion measure,
suggestions for enriching or amending the concept hierarchy of the OWL
local ontologies are made automatically, which can aid designers in
achieving semi-automatic construction of the OWL domain ontology.
The approach is validated by an application example.
Abstract: Absorption spectra of infra-red (IR) radiation of the
disperse water medium absorbing the most important greenhouse
gases (CO2, N2O, CH4, C2H2, C2H6) have been calculated by
the molecular dynamics method. Loss of absorbing ability upon
cluster formation, due to a reduction in the number of centers
interacting with IR radiation, results in an anti-greenhouse effect.
Absorption of O3 molecules by the (H2O)50 cluster is investigated
during its interaction with Cl- ions. Splitting of the ozone molecule into
atoms near the cluster surface was observed. Interaction of the water
cluster with Cl- ions causes an increase in the integrated intensity of
the IR emission spectra, and also a substantial reduction of the
corresponding characteristic of the Raman spectrum. The relative integrated
intensity of IR absorption for small water clusters was
determined. The altitude dependences of mass for
vapor of monomers, clusters, droplets, crystals and for all
moisture were determined. The anti-greenhouse effect of clusters was
defined as the difference between the increases in the average global
temperature of the Earth caused by absorption of IR radiation by the free water
molecules that form the clusters and by absorption by the clusters themselves.
The greenhouse effect caused by clusters amounts to 0.53 K, and the
anti-greenhouse effect is equal to 1.14 K. An increase in the concentration of
CO2 in the atmosphere does not always correlate with an
amplification of the greenhouse effect.
Abstract: Short-Term Load Forecasting (STLF) plays an important role in the economic and secure operation of power systems. In this paper, a Continuous Genetic Algorithm (CGA) is employed to evolve the optimal structure and connection weights of large neural networks for the one-day-ahead electric load forecasting problem. This study describes the process of developing three-layer feed-forward large neural networks for load forecasting and then presents a heuristic search algorithm for performing an important task of this process, i.e. optimal network structure design. The proposed method is applied to STLF for the local utility. Data are clustered according to differences in their characteristics. Special days are extracted from the normal training sets and handled separately. In this way, a solution is provided for all load types, including working days, weekends and special days. We find good performance for the large neural networks. The proposed methodology consistently gives lower percentage errors. Thus, it can be applied to automatically design an optimal load forecaster based on historical data.
Abstract: Certain sciences, such as physics, chemistry or biology,
have a strong computational aspect and use computing infrastructures
to advance their scientific goals. Often, high performance and/or high
throughput computing infrastructures such as clusters and computational
Grids are applied to satisfy computational needs. In addition,
these sciences are sometimes characterised by scientific collaborations
requiring resource sharing which is typically provided by Grid
approaches. In this article, I discuss Grid computing approaches in
High Energy Physics as well as in bioinformatics and highlight some
of my experience in both scientific domains.
Abstract: Residential satisfaction has been vigorously debated
in the family-house setting. However, little attention has been
given to surveying student residential
satisfaction in the campus housing setting. This study tries to
fill the gap by focusing on the relationship between students'
socio-economic backgrounds and their residential satisfaction with
on-campus student housing facilities. A two-stage cluster
sampling method was employed to select the respondents. Then,
self-administered questionnaires were distributed face-to-face to the
students. In general, it was confirmed that the students' socio-economic
backgrounds significantly influence their
satisfaction with their on-campus student housing facilities. The main
influential factors were found to be economic status, sense of
sharing, and the ethnicity of roommates. This study could
also provide useful feedback for university administrations
seeking to improve their student housing facilities.
Abstract: The purpose of this study was to investigate the religious behavior of high school and university students in Lamerd, a town in the south of Iran, with respect to increases in their level of education and age. The participants were 450 high school and university students at all levels, from the first year of junior high school to senior university students, who were chosen through a multistage cluster sampling method. Their religious behavior was analyzed using the questionnaire revised by Nezar Alany of the University of Bahrain (r = 0.797). Results showed that high school students scored higher in religious behavior than university students (p < 0.003), and that third-year junior high school students showed a decline in religious behavior relative to second-year students of the same school (p < 0.042). More importantly, the decrease in religious behavior was associated with increases in educational level (p < 0.017) and age (p < 0.043).
Abstract: The goal of a network-based intrusion detection
system is to classify network traffic activities into two major
categories: normal and attack (intrusive) activities. Nowadays, data
mining and machine learning play an important role in many
fields, including intrusion detection systems (IDS), using both
supervised and unsupervised techniques. One of the
essential steps of data mining is feature selection, which helps
improve the efficiency, performance and prediction rate of
the proposed approach. This paper applies the unsupervised K-means
clustering algorithm with information gain (IG) for feature selection
and reduction to build a network intrusion detection system. For our
experimental analysis, we used the new NSL-KDD dataset,
which is a modified version of the KDD Cup 1999 intrusion detection
benchmark dataset. With a split of 60.0% for the training set and the
remainder for the testing set, a two-class classification
(Normal, Attack) was implemented. The Weka framework, a Java-based
open-source software package consisting of a collection of machine
learning algorithms for data mining tasks, was used in the testing
process. The experimental results show that the proposed approach is
very accurate, with a low false positive rate and a high true positive rate,
and that it takes less learning time than the same algorithm
using the full feature set of the dataset.
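The information-gain ranking used for feature selection can be sketched as below, as a minimal illustration rather than the paper's Weka pipeline. IG is taken as H(class) minus H(class | feature) for discrete features. The feature names and the four toy records are hypothetical, and the K-means step is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    return sorted(columns,
                  key=lambda name: information_gain(columns[name], labels),
                  reverse=True)

# Hypothetical toy records: two normal connections, two attacks.
labels = ["normal", "normal", "attack", "attack"]
columns = {
    "protocol": ["tcp", "udp", "tcp", "udp"],  # uninformative here
    "flag": ["SF", "SF", "S0", "S0"],          # separates the classes
}
ranking = rank_features(columns, labels)
```

Features at the bottom of such a ranking would be dropped before clustering, which is what reduces the learning time.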
Abstract: In Content-Based Image Retrieval systems it is
important to use an efficient indexing technique in order to perform
and accelerate the search in huge databases. The indexing
technique used should also support the high dimensionality of image features.
In this paper we present the hierarchical index NOHIS-tree
(Non-Overlapping Hierarchical Index Structure) and its behaviour when
scaling up to very large databases. We also present a study of the
influence of clustering on search time. The performance test results
show that NOHIS-tree performs better than SR-tree. Tests also show
that NOHIS-tree maintains its performance in high-dimensional spaces.
We include performance tests that seek the number of clusters in
NOHIS-tree that gives the best search time.
Abstract: DNA microarray technology concurrently monitors the expression levels of thousands of genes during significant biological processes and across related samples. A better understanding of functional genomics is obtained by extracting the patterns hidden in gene expression data. This is handled by clustering, which reveals natural structures and identifies interesting patterns in the underlying data. In the proposed work, gene expression data are clustered using an Advanced Nelder-Mead (ANM) algorithm. The Nelder-Mead (NM) method is designed for optimization. In the Nelder-Mead method, the vertices of a triangle are considered as the solutions. Many operations are performed on this triangle to obtain a better result. In the proposed work, operations such as reflection and expansion are eliminated and a new operation called spread-out is introduced. The spread-out operation increases the global search area and thus provides a better optimization result. The spread-out operation gives three points, and the best of these three points is used to replace the worst point. The experimental results are analyzed with optimization benchmark test functions and gene expression benchmark datasets. The results show that ANM outperforms NM in both benchmarks.
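For context, the classical Nelder-Mead operations that the abstract says are removed can be sketched as follows. This is the textbook reflection and expansion of the worst vertex through the centroid of the others, with the usual coefficients alpha = 1 and gamma = 2 assumed. The spread-out operation is not specified in the abstract and is not implemented here. The function name and the test triangle are illustrative.

```python
def nelder_mead_candidates(simplex, objective, alpha=1.0, gamma=2.0):
    """Return (ordered simplex, reflection point, expansion point)."""
    ordered = sorted(simplex, key=objective)  # best first, worst last
    worst = ordered[-1]
    others = ordered[:-1]
    dim = len(worst)
    # Centroid of every vertex except the worst one.
    centroid = [sum(p[i] for p in others) / len(others) for i in range(dim)]
    # Reflect the worst vertex through the centroid.
    reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
    # Push further along the same direction.
    expanded = [c + gamma * (r - c) for c, r in zip(centroid, reflected)]
    return ordered, reflected, expanded

def sphere(point):
    """Simple benchmark objective: sum of squares."""
    return sum(x * x for x in point)

triangle = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
ordered, reflected, expanded = nelder_mead_candidates(triangle, sphere)
```

In the classical method one of these candidates replaces the worst vertex only if it improves the objective; otherwise contraction or shrink steps follow.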
Abstract: Mammalian genomes contain a large number of
retroelements (SINEs, LINEs and LTRs) which could affect
expression of protein-coding genes through associated transcription
factor binding sites (TFBS). Activity of the retroelement-associated
TFBS in many genes has been confirmed experimentally, but their global
functional impact remains unclear. Human SINEs (Alu repeats) and
mouse SINEs (B1 and B2 repeats) are known to be clustered in GC-rich,
gene-rich genome segments, consistent with the view that they
can contribute to regulation of gene expression. We have shown
earlier that Alu are involved in formation of cis-regulatory modules
(clusters of TFBS) in human promoters, and other authors reported
that Alu located near promoter CpG islands have an increased
frequency of CpG dinucleotides suggesting that these Alu are
undermethylated. Human Alu and mouse B1/B2 elements have an
internal bipartite promoter for RNA polymerase III containing a
conserved sequence motif called the B-box, which can bind the basal
transcription complex TFIIIC. It has recently been shown that TFIIIC
binding to the B-box leads to formation of a boundary which limits
the spread of repressive chromatin modifications in S. pombe.
SINE-associated B-boxes may have a similar function, but conservation of
TFIIIC binding sites in SINEs located near mammalian promoters
has not been studied before. Here we analysed the abundance and
distribution of retroelements (SINEs, LINEs and LTRs) in annotated
sequences of the Database of mammalian transcription start sites
(DBTSS). Fractions of SINEs in human and mouse promoters are
slightly lower than in the whole genome, but >40% of human and mouse
promoters contain Alu or B1/B2 elements within the -1000 to +200 bp
interval relative to the transcription start site (TSS). Most of these SINEs
are associated with distal segments of promoters (-1000 to -200 bp
relative to TSS), indicating that their insertion at distances >200 bp
upstream of the TSS is tolerated during evolution. Distribution of SINEs
in promoters correlates negatively with the distribution of CpG
sequences. Using analysis of the abundance of 12-mer motifs from the
B1 and Alu consensus sequences in the genome and DBTSS, it was
confirmed that some subsegments of Alu and B1 elements are poorly
conserved, which depends in part on the presence of CpG
dinucleotides. One of these CpG-containing subsegments in B1
elements overlaps with the SINE-associated B-box and shows better
conservation in DBTSS than in genomic sequences. We
also studied the conservation in DBTSS and the genome of the
B-box-containing segments of old (AluJ, AluS) and young (AluY) Alu
repeats and found that the CpG sequence of the B-box of old Alu is
better conserved in DBTSS than in the genome. This indicates that
B-box-associated CpGs in promoters are better protected from
methylation and mutation than B-box-associated CpGs in genomic
SINEs. These results are consistent with the view that potential
TFIIIC binding motifs in SINEs associated with human and mouse
promoters may be functionally important. These motifs may protect
promoters from repressive histone modifications which spread from
adjacent sequences. This can potentially explain the well-known
clustering of SINEs in GC-rich, gene-rich genome compartments and
the existence of unmethylated CpG islands.
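The 12-mer abundance analysis mentioned above can be sketched as a count of how often each overlapping k-mer of a consensus sequence occurs in a target sequence set. This is a minimal illustration under stated assumptions: exact matching on one strand only, made-up toy sequences, and k = 4 in the example purely to keep it short. The actual Alu/B1 consensus sequences and DBTSS data are not reproduced.

```python
from collections import Counter

def kmers(sequence, k=12):
    """Yield every overlapping k-mer of a sequence."""
    for start in range(len(sequence) - k + 1):
        yield sequence[start:start + k]

def motif_abundance(consensus, target, k=12):
    """Count how often each k-mer of `consensus` occurs in `target`."""
    counts = Counter(kmers(target, k))
    return {motif: counts[motif] for motif in kmers(consensus, k)}

# Hypothetical toy sequences, not real SINE or promoter data.
toy_consensus = "ACGTAC"
toy_target = "TTACGTACGTTT"
abundance = motif_abundance(toy_consensus, toy_target, k=4)
```

Comparing such counts between a promoter set and the whole genome, position by position along the consensus, is one way to see which subsegments are better conserved.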
Abstract: This paper presents an optimal, unsupervised satellite image segmentation approach based on the Pearson system and k-means clustering algorithm initialization. The method is original in that it uses the k-means clustering algorithm for an optimal initialisation of the number of image classes on the one hand, and exploits the Pearson system for an optimal assignment of statistical distributions to each class on the other. Satellite image exploitation requires the use of different approaches, especially those founded on the principle of unsupervised statistical segmentation. Such approaches require the definition of several parameters, such as the number of image classes, the estimation of class variables and generalised mixture distributions. The use of statistical image attributes gives convincing and promising results, provided that the initialisation step is optimal and the statistical distributions are assigned appropriately. The Pearson system, combined with a k-means clustering algorithm and the Stochastic Expectation-Maximization (SEM) algorithm, can be adapted to this problem. For each image class, the Pearson system assigns one distribution type according to different parameters, especially the skewness β1 and the kurtosis β2. The adapted algorithms (k-means clustering, SEM and the Pearson system) are then applied to the satellite image segmentation problem. The efficiency of the combined algorithms was validated first with the Mean Quadratic Error (MQE) and second by visual inspection, along with several comparisons of these unsupervised image segmentations.
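The two Pearson-system parameters can be computed from the central moments of the pixel values in a class, as in the sketch below. It assumes the conventional definitions β1 = μ3² / μ2³ (the squared skewness coefficient) and β2 = μ4 / μ2². The function name and sample values are illustrative, and the mapping from (β1, β2) to a Pearson distribution type is not shown.

```python
def pearson_betas(samples):
    """Return (beta1, beta2) from the central moments of `samples`."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # variance
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    beta1 = m3 ** 2 / m2 ** 3  # squared skewness
    beta2 = m4 / m2 ** 2       # kurtosis (3 for a Gaussian)
    return beta1, beta2

# Hypothetical grey levels of one image class.
symmetric = pearson_betas([-1.0, 1.0, -1.0, 1.0])
skewed = pearson_betas([0.0, 0.0, 0.0, 4.0])
```

A class whose samples all share one value has zero variance, so a caller would need to guard against division by zero in that case.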