Abstract: In this paper we present a modification of the kinetic Smoluchowski equation for binary aggregation, applied to systems with chemical reactions of first and second order whose main product is insoluble. The goal of this work is to establish a theoretical foundation and engineering procedures for the design of chemical apparatuses under the joint action of chemical reactions and the aggregation of the insoluble dispersed phases formed in the working zones of the reactor.
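For reference, the classical Smoluchowski coagulation equation for binary aggregation, which the abstract above modifies, can be written as follows; the source term Q_k, standing for production of the insoluble phase by the first- and second-order reactions, is our notational sketch, not the authors' exact formulation:

```latex
\frac{\mathrm{d}n_k}{\mathrm{d}t}
  = \frac{1}{2}\sum_{i+j=k} K_{i,j}\, n_i\, n_j
  \;-\; n_k \sum_{j\geq 1} K_{k,j}\, n_j
  \;+\; Q_k ,
```

where n_k is the number concentration of aggregates containing k primary particles, K_{i,j} is the coagulation kernel, and Q_k is the assumed reaction source term.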
Abstract: We consider n individuals described by p standardized variables, represented by points on the surface of the unit hypersphere S^{n-1}. For a given choice of n individuals we suppose that the set of observed variables comes from a mixture of bipolar Watson distributions defined on the hypersphere. The EM and Dynamic Clusters algorithms are used to identify such a mixture. We obtain parameter estimates for each Watson component and then a partition of the set of variables into homogeneous groups. Additionally, we present a factor analysis model in which the unobservable factors are simply the maximum likelihood estimators of the Watson directional parameters, namely the first principal component of the data matrix associated with each previously identified group. This alternative model yields directly interpretable solutions (simple structure), avoiding factor rotations.
Abstract: The current Hadoop block placement policy does not fairly and evenly distribute replicas of blocks written to datanodes in a Hadoop cluster.
This paper presents a new solution that helps keep the cluster in a balanced state while an HDFS client is writing data to a file in a Hadoop cluster. The solution was implemented, and tests were conducted to evaluate its contribution to the Hadoop distributed file system.
It was found that the solution lowered the global execution time taken by the Hadoop balancer to 22 percent. It was also found that the Hadoop balancer over-replicates 1.75 and 3.3 percent of all redistributed blocks in the modified and original Hadoop clusters, respectively.
The feature that keeps the cluster in a balanced state works as a core part of the Hadoop system, not merely as a utility like the traditional balancer. This is one of the significant achievements and unique aspects of the solution developed during the course of this research work.
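The core idea of balance-at-write placement can be sketched as follows: when choosing datanodes for a new block's replicas, prefer the least-utilized nodes rather than (nearly) random ones, so no separate rebalancing pass is needed afterwards. All names here (`Datanode`, `choose_targets`) are hypothetical illustrations, not the actual HDFS API:

```python
from dataclasses import dataclass

@dataclass
class Datanode:
    name: str
    used: int      # bytes currently used
    capacity: int  # total bytes

    @property
    def utilization(self) -> float:
        return self.used / self.capacity

def choose_targets(nodes, replicas=3):
    """Pick the `replicas` least-utilized datanodes for a new block.

    Writing each block to the emptiest nodes keeps the cluster close
    to balanced continuously, instead of repairing the imbalance later
    with a standalone balancer utility.
    """
    ranked = sorted(nodes, key=lambda n: n.utilization)
    return ranked[:replicas]

nodes = [
    Datanode("dn1", used=80, capacity=100),
    Datanode("dn2", used=10, capacity=100),
    Datanode("dn3", used=50, capacity=100),
    Datanode("dn4", used=20, capacity=100),
]
targets = choose_targets(nodes)
print([n.name for n in targets])  # ['dn2', 'dn4', 'dn3']
```

A real placement policy would also weigh rack locality and pipeline distance; this sketch isolates only the utilization criterion.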
Abstract: Absorption spectra of infrared (IR) radiation of a disperse water medium absorbing the most important greenhouse gases (CO2, N2O, CH4, C2H2, C2H6) have been calculated by the molecular dynamics method. The loss of absorbing ability upon cluster formation, caused by a reduction in the number of centers interacting with IR radiation, results in an anti-greenhouse effect. Absorption of O3 molecules by the (H2O)50 cluster is investigated during its interaction with Cl- ions. Splitting of the ozone molecule into atoms near the cluster surface was observed. Interaction of the water cluster with Cl- ions increases the integrated intensity of the IR emission spectra and substantially reduces the corresponding characteristic of the Raman spectrum. The relative integrated intensity of IR absorption for small water clusters was calculated. The dependence of mass on altitude was determined for monomer vapor, clusters, droplets, crystals, and total moisture. The anti-greenhouse effect of clusters was defined as the difference between the increases in the average global temperature of the Earth caused by absorption of IR radiation by the free water molecules that form clusters and by absorption of the clusters themselves. The greenhouse effect caused by clusters amounts to 0.53 K, while the anti-greenhouse effect equals 1.14 K. An increase in the concentration of CO2 in the atmosphere does not always correlate with an amplification of the greenhouse effect.
Abstract: Certain sciences such as physics, chemistry or biology,
have a strong computational aspect and use computing infrastructures
to advance their scientific goals. Often, high-performance and/or
high-throughput computing infrastructures such as clusters and
computational Grids are used to satisfy these needs. In addition,
these sciences are sometimes characterised by scientific collaborations
requiring resource sharing which is typically provided by Grid
approaches. In this article, I discuss Grid computing approaches in
High Energy Physics as well as in bioinformatics and highlight some
of my experience in both scientific domains.
Abstract: In Content-Based Image Retrieval systems it is important to use an efficient indexing technique in order to accelerate search in huge databases. The indexing technique should also support the high dimensionality of image features. In this paper we present the hierarchical index NOHIS-tree (Non-Overlapping Hierarchical Index Structure) and evaluate it when scaling up to very large databases. We also study the influence of clustering on search time. Performance tests show that NOHIS-tree outperforms the SR-tree and that it maintains its performance in high-dimensional spaces. We also include a performance test that aims to determine the number of clusters in NOHIS-tree that yields the best search time.
Abstract: Mammalian genomes contain a large number of retroelements (SINEs, LINEs and LTRs) which can affect the expression of protein-coding genes through associated transcription factor binding sites (TFBS). The activity of retroelement-associated TFBS in many genes has been confirmed experimentally, but their global functional impact remains unclear. Human SINEs (Alu repeats) and mouse SINEs (B1 and B2 repeats) are known to cluster in GC-rich, gene-rich genome segments, consistent with the view that they can contribute to the regulation of gene expression. We have shown earlier that Alu elements are involved in the formation of cis-regulatory modules (clusters of TFBS) in human promoters, and other authors have reported that Alu elements located near promoter CpG islands have an increased frequency of CpG dinucleotides, suggesting that these Alu are undermethylated. Human Alu and mouse B1/B2 elements have an internal bipartite promoter for RNA polymerase III containing a conserved sequence motif called the B-box, which can bind the basal transcription complex TFIIIC. It has recently been shown that TFIIIC binding to the B-box leads to the formation of a boundary that limits the spread of repressive chromatin modifications in S. pombe. SINE-associated B-boxes may have a similar function, but the conservation of TFIIIC binding sites in SINEs located near mammalian promoters has not been studied previously. Here we analysed the abundance and distribution of retroelements (SINEs, LINEs and LTRs) in annotated sequences of the Database of mammalian transcription start sites (DBTSS). The fractions of SINEs in human and mouse promoters are slightly lower than in the genome as a whole, but >40% of human and mouse promoters contain Alu or B1/B2 elements within the -1000 to +200 bp interval relative to the transcription start site (TSS). Most of these SINEs are associated with the distal segments of promoters (-1000 to -200 bp relative to the TSS), indicating that their insertion at distances >200 bp upstream of the TSS is tolerated during evolution. The distribution of SINEs in promoters correlates negatively with the distribution of CpG sequences. By analysing the abundance of 12-mer motifs from the B1 and Alu consensus sequences in the genome and in DBTSS, we confirmed that some subsegments of Alu and B1 elements are poorly conserved, which depends in part on the presence of CpG dinucleotides. One of these CpG-containing subsegments in B1 elements overlaps with the SINE-associated B-box and shows better conservation in DBTSS compared with genomic sequences. We also studied the conservation, in DBTSS and in the genome, of the B-box-containing segments of old (AluJ, AluS) and young (AluY) Alu repeats and found that the CpG sequence of the B-box of old Alu elements is better conserved in DBTSS than in the genome. This indicates that B-box-associated CpGs in promoters are better protected from methylation and mutation than B-box-associated CpGs in genomic SINEs. These results are consistent with the view that potential TFIIIC binding motifs in SINEs associated with human and mouse promoters may be functionally important. These motifs may protect promoters from repressive histone modifications that spread from adjacent sequences. This can potentially explain the well-known clustering of SINEs in GC-rich, gene-rich genome compartments and the existence of unmethylated CpG islands.
Abstract: A computer cluster is a group of tightly coupled
computers that work together closely so that in many respects they
can be viewed as though they are a single computer. The components
of a cluster are commonly, but not always, connected to each other
through fast local area networks. Clusters are usually deployed to
improve performance and/or availability over that provided by a
single computer, while typically being much more cost-effective than
single computers of comparable speed or availability. This paper
proposes a way to implement a Beowulf cluster in order to
achieve high performance as well as high availability.
Abstract: Wireless sensor networks can be deployed in both hostile and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efficient data aggregation while preserving data privacy is a challenging problem in wireless sensor network research. In this paper, we present a privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA) scheme leverages a clustering protocol and the algebraic properties of polynomials, and has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results for our schemes and compare their performance to TAG, a typical data aggregation scheme in which no data privacy protection is provided. The results show the efficacy and efficiency of our schemes.
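The flavor of cluster-based private additive aggregation can be illustrated with a simplified additive secret-sharing round (the actual CPDA protocol uses polynomial-based share exchange; this sketch keeps only the additive-recovery property, and all function names are ours):

```python
import random

def split_into_shares(value, n_shares, modulus=10**9 + 7):
    """Split a private value into n random additive shares (mod p).

    Any single share reveals nothing about the value; only the sum of
    all shares reconstructs it.
    """
    shares = [random.randrange(modulus) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]

def private_cluster_sum(values, modulus=10**9 + 7):
    """Each node splits its reading among all cluster members; every
    member sums the shares it received and publishes that partial sum.
    The cluster head adds the partial sums to obtain the exact total,
    while no individual node's reading is ever disclosed."""
    n = len(values)
    received = [[] for _ in range(n)]  # received[j]: shares sent to node j
    for v in values:
        for j, s in enumerate(split_into_shares(v, n, modulus)):
            received[j].append(s)
    partial_sums = [sum(r) % modulus for r in received]
    return sum(partial_sums) % modulus

readings = [17, 42, 5]                 # private sensor readings
print(private_cluster_sum(readings))   # 64 -- exact sum, privately computed
```

Note that the recovered total is exact despite the random shares, which is why additive aggregation functions are the natural target for such schemes.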
Abstract: In this paper, we propose a fast and efficient method for drawing very large-scale graph data. The conventional force-directed method proposed by Fruchterman and Reingold (the FR method) is well known. It defines repulsive forces between every pair of nodes and attractive forces between nodes connected by an edge, and calculates the corresponding potential energy. An optimal layout is obtained by iteratively updating node positions to minimize the potential energy. In the original method, all node positions are updated simultaneously at every global time step. In the proposed method, each node has its own individual time and time step, and nodes are updated at different frequencies depending on their local situation. The proposed method is inspired by the hierarchical individual time step method used for high-accuracy calculations of dense particle fields, such as star clusters, in astrophysical dynamics. Experiments show that the proposed method outperforms the original FR method in both speed and accuracy. We implement the proposed method on the MDGRAPE-3 PCI-X special-purpose parallel computer and achieve a speed-up of several hundred times.
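The individual-time-step idea can be sketched in miniature: compute FR-style forces, but give each node its own time step inversely proportional to the force it feels, so nearly settled nodes are updated rarely while strongly forced nodes are updated often. This is a simplified illustration under our own parameter choices, not the authors' exact scheme:

```python
import math, random

def fr_layout_individual_steps(nodes, edges, n_events=2000, k=1.0, eta=0.05):
    """Fruchterman-Reingold-style layout with per-node time steps:
    at each event, the node whose individual clock is furthest behind
    is advanced, with dt inversely proportional to its force magnitude."""
    pos = {v: [random.uniform(-1, 1), random.uniform(-1, 1)] for v in nodes}
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)

    def force(v):
        fx = fy = 0.0
        x, y = pos[v]
        for u in nodes:
            if u == v:
                continue
            dx, dy = x - pos[u][0], y - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            rep = k * k / d              # repulsion from every other node
            fx += rep * dx / d; fy += rep * dy / d
            if u in adj[v]:
                att = d * d / k          # attraction along edges
                fx -= att * dx / d; fy -= att * dy / d
        return fx, fy

    t = {v: 0.0 for v in nodes}          # individual clocks
    for _ in range(n_events):
        v = min(nodes, key=lambda u: t[u])   # node most behind in time
        fx, fy = force(v)
        mag = math.hypot(fx, fy) or 1e-9
        dt = eta / mag                   # strong force -> small dt, frequent updates
        step = min(mag * dt, 0.1)        # displacement, capped for stability
        pos[v][0] += fx / mag * step
        pos[v][1] += fy / mag * step
        t[v] += dt
    return pos

random.seed(0)
nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
layout = fr_layout_individual_steps(nodes, edges)
```

The global-update FR method would instead move every node once per iteration; here update frequency adapts per node, which is what makes the hierarchical scheme attractive for large, inhomogeneous graphs.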
Abstract: Essential hypertension (HTN) usually clusters with other cardiovascular risk factors such as age, overweight, diabetes, insulin resistance and dyslipidemia. Target organ damage (TOD) such as left ventricular hypertrophy, microalbuminuria (MA), acute coronary syndrome (ACS), stroke and cognitive dysfunction occurs early in the course of hypertension. Though the prevalence of hypertension is high in India, the relationship between microalbuminuria and target organ damage in hypertension is not well studied. This study aims at detecting MA in essential hypertension and its relation to the severity of HTN, duration of HTN, body mass index (BMI), age and TOD such as hypertensive retinopathy and acute coronary syndrome. The present study was conducted on 100 non-diabetic patients with essential hypertension admitted to B.L.D.E. University's Sri B.M. Patil Medical College, Bijapur, from October 2008 to April 2011. A detailed history was taken and a clinical examination performed for each patient. An early-morning 5 ml urine sample was collected, and MA was estimated by the immunoturbidimetry method. The relationship of MA with the duration and severity of HTN, BMI, age, sex and TOD such as hypertensive retinopathy and ACS was assessed by univariate analysis. The prevalence of MA in this study was found to be 63%, comprising 42% males and 21% females. A significant association was found between MA and the duration of hypertension (p = 0.036, OR = 0.438): the longer the duration of hypertension, the higher the likelihood of microalbumin in the urine. There was also a significant association between the severity of hypertension and MA (p = 0.045, OR = 0.093); MA was positive in 50 (79.4%) of the 63 patients whose blood pressure was >160/100 mm Hg. Significant associations were also found between MA and the grade of hypertensive retinopathy (p = 0.011) and acute coronary syndrome (p = 0.041, OR = 2.805).
Gender and BMI did not pose a high risk for MA in this study. The prevalence of MA in essential hypertension is high in this part of the community, and MA increases the risk of developing target organ damage. Early screening of patients with essential hypertension for MA and aggressive management of positive cases might reduce the burden of chronic kidney disease and cardiovascular disease in the community.
Abstract: The classical temporal scan statistic is often used to identify disease clusters. In recent years, this method has become a very popular technique and its field of application has broadened considerably; many bioinformatics problems have been solved with it. In this paper a new fuzzy scan method is proposed. The behaviours of the classic and fuzzy scan techniques are studied with simulated data. ROC curves are calculated, demonstrating the superiority of the fuzzy scan technique.
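For readers unfamiliar with the classical temporal scan statistic that the fuzzy variant builds on, a minimal sketch (Kulldorff-style Poisson likelihood ratio over all time windows; the toy data are ours):

```python
import math

def temporal_scan(cases, baseline):
    """Classical temporal scan statistic for Poisson counts: scan every
    interval [s, e], compute the log-likelihood ratio of 'elevated rate
    inside vs. outside the window', and return the best-scoring window.
    `cases[i]` are observed counts, `baseline[i]` expected intensities."""
    C = sum(cases)
    B = sum(baseline)
    best, best_llr = None, 0.0
    n = len(cases)
    for s in range(n):
        c = mu = 0.0
        for e in range(s, n):
            c += cases[e]
            mu += baseline[e] * C / B   # expected cases inside the window
            if mu < c < C:              # only rate-elevated, proper windows
                llr = (c * math.log(c / mu)
                       + (C - c) * math.log((C - c) / (C - mu)))
                if llr > best_llr:
                    best, best_llr = (s, e), llr
    return best, best_llr

cases    = [2, 1, 2, 9, 8, 1, 2]    # simulated spike on days 3-4
baseline = [3, 3, 3, 3, 3, 3, 3]
window, llr = temporal_scan(cases, baseline)
print(window)  # (3, 4)
```

The fuzzy variant discussed in the abstract would replace the crisp in/out window membership with graded membership; this sketch shows only the crisp baseline.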
Abstract: Consider the mass production of HDD arms, where hundreds of CNC machines are used to manufacture the arms. Given the overwhelming number of machines and arm models, constructing a separate control chart to monitor each HDD arm model on each machine is not feasible. This research proposes a strategy to optimize SPC management on the shop floor. The procedure starts by identifying clusters of machines with similar manufacturing performance using a clustering technique. A three-way control chart (I-MR-R) is then applied to each clustered group of machines. The proposed approach benefits the manufacturer in terms of not only better SPC performance but also the quality management paradigm.
Abstract: In this paper, a periodic surveillance scheme is proposed for any convex region using mobile wireless sensor nodes. A sensor network typically consists of a fixed number of sensor nodes which report measurements of sensed data such as temperature, pressure and humidity in their immediate proximity (the area within their sensing range). To sense an area of interest, an adequate number of fixed sensor nodes is required to cover the entire region; the number of fixed nodes needed thus depends on the sensing range of the sensors as well as the deployment strategy employed. Here, the sensors are assumed to be mobile within the region of surveillance and can be mounted on moving bodies such as robots or vehicles. In our scheme, therefore, the surveillance time period determines the number of sensor nodes that must be deployed in the region of interest. The proposed scheme comprises three algorithms: Hexagonalization, Clustering, and Scheduling. The first algorithm partitions the coverage area into fixed-size hexagons that approximate the sensing range (cell) of an individual sensor node. The clustering algorithm groups the cells into clusters, each of which is covered by a single sensor node. The last algorithm determines a schedule for each sensor to serve its respective cluster: each sensor node traverses all the cells belonging to its assigned cluster by oscillating between the first and the last cell for the duration of its lifetime. Simulation results show that our scheme provides full coverage within a given period of time using few sensors with minimal movement, low power consumption, and relatively low infrastructure cost.
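The scheduling step above can be sketched as a simple generator: a sensor sweeps its cluster's cells forward and then back, which bounds every cell's revisit interval by one full back-and-forth period (cell identifiers below are illustrative):

```python
def oscillating_schedule(cells, n_visits):
    """Yield the order in which a sensor visits its cluster's cells,
    sweeping from the first cell to the last and back again, so that
    every cell is revisited within one back-and-forth cycle."""
    if len(cells) == 1:
        for _ in range(n_visits):
            yield cells[0]
        return
    forward = cells
    backward = cells[-2:0:-1]          # reverse sweep, endpoints not repeated
    cycle = forward + backward
    for i in range(n_visits):
        yield cycle[i % len(cycle)]

cells = ["c1", "c2", "c3", "c4"]
print(list(oscillating_schedule(cells, 10)))
# ['c1', 'c2', 'c3', 'c4', 'c3', 'c2', 'c1', 'c2', 'c3', 'c4']
```

With a cluster of m cells, the worst-case revisit delay of any cell is 2(m-1) visit slots, which is the quantity the surveillance time period would constrain.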
Abstract: One main drawback of intrusion detection systems is their inability to detect new attacks that do not have known signatures. In this paper we discuss an intrusion detection method that uses independent component analysis (ICA)-based feature selection heuristics and rough-fuzzy clustering of the data. ICA is used to separate independent components (ICs) from the monitored variables; rough sets reduce the amount of data and remove redundancy; and fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us not only to recognize known attacks but also to detect activity that may be the result of a new, unknown attack. Experimental results are reported on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset.
Abstract: Virtualization and high performance computing have been discussed from a performance perspective in recent publications. We present and discuss a flexible and efficient approach to the management of virtual clusters. A virtual machine management tool is extended to function as a fabric for cluster deployment and management. We show how features such as saving the state of a running cluster can be used to avoid disruption. We also compare our approach to the traditional methods of cluster deployment and present benchmarks which illustrate the efficiency of our approach.
Abstract: A wireless sensor network is a multi-hop, self-configuring wireless network consisting of sensor nodes. The deployment of wireless sensor networks in many application areas, e.g., aggregation services, requires self-organization of the network nodes into clusters. An efficient way to enhance the lifetime of the system is to partition the network into distinct clusters, each with a high-energy node as cluster head. Various node clustering techniques have appeared in the literature; they roughly fall into two families: those based on the construction of a dominating set and those based solely on energy considerations. Energy-optimized cluster formation for a set of randomly scattered wireless sensors is presented. Sensors within a cluster are expected to communicate with the cluster head only. The energy constraints and limited computing resources of the sensor nodes present the major challenges in gathering the data. In this paper we propose a framework to study how partially correlated data affect the performance of clustering algorithms. The total energy consumption and network lifetime can be analyzed by combining random geometry techniques and rate-distortion theory. We also present the relation between compression distortion and data correlation.
Abstract: Clustering categorical data is more complicated than numerical clustering because of its special properties. Scalability and memory constraints are the challenging problems in clustering large data sets. This paper presents an incremental algorithm to cluster categorical data. The frequencies of attribute values contribute much to clustering similar categorical objects; in this paper we propose new similarity measures based on the frequencies of attribute values and their cardinalities. The proposed measures and the algorithm are evaluated on data sets from the UCI repository. The results show that the proposed method generates better clusters than the existing one.
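To make the frequency-based idea concrete, here is one illustrative similarity measure in the same spirit (not the paper's exact formula): a matching attribute value counts for more when it is rare, since a shared rare value is stronger evidence that two objects belong together.

```python
from collections import Counter

def frequency_similarity(x, y, freq, n):
    """Similarity between two categorical objects x and y.

    For each attribute a where the objects match, add a weight that
    grows as the shared value becomes rarer in the data set
    (freq[a][v] is the count of value v for attribute a, n the data size).
    """
    score = 0.0
    for a, (xv, yv) in enumerate(zip(x, y)):
        if xv == yv:
            score += 1.0 - freq[a][xv] / n   # rare match -> weight near 1
    return score / len(x)

data = [
    ("red",  "round",  "small"),
    ("red",  "round",  "large"),
    ("blue", "square", "small"),
]
n = len(data)
freq = [Counter(row[a] for row in data) for a in range(3)]
print(frequency_similarity(data[0], data[1], freq, n))  # two matches
print(frequency_similarity(data[0], data[2], freq, n))  # one match
```

Because `freq` is a set of per-attribute counters, it can be updated in O(p) per new object, which is what makes such measures compatible with an incremental clustering algorithm.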
Abstract: An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of detail, and to focus our attention on the interesting aspects only. One such efficient and easy-to-understand system is the Hierarchical Production Rule (HPR) system. An HPR, a standard production rule augmented with generality and specificity information, is of the form: Decision If <condition> Generality <generality information> Specificity <specificity information>. HPR systems are capable of handling the taxonomical structures inherent in knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by an HPR-tree. This paper discusses an algorithm based on a cumulative learning scenario for the dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from previous episodes and also maintains a summary of the clusters as a Synopsis to be used in future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.
Abstract: This study proposes a new recommender system based on collaborative folksonomy. The purpose of the proposed system is to recommend Internet resources (such as books, articles, documents, pictures, audio and video) to users. The proposed method includes four steps: creating a user profile based on tags, grouping similar users into clusters using agglomerative hierarchical clustering, finding similar resources based on the user's past collections by using content-based filtering, and recommending similar items to the target user. This study examines the system's performance on a dataset collected from "del.icio.us," a well-known social bookmarking website. Experimental results show that the proposed hybrid of tag-based collaborative and content-based filtering is a promising and effective recommender system for folksonomy-based bookmarking websites.
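The four steps above can be sketched in miniature: build tag-count profiles, measure user similarity with cosine similarity, and recommend the unseen items of the closest user. This sketch collapses the agglomerative clustering step into a nearest-neighbour lookup, and all names and data are illustrative:

```python
import math

def cosine(p, q):
    """Cosine similarity between two tag-count profiles (dicts)."""
    dot = sum(p[t] * q.get(t, 0) for t in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def recommend(target, profiles, collections):
    """Find the user whose tag profile is most similar to the target's
    and recommend the items of that user which the target lacks."""
    best = max((u for u in profiles if u != target),
               key=lambda u: cosine(profiles[target], profiles[u]))
    return sorted(set(collections[best]) - set(collections[target]))

profiles = {
    "ann": {"python": 3, "ml": 2},
    "bob": {"python": 2, "ml": 3, "nlp": 1},
    "eve": {"cooking": 4},
}
collections = {
    "ann": ["scikit-learn-tutorial"],
    "bob": ["scikit-learn-tutorial", "nlp-course"],
    "eve": ["pasta-recipes"],
}
print(recommend("ann", profiles, collections))  # ['nlp-course']
```

In the full system, clustering the users first restricts the similarity search to one cluster, and content-based filtering refines which of the neighbour's resources are actually recommended.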