Modeling Aggregation of Insoluble Phase in Reactors

In this paper we present a modification of the kinetic Smoluchowski equation for binary aggregation, applied to systems with first- and second-order chemical reactions whose main product is insoluble. The goal of this work is to establish a theoretical foundation and engineering procedures for designing chemical apparatuses under the joint action of chemical reactions and the aggregation of the insoluble dispersed phases formed in the working zones of the reactor.
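For reference, the classical Smoluchowski coagulation equation for the number density n_k of k-mer aggregates, which the paper modifies, is shown below; the source term S_k(t) is our illustrative notation for where first- or second-order production of insoluble material would enter, not the paper's own formulation:

```latex
\frac{dn_k}{dt} \;=\; \frac{1}{2}\sum_{i+j=k} K_{ij}\, n_i n_j
\;-\; n_k \sum_{j \ge 1} K_{kj}\, n_j \;+\; S_k(t),
\qquad S_1(t) = k_1 c \ \ \text{or} \ \ k_2 c^2,
```

where K_ij is the aggregation kernel and c is the concentration of the reagent producing the insoluble monomer.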

Clustering of Variables Based On a Probabilistic Approach Defined on the Hypersphere

We consider n individuals described by p standardized variables, represented as points on the surface of the unit hypersphere S^{n-1}. For a given choice of n individuals, we suppose that the set of observed variables comes from a mixture of bipolar Watson distributions defined on the hypersphere. EM and Dynamic Clusters algorithms are used to identify such a mixture. We obtain parameter estimates for each Watson component and then a partition of the set of variables into homogeneous groups. Additionally, we present a factor analysis model in which the unobservable factors are just the maximum likelihood estimators of the Watson directional parameters, namely the first principal component of the data matrix associated with each group previously identified. This alternative model yields directly interpretable solutions (simple structure), avoiding factor rotations.
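For orientation, the bipolar Watson density on the hypersphere concentrates probability around the axis ±μ; the following standard form (with M the Kummer confluent hypergeometric function) is what each mixture component contributes:

```latex
f(\pm x \mid \mu, \kappa) \;=\; M\!\left(\tfrac{1}{2}, \tfrac{n}{2}, \kappa\right)^{-1}
\exp\!\left(\kappa\,(\mu^{\top} x)^{2}\right), \qquad x \in S^{n-1},\ \kappa > 0.
```

The maximum likelihood estimator of μ for a bipolar component is the dominant eigenvector of that component's sample scatter matrix, which is why the estimated directional parameters coincide with the first principal component of each group's data.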

Optimizing Hadoop Block Placement Policy and Cluster Blocks Distribution

The current Hadoop block placement policy does not fairly and evenly distribute replicas of blocks written to datanodes in a Hadoop cluster. This paper presents a new solution that helps to keep the cluster in a balanced state while an HDFS client is writing data to a file in the Hadoop cluster. The solution was implemented, and tests were conducted to evaluate its contribution to the Hadoop Distributed File System. It was found that the solution lowered the overall execution time taken by the Hadoop balancer to 22 percent. It was also found that the Hadoop balancer over-replicates 1.75 and 3.3 percent of all redistributed blocks in the modified and original Hadoop clusters, respectively. The feature that keeps the cluster in a balanced state works as a core part of the Hadoop system and not just as a utility like the traditional balancer. This is one of the significant achievements and unique aspects of the solution developed during the course of this research work.
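As a rough illustration of the balanced-placement idea (not the actual HDFS BlockPlacementPolicy code; the datanode names and the choose_targets helper are hypothetical), a writer can prefer the least-loaded candidate datanodes for each new block:

```python
import random

# Illustrative sketch only -- not the real HDFS implementation.
# Idea: instead of picking replica targets largely at random, prefer the
# candidate datanodes that currently hold the fewest blocks, so the
# cluster stays balanced while the client is still writing.

def choose_targets(datanodes, block_counts, n_replicas=3):
    """Pick n_replicas datanodes, least-loaded first, with a random
    tie-break so concurrent writes do not all converge on one node."""
    candidates = sorted(datanodes,
                        key=lambda dn: (block_counts[dn], random.random()))
    return candidates[:n_replicas]

# Example: 5 datanodes with uneven block counts.
counts = {"dn1": 120, "dn2": 80, "dn3": 80, "dn4": 200, "dn5": 95}
targets = choose_targets(list(counts), counts)
for dn in targets:
    counts[dn] += 1          # the new block's replicas land here
print(targets)               # e.g. ['dn2', 'dn3', 'dn5']
```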

Computer Study of Cluster Mechanism of Anti-greenhouse Effect

Absorption spectra of infrared (IR) radiation by a disperse water medium containing the most important greenhouse gases (CO2, N2O, CH4, C2H2, C2H6) have been calculated by the molecular dynamics method. Loss of absorbing ability upon the formation of clusters, due to a reduction in the number of centers interacting with IR radiation, results in an anti-greenhouse effect. The absorption of O3 molecules by an (H2O)50 cluster is investigated during its interaction with Cl- ions. Splitting of the ozone molecule into atoms near the cluster surface was observed. Interaction of the water cluster with Cl- ions increases the integrated intensity of the IR emission spectra and substantially reduces the corresponding characteristic of the Raman spectrum. The relative integrated intensity of IR absorption for small water clusters was calculated. The altitude dependence of the mass of vapor monomers, clusters, droplets, crystals, and of all moisture was determined. The anti-greenhouse effect of clusters was defined as the difference between the increases in the average global temperature of the Earth caused by absorption of IR radiation by the free water molecules that form clusters and by the clusters themselves. The greenhouse effect caused by clusters amounts to 0.53 K, while the anti-greenhouse one equals 1.14 K. The increase of CO2 concentration in the atmosphere therefore does not always correlate with an amplification of the greenhouse effect.
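Putting the two reported figures together, the net contribution of cluster formation is cooling:

```latex
\Delta T_{\text{net}} \;=\; \Delta T_{\text{greenhouse}} - \Delta T_{\text{anti-greenhouse}}
\;=\; 0.53\ \mathrm{K} - 1.14\ \mathrm{K} \;=\; -0.61\ \mathrm{K}.
```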

Grid Computing in Physics and Life Sciences

Certain sciences, such as physics, chemistry, or biology, have a strong computational aspect and use computing infrastructures to advance their scientific goals. Often, high-performance and/or high-throughput computing infrastructures such as clusters and computational Grids are applied to satisfy computational needs. In addition, these sciences are sometimes characterised by scientific collaborations requiring resource sharing, which is typically provided by Grid approaches. In this article, I discuss Grid computing approaches in High Energy Physics as well as in bioinformatics and highlight some of my experience in both scientific domains.

NOHIS-Tree: High-Dimensional Index Structure for Similarity Search

In Content-Based Image Retrieval systems it is important to use an efficient indexing technique in order to perform and accelerate search in huge databases. The indexing technique used should also support the high dimensionality of image features. In this paper we present the hierarchical index NOHIS-tree (Non-Overlapping Hierarchical Index Structure) and its behaviour when scaling up to very large databases. We also present a study of the influence of clustering on search time. The performance test results show that NOHIS-tree performs better than the SR-tree. Tests also show that NOHIS-tree keeps its performance in high-dimensional spaces. We include a performance test that tries to determine the number of clusters in NOHIS-tree that yields the best search time.

Bioinformatic Analysis of Retroelement-Associated Sequences in Human and Mouse Promoters

Mammalian genomes contain a large number of retroelements (SINEs, LINEs and LTRs) which could affect the expression of protein-coding genes through associated transcription factor binding sites (TFBS). The activity of retroelement-associated TFBS in many genes is confirmed experimentally, but their global functional impact remains unclear. Human SINEs (Alu repeats) and mouse SINEs (B1 and B2 repeats) are known to be clustered in GC-rich, gene-rich genome segments, consistent with the view that they can contribute to the regulation of gene expression. We have shown earlier that Alus are involved in the formation of cis-regulatory modules (clusters of TFBS) in human promoters, and other authors reported that Alus located near promoter CpG islands have an increased frequency of CpG dinucleotides, suggesting that these Alus are undermethylated. Human Alu and mouse B1/B2 elements have an internal bipartite promoter for RNA polymerase III containing a conserved sequence motif called the B-box, which can bind the basal transcription complex TFIIIC. It has recently been shown that TFIIIC binding to the B-box leads to the formation of a boundary which limits the spread of repressive chromatin modifications in S. pombe. SINE-associated B-boxes may have a similar function, but the conservation of TFIIIC binding sites in SINEs located near mammalian promoters has not been studied earlier. Here we analysed the abundance and distribution of retroelements (SINEs, LINEs and LTRs) in annotated sequences of the Database of mammalian transcription start sites (DBTSS). The fractions of SINEs in human and mouse promoters are slightly lower than in the genome as a whole, but >40% of human and mouse promoters contain Alu or B1/B2 elements within the -1000 to +200 bp interval relative to the transcription start site (TSS). Most of these SINEs are associated with distal segments of promoters (-1000 to -200 bp relative to the TSS), indicating that their insertion at distances >200 bp upstream of the TSS is tolerated during evolution. The distribution of SINEs in promoters correlates negatively with the distribution of CpG sequences. By analysing the abundance of 12-mer motifs from the B1 and Alu consensus sequences in the genome and in DBTSS, we confirmed that some subsegments of Alu and B1 elements are poorly conserved, which depends in part on the presence of CpG dinucleotides. One of these CpG-containing subsegments in B1 elements overlaps with the SINE-associated B-box, and it shows better conservation in DBTSS compared to genomic sequences. We also studied the conservation, in DBTSS and in the genome, of the B-box-containing segments of old (AluJ, AluS) and young (AluY) Alu repeats and found that the CpG sequence of the B-box of old Alus is better conserved in DBTSS than in the genome. This indicates that B-box-associated CpGs in promoters are better protected from methylation and mutation than B-box-associated CpGs in genomic SINEs. These results are consistent with the view that potential TFIIIC binding motifs in SINEs associated with human and mouse promoters may be functionally important. These motifs may protect promoters from repressive histone modifications which spread from adjacent sequences. This can potentially explain the well-known clustering of SINEs in GC-rich, gene-rich genome compartments and the existence of unmethylated CpG islands.
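A minimal sketch of the kind of 12-mer abundance count used in this analysis, assuming the sequence sets are available as plain strings (the toy promoter and consensus fragments below are placeholders, not real Alu/B1 data):

```python
from collections import Counter

K = 12  # motif length used in the analysis above

def kmer_counts(sequences, k=K):
    """Count occurrences of every k-mer over a set of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def consensus_kmer_abundance(consensus, sequences, k=K):
    """For each k-mer of a repeat consensus (e.g. Alu or B1), report its
    abundance in a sequence set (e.g. DBTSS promoters vs. whole genome)
    and whether it contains a CpG dinucleotide."""
    counts = kmer_counts(sequences, k)
    rows = []
    for i in range(len(consensus) - k + 1):
        kmer = consensus[i:i + k].upper()
        rows.append((i, kmer, counts[kmer], "CG" in kmer))
    return rows

# Usage with toy data:
promoters = ["GGCCGGGCGCGGTGGCTCACGCCTGTAATCC"]   # placeholder sequences
alu_fragment = "GGCCGGGCGCGGTGG"                  # placeholder consensus piece
for pos, kmer, n, has_cpg in consensus_kmer_abundance(alu_fragment, promoters):
    print(pos, kmer, n, "CpG" if has_cpg else "-")
```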

Achieving High Availability by Implementing Beowulf Cluster

A computer cluster is a group of tightly coupled computers that work together closely so that in many respects they can be viewed as a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. This paper proposes a way to implement a Beowulf cluster in order to achieve high performance as well as high availability.

Secure Data Aggregation Using Clusters in Sensor Networks

Wireless sensor networks can be applied to both civilian and military environments. A primary goal in the design of wireless sensor networks is lifetime maximization, constrained by the energy capacity of batteries. One well-known method to reduce energy consumption in such networks is data aggregation. Providing efficient data aggregation while preserving data privacy is a challenging problem in wireless sensor networks research. In this paper, we present a privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA) scheme leverages a clustering protocol and the algebraic properties of polynomials. It has the advantage of incurring less communication overhead. The goal of our work is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy. We present simulation results of our schemes and compare their performance to a typical data aggregation scheme, TAG, in which no data privacy protection is provided. Results show the efficacy and efficiency of our schemes.
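Not the exact CPDA construction (which exchanges polynomial-based shares within a cluster), but a minimal sketch of the additive-sharing idea underneath it: each cluster member splits its reading into random shares so that no single node, including the cluster head, ever sees another node's raw value, while the cluster total is still recovered exactly:

```python
import random

PRIME = 2**31 - 1   # arithmetic modulo a prime hides individual values

def make_shares(value, n):
    """Split value into n random shares summing to value (mod PRIME)."""
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def cluster_aggregate(readings):
    n = len(readings)
    # node i sends its j-th share to node j; node j sums what it receives
    shares = [make_shares(v, n) for v in readings]
    partial = [sum(shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # the cluster head sums the n partial sums -> the true cluster total
    return sum(partial) % PRIME

readings = [17, 23, 5, 41]            # private sensor values
assert cluster_aggregate(readings) == sum(readings)
print(cluster_aggregate(readings))    # 86
```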

A Force-directed Graph Drawing based on the Hierarchical Individual Timestep Method

In this paper, we propose a fast and efficient method for drawing very large-scale graph data. The conventional force-directed method proposed by Fruchterman and Reingold (the FR method) is well known. It defines repulsive forces between every pair of nodes and attractive forces between nodes connected by an edge, and calculates the corresponding potential energy. An optimal layout is obtained by iteratively updating node positions to minimize the potential energy. Here, the positions of all nodes are updated together at every global timestep. In the proposed method, each node has its own individual time and timestep, and nodes are updated at different frequencies depending on the local situation. The proposed method is inspired by the hierarchical individual timestep method used for high-accuracy calculations of dense particle fields, such as star clusters, in astrophysical dynamics. Experiments show that the proposed method outperforms the original FR method in both speed and accuracy. We implemented the proposed method on the MDGRAPE-3 PCI-X special-purpose parallel computer and realized a speed enhancement of several hundred times.
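A sketch of the individual-timestep idea on top of FR forces. The paper's exact timestep criterion is not reproduced; as an assumption for illustration, a node under a large force gets a finer power-of-two timestep, as in hierarchical N-body integrators:

```python
import numpy as np

def fr_forces(pos, edges, k):
    """Fruchterman-Reingold forces: k^2/d repulsion between all pairs,
    d^2/k attraction along edges."""
    n = len(pos)
    f = np.zeros_like(pos)
    for i in range(n):                       # O(n^2) repulsion, kept simple
        delta = pos[i] - pos
        dist = np.linalg.norm(delta, axis=1)
        dist[i] = np.inf                     # no self-force
        f[i] = (delta * (k * k / dist**2)[:, None]).sum(axis=0)
    for i, j in edges:
        delta = pos[i] - pos[j]
        d = np.linalg.norm(delta) + 1e-12
        pull = delta * (d / k)               # magnitude d^2/k, along the edge
        f[i] -= pull
        f[j] += pull
    return f

def node_timesteps(forces, dt_max=0.05, levels=4):
    """Power-of-two individual timesteps: the larger the force on a node,
    the finer its timestep (assumed criterion, for illustration only)."""
    mag = np.linalg.norm(forces, axis=1) + 1e-12
    lvl = np.clip(np.log2(mag / mag.min()), 0, levels - 1).astype(int)
    return dt_max / 2.0 ** lvl

def layout(pos, edges, iters=300, dt_max=0.05):
    k = np.sqrt(1.0 / len(pos))              # ideal edge length for unit area
    t_node = np.zeros(len(pos))              # each node's individual clock
    t_global = 0.0
    for _ in range(iters):
        f = fr_forces(pos, edges, k)
        dt = node_timesteps(f, dt_max)
        t_global += dt.min()                 # advance by the smallest step
        due = t_node <= t_global             # update only nodes that are due
        pos[due] += f[due] * dt[due, None]
        t_node[due] += dt[due]
    return pos

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]     # a small toy graph
print(layout(rng.random((4, 2)), edges))
```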

Microalbuminuria in Essential Hypertension

Essential hypertension (HTN) usually clusters with other cardiovascular risk factors such as age, overweight, diabetes, insulin resistance and dyslipidemia. Target organ damage (TOD) such as left ventricular hypertrophy, microalbuminuria (MA), acute coronary syndrome (ACS), stroke and cognitive dysfunction takes place early in the course of hypertension. Though the prevalence of hypertension is high in India, the relationship between microalbuminuria and target organ damage in hypertension is not well studied. This study aims at detecting MA in essential hypertension and its relation to the severity of HTN, duration of HTN, body mass index (BMI), age and TOD such as hypertensive retinopathy and acute coronary syndrome. The present study was done in 100 non-diabetic patients with essential hypertension admitted to B.L.D.E. University's Sri B.M. Patil Medical College, Bijapur, from October 2008 to April 2011. The patients underwent detailed history-taking and clinical examination. Early morning 5 ml urine samples were collected, and MA was estimated by the immunoturbidimetry method. The relationship of MA with the duration and severity of HTN, BMI, age, sex and TODs such as hypertensive retinopathy and ACS was assessed by univariate analysis. The prevalence of MA in this study was found to be 63%; of these, 42% were male and 21% were female. There was a significant association between MA and the duration of hypertension (p = 0.036, OR = 0.438): the longer the duration of hypertension, the greater the likelihood of microalbumin in urine. There was also a significant association between the severity of hypertension and MA (p = 0.045, OR = 0.093). MA was positive in 50 (79.4%) of the 63 patients whose blood pressure was >160/100 mm Hg. There was a significant association between MA and the grades of hypertensive retinopathy (p = 0.011) and acute coronary syndrome (p = 0.041, OR = 2.805). Gender and BMI did not pose a high risk for MA in this study. The prevalence of MA in essential hypertension is high in this part of the community, and MA increases the risk of developing target organ damage. Early screening of patients with essential hypertension for MA and aggressive management of positive cases might reduce the burden of chronic kidney disease and cardiovascular disease in the community.

Fuzzy Scan Method to Detect Clusters

The classical temporal scan statistic is often used to identify disease clusters. In recent years, this method has become a very popular technique and its field of application has notably widened; many bioinformatics problems have been solved with it. In this paper a new fuzzy scan method is proposed. The behaviors of the classic and fuzzy scan techniques are studied with simulated data. ROC curves are calculated, demonstrating the superiority of the fuzzy scan technique.
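For concreteness, a sketch of the classical (crisp) temporal scan statistic that serves as the baseline: slide windows over a series of case counts and score each with the Poisson likelihood ratio; the fuzzy variant of the paper (not reproduced here) replaces the window's crisp in/out membership with graded membership:

```python
import numpy as np

def scan_llr(cases, max_width):
    """Return the best (log-likelihood ratio, (start, end)) window.
    Significance would normally be assessed by Monte Carlo replication."""
    cases = np.asarray(cases, dtype=float)
    T, C = len(cases), cases.sum()
    best = (0.0, None)
    for w in range(1, max_width + 1):
        for s in range(T - w + 1):
            c = cases[s:s + w].sum()
            e = C * w / T                  # expected count, uniform baseline
            if e < c < C:                  # score only excess-case windows
                llr = (c * np.log(c / e)
                       + (C - c) * np.log((C - c) / (C - e)))
                if llr > best[0]:
                    best = (llr, (s, s + w))
    return best

# Toy series with an excess of cases around days 10-14:
series = [2, 1, 3, 2, 2, 1, 2, 3, 2, 2, 9, 8, 10, 9, 7, 2, 1, 2, 3, 2]
print(scan_llr(series, max_width=7))       # -> (LLR, (start, end))
```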

A Strategy to Optimize the SPC Scheme for Mass Production of HDD Arm with Clustering Technique and Three-Way Control Chart

Consider the mass production of HDD arms, where hundreds of CNC machines are used to manufacture them. Given the overwhelming number of machines and arm models, constructing a separate control chart for monitoring each HDD arm model on each machine is not feasible. This research proposes a strategy to optimize SPC management on the shop floor. The procedure starts by identifying clusters of machines with similar manufacturing performance using a clustering technique. The three-way control chart (I-MR-R) is then applied to each clustered group of machines. The proposed approach benefits the manufacturer in terms of not only better SPC performance but also the quality management paradigm.
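A minimal sketch of the three-way (I-MR-R, "between/within") chart limits for one clustered machine group, assuming equal-size subgroups; the 2.66, 3.267 and D4 constants are the standard SPC table values, and the measurements below are toy numbers, not real HDD-arm data:

```python
import numpy as np

D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}   # range-chart constant by n

def three_way_limits(subgroups):
    x = np.asarray(subgroups, dtype=float)       # shape: (m subgroups, n parts)
    n = x.shape[1]
    means = x.mean(axis=1)                       # "individuals" = subgroup means
    mr = np.abs(np.diff(means))                  # moving range of the means
    r = x.max(axis=1) - x.min(axis=1)            # within-subgroup ranges
    mr_bar, r_bar, xbb = mr.mean(), r.mean(), means.mean()
    return {
        "I":  (xbb - 2.66 * mr_bar, xbb, xbb + 2.66 * mr_bar),
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
        "R":  (0.0, r_bar, D4[n] * r_bar),
    }

# Toy data: 4 subgroups of 2 measurements each from one machine cluster.
groups = [[10.1, 10.3], [10.2, 10.2], [10.4, 10.1], [10.0, 10.2]]
for chart, (lcl, center, ucl) in three_way_limits(groups).items():
    print(f"{chart}: LCL={lcl:.3f} CL={center:.3f} UCL={ucl:.3f}")
```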

PoPCoRN: A Power-Aware Periodic Surveillance Scheme in Convex Region using Wireless Mobile Sensor Networks

In this paper, a periodic surveillance scheme is proposed for any convex region using mobile wireless sensor nodes. A sensor network typically consists of a fixed number of sensor nodes which report measurements of sensed data such as temperature, pressure, humidity, etc., from their immediate proximity (the area within their sensing range). To sense an area of interest, an adequate number of fixed sensor nodes is required to cover the entire region; the number required thus depends on the sensing range of the sensors as well as on the deployment strategy employed. Here the sensors are assumed to be mobile within the region of surveillance and can be mounted on moving bodies such as robots or vehicles. Therefore, in our scheme, the surveillance time period determines the number of sensor nodes that need to be deployed in the region of interest. The proposed scheme comprises three algorithms, namely Hexagonalization, Clustering, and Scheduling. The first algorithm partitions the coverage area into fixed-size hexagons that approximate the sensing range (cell) of an individual sensor node. The clustering algorithm groups the cells into clusters, each of which will be covered by a single sensor node. The last algorithm determines a schedule for each sensor to serve its respective cluster: each sensor node traverses all the cells belonging to its assigned cluster, oscillating between the first and the last cell for the duration of its lifetime (see the sketch below). Simulation results show that our scheme provides full coverage of the region within a given period of time using few sensors, with minimum movement, low power consumption, and relatively low infrastructure cost.
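A tiny sketch of the Scheduling step's oscillating traversal (the cell identifiers are arbitrary labels, not the paper's notation):

```python
from itertools import cycle, islice

def oscillate(cells):
    """Yield cells forever, bouncing between the first and last cell:
    c0, c1, ..., cK, cK-1, ..., c1, c0, c1, ..."""
    path = cells + cells[-2:0:-1] if len(cells) > 1 else cells
    return cycle(path)

cluster = ["h3", "h7", "h8", "h12"]       # hexagonal cells of one cluster
print(list(islice(oscillate(cluster), 10)))
# ['h3', 'h7', 'h8', 'h12', 'h8', 'h7', 'h3', 'h7', 'h8', 'h12']
```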

Network Anomaly Detection using Soft Computing

One main drawback of intrusion detection systems is their inability to detect new attacks which do not have known signatures. In this paper we discuss an intrusion detection method that uses independent component analysis (ICA) based feature selection heuristics and rough-fuzzy clustering of the data. ICA separates independent components (ICs) from the monitored variables. Rough sets decrease the amount of data and remove redundancy, while fuzzy methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us not only to recognize known attacks but also to detect activity that may be the result of a new, unknown attack. Experimental results are reported on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset.
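A compact sketch of the two stages named above, assuming scikit-learn is available: FastICA extracts independent components, then a plain fuzzy c-means loop gives every record graded cluster memberships (the rough-set reduction step of the paper is omitted here):

```python
import numpy as np
from sklearn.decomposition import FastICA

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means: memberships u (N x c) and cluster centers."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)            # memberships sum to 1
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        u = 1.0 / d ** (2 / (m - 1))             # closer -> higher membership
        u /= u.sum(axis=1, keepdims=True)
    return u, centers

# Toy "traffic": 100 normal records plus 20 shifted (anomalous) records.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 6)), rng.normal(4, 1, (20, 6))])
ics = FastICA(n_components=3, random_state=0).fit_transform(X)
u, _ = fuzzy_cmeans(ics, c=2)
print(u[:3].round(2))   # graded memberships; the small cluster flags anomalies
```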

Scalable Deployment and Configuration of High-Performance Virtual Clusters

Virtualization and high performance computing have been discussed from a performance perspective in recent publications. We present and discuss a flexible and efficient approach to the management of virtual clusters. A virtual machine management tool is extended to function as a fabric for cluster deployment and management. We show how features such as saving the state of a running cluster can be used to avoid disruption. We also compare our approach to the traditional methods of cluster deployment and present benchmarks which illustrate the efficiency of our approach.

Effect of Clustering on Energy Efficiency and Network Lifetime in Wireless Sensor Networks

A Wireless Sensor Network is a multi-hop, self-configuring wireless network consisting of sensor nodes. The deployment of wireless sensor networks in many application areas, e.g., aggregation services, requires self-organization of the network nodes into clusters. An efficient way to enhance the lifetime of the system is to partition the network into distinct clusters, each with a high-energy node as cluster head. The different node clustering techniques that have appeared in the literature roughly fall into two families: those based on the construction of a dominating set and those based solely on energy considerations. Energy-optimized cluster formation for a set of randomly scattered wireless sensors is presented. Sensors within a cluster are expected to communicate with the cluster head only. The energy constraint and limited computing resources of the sensor nodes present the major challenges in gathering the data. In this paper we propose a framework to study how partially correlated data affect the performance of clustering algorithms. The total energy consumption and network lifetime can be analyzed by combining random geometry techniques and rate-distortion theory. We also present the relation between compression distortion and data correlation.
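The abstract does not state its energy model; analyses of this kind commonly assume the first-order radio model, under which transmitting and receiving k bits cost

```latex
E_{Tx}(k, d) \;=\; E_{elec}\,k + \varepsilon_{amp}\,k\,d^{2}, \qquad
E_{Rx}(k) \;=\; E_{elec}\,k,
```

so a member node's transmit cost grows quadratically with its distance d to the cluster head, which is what energy-optimized cluster formation seeks to minimize.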

Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure

Clustering categorical data is more complicated than numerical clustering because of its special properties. Scalability and memory constraints are the challenging problems in clustering large data sets. This paper presents an incremental algorithm to cluster categorical data. The frequencies of attribute values contribute much to clustering similar categorical objects. In this paper we propose new similarity measures based on the frequencies of attribute values and their cardinalities. The proposed measures and the algorithm are evaluated on data sets from the UCI data repository. Results show that the proposed method generates better clusters than the existing one.
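The paper's exact measure is not reproduced here; as an illustration of the frequency-based, incremental idea, the sketch below scores an incoming object against each cluster by the mean relative frequency, within that cluster, of the object's attribute values, and assigns it in a single pass:

```python
from collections import defaultdict

class IncrementalClusters:
    """One-pass clustering of categorical records with a frequency-based
    similarity measure (an assumed measure, for illustration only)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.sizes = []          # objects per cluster
        self.freq = []           # per cluster: attribute -> value -> count

    def _sim(self, cid, obj):
        # mean relative frequency of the object's values in this cluster
        return sum(self.freq[cid][a][v] / self.sizes[cid]
                   for a, v in enumerate(obj)) / len(obj)

    def add(self, obj):
        """Assign obj to the most similar cluster, or open a new one."""
        best, best_sim = None, self.threshold
        for cid in range(len(self.sizes)):
            s = self._sim(cid, obj)
            if s >= best_sim:
                best, best_sim = cid, s
        if best is None:                      # no cluster is similar enough
            best = len(self.sizes)
            self.sizes.append(0)
            self.freq.append(defaultdict(lambda: defaultdict(int)))
        self.sizes[best] += 1
        for a, v in enumerate(obj):
            self.freq[best][a][v] += 1
        return best

cl = IncrementalClusters(threshold=0.5)
for rec in [("red", "small"), ("red", "small"), ("blue", "large"),
            ("red", "medium"), ("blue", "large")]:
    print(rec, "->", cl.add(rec))             # clusters: 0, 0, 1, 0, 1
```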

Cumulative Learning based on Dynamic Clustering of Hierarchical Production Rules (HPRs)

An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of detail, and to focus our attention on the interesting aspects only. One such efficient and easy-to-understand system is the Hierarchical Production Rule (HPR) system. An HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If <condition> Generality <generality-information> Specificity <specificity-information>. HPR systems are capable of handling the taxonomical structures inherent in knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by an HPR-tree. This paper discusses an algorithm, based on a cumulative learning scenario, for the dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from previous episodes and also maintains a summary of clusters, as a Synopsis, to be used in future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.
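A minimal sketch of the HPR structure described above (field names are illustrative, not the paper's notation): each rule keeps generality and specificity links, so a cluster of related rules forms an HPR-tree:

```python
from dataclasses import dataclass, field

@dataclass
class HPR:
    """A production rule augmented with generality/specificity links."""
    decision: str
    condition: str
    generality: list["HPR"] = field(default_factory=list)   # more general rules
    specificity: list["HPR"] = field(default_factory=list)  # more specific rules

    def refine(self, child: "HPR") -> "HPR":
        """Link a more specific rule below this one in the HPR-tree."""
        self.specificity.append(child)
        child.generality.append(self)
        return child

# A tiny HPR-tree (cluster): the root is the most general rule.
animal = HPR("can_move", "is_animal")
bird = animal.refine(HPR("can_fly", "is_bird"))
penguin = bird.refine(HPR("cannot_fly", "is_penguin"))
print(penguin.generality[0].decision)        # 'can_fly'
```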

Collaborative and Content-based Recommender System for Social Bookmarking Website

This study proposes a new recommender system based on collaborative folksonomy. The purpose of the proposed system is to recommend Internet resources (such as books, articles, documents, pictures, audio and video) to users. The proposed method includes four steps: creating user profiles based on tags; grouping similar users into clusters using agglomerative hierarchical clustering; finding similar resources based on the user's past collections by means of content-based filtering; and recommending similar items to the target user. This study examines the system's performance on a dataset collected from "del.icio.us", a famous social bookmarking website. Experimental results show that the proposed tag-based collaborative and content-based hybrid recommender system is promising and effective for folksonomy-based bookmarking websites.
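A compact sketch of the four steps with toy del.icio.us-style data, assuming scikit-learn; the tags, resources, and parameter choices below are placeholders, not the study's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

user_tags = {                             # step 1 input: each user's tags
    "ann": "python clustering tutorial",
    "bob": "python pandas tutorial",
    "cat": "guitar music chords",
    "dan": "music guitar lessons",
}
resources = {                             # candidate bookmarks and their tags
    "r1": "python clustering sklearn",
    "r2": "guitar chords beginner",
    "r3": "pandas dataframe tutorial",
}

vec = TfidfVectorizer()
profiles = vec.fit_transform(user_tags.values())           # step 1: profiles
labels = AgglomerativeClustering(n_clusters=2).fit_predict(profiles.toarray())

target = "ann"
idx = list(user_tags).index(target)
peers = [u for u, l in zip(user_tags, labels)              # step 2: her cluster
         if l == labels[idx] and u != target]

res_vecs = vec.transform(resources.values())               # step 3: content match
scores = cosine_similarity(profiles[idx], res_vecs).ravel()
ranked = sorted(zip(resources, scores), key=lambda t: -t[1])
print("peers:", peers)                                     # step 4: recommend
print("recommend:", ranked[0][0])
```

In a fuller version, the peer cluster from step 2 would also contribute candidate resources (items the peers collected), with the content-based score of step 3 used to rank them for the target user.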