Advanced Information Extraction with n-gram based LSI

The number of documents being created is growing at an increasing pace, yet most of them cover already known topics and few introduce new concepts. This has started a new era in the information retrieval (IR) discipline, with requirements of its own: digging into topics and concepts and finding subtopics or relations between topics. Until now, IR research has focused on retrieving documents about a general topic or clustering documents under generic subjects. However, these conventional approaches cannot go deep into the content of documents, which makes it difficult for people to reach the documents they are searching for. We therefore need new ways of mining document sets in which the critical point is to know much about the contents of the documents. As a solution, we propose to enhance LSI, one of the proven IR techniques, by supporting its vector space with n-gram forms of words. The positive results we obtained are shown in two application areas of the IR domain: querying a document database and clustering the documents in that database.
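The n-gram vector space mentioned above can be illustrated with a minimal sketch. It assumes character trigrams of each word, with "_" as a boundary marker; the abstract does not specify the n-gram form, so both choices, and all function names, are hypothetical. The SVD step that LSI would then apply to the resulting term-document matrix is not shown.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with '_' at both ends."""
    padded = "_" + word.lower() + "_"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_vector(text, n=3):
    """Count the character n-grams of every whitespace-separated word."""
    counts = Counter()
    for word in text.split():
        counts.update(char_ngrams(word, n))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy documents; rows of the n-gram "term"-document matrix.
docs = ["clustering documents", "document clusters", "wireless sensors"]
vectors = [ngram_vector(d) for d in docs]
```

In this toy example, the first two documents share many trigrams even though they share no whole word, which is the kind of overlap a word-level vector space would miss.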

The Leaves of a Tree

In this article, models based on quantitative analysis, physical geometry and regression analysis are established using the analytic hierarchy process, fuzzy cluster analysis, fuzzy photographic and data fitting. The reasons for the variety of leaf shapes among different species, and for the differences between leaf shapes on the same tree, are explained using software such as Eviews, VB and Matlab. We also successfully estimate the leaf mass of a tree and its correlation with the tree profile.

Application of a New Hybrid Optimization Algorithm on Cluster Analysis

Clustering techniques have received attention in many areas, including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points that are close to one another. The K-means algorithm is one of the most widely used clustering techniques. However, K-means has two shortcomings: it depends on the initial state and converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many clustering studies have been carried out to overcome the local optima problem. This paper presents an efficient hybrid evolutionary optimization algorithm that combines Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N objects into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with that of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.
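The dependence of K-means on its initial state can be seen in a minimal sketch of the plain algorithm. This is ordinary K-means with random restarts, not the PSO-ACO hybrid, whose details the abstract does not give; the function names and toy data are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means; the seed fixes the random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged, possibly to a local optimum only
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# A common workaround for the initial-state dependency: restart and keep the best run.
best = min((kmeans(points, 2, seed=s) for s in range(10)), key=lambda run: run[2])
```

Restarting is only a heuristic; it lowers the chance of ending in a poor local optimum but does not guarantee the global one, which is the gap that hybrid optimizers such as the one proposed here aim to close.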

Categorical Clustering By Converting Associated Information

The lack of an inherent "natural" dissimilarity measure between objects in a categorical dataset presents special difficulties in clustering analysis. However, each categorical attribute of a given dataset provides natural probability and information in the sense of Shannon. In this paper, we propose a novel method that heuristically converts categorical attributes to numerical values by exploiting this associated information. We conduct an experimental study with real-life categorical data. The experiment demonstrates the effectiveness of our approach.
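One simple way to read "information in the sense of Shannon" is the self-information of each category, -log2 p. The sketch below encodes a categorical column that way. It is an illustrative assumption, not the conversion the authors propose, which the abstract does not spell out; the function name and data are hypothetical.

```python
from collections import Counter
from math import log2


def self_information_encoding(column):
    """Map each category to its self-information -log2 p(category), in bits."""
    counts = Counter(column)
    total = len(column)
    return {value: -log2(count / total) for value, count in counts.items()}


# Hypothetical categorical attribute.
colours = ["red", "red", "red", "red", "blue", "blue", "green", "green"]
encoding = self_information_encoding(colours)
numeric = [encoding[c] for c in colours]
```

Here "red" (probability 0.5) maps to 1 bit while "blue" and "green" (probability 0.25 each) both map to 2 bits. Equally frequent categories collide under this encoding, so a practical method would need further information, for example from co-occurrence with other attributes, to tell them apart.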

Epidemiology of Waterborne Diarrhoeal Diseases among Children Aged 6-36 Months Old in Busia - Western Kenya

The purpose of the present study was to evaluate the epidemiology of waterborne diarrhoeal diseases among children aged 6-36 months in Busia town, western Kenya. The study was carried out between Feb. 2008 and Feb. 2010. Cases of diarrhoea reported in 385 households were linked to household water handling practices. A mother with a child aged 6-36 months was also included in the study. Diarrhoea prevalence among children aged 6-36 months was 16.7% in Busia town. Bwamani (19.6%) and Mayenje (10.6%), clustered in Mayenje sub-location, reported the highest and the lowest prevalence of diarrhoea, respectively. There was a positive correlation between the prevalence of diarrhoea in children and the level of the mother's education, 29.9% (n = 100). Diarrhoea cases decreased from 35.5% (n = 102) to 4.8% (n = 16) as age increased from 6 to 35 months on average. In conclusion, the prevalence of diarrhoea in children aged 6-36 months was 16.7% in Busia town. It was higher in children whose mothers were below 18 years of age and had a low level of education, and the rate decreased as the age of the children increased. The prevalence of diarrhoea in children aged 6-36 months in households was higher in children aged 6-17 and 36 months whose mothers were less educated and aged 18-24 years. The influence of human activities at the main source of drinking water on the prevalence of diarrhoea in these children was insignificant.

Design and Implementation of a New Energy Efficient Clustering Algorithm using Genetic Algorithm for Wireless Sensor Networks

Wireless Sensor Networks consist of small battery-powered devices with limited energy resources. Once deployed, the small sensor nodes are usually inaccessible to the user, so replacing the energy source is not feasible. Hence, energy efficiency is one of the most important issues to address in order to extend the life span of the network. Much research has been done to overcome this limitation, and clustering is one of the representative approaches. In clustering, the cluster heads gather data from the nodes and send them to the base station. In this paper, we introduce a dynamic clustering algorithm based on a genetic algorithm. The algorithm takes different parameters into consideration to increase the network lifetime. To demonstrate the efficiency of the proposed algorithm, we simulated it in Matlab and compared it with the LEACH algorithm.

On the Performance of Multiband OFDM under Log-normal Channel Fading

A modified Saleh-Valenzuela channel model has been adapted for Ultra Wideband (UWB) systems. The suggested realistic channel is assessed by the distribution of its fading amplitudes and times of arrival. Furthermore, the propagation characteristics are divided into four channel models, namely CM 1 to CM 4. The models differ in their cluster arrival rates, the ray arrival rates within each cluster, and their respective constant decay rates. This paper describes the performance of a multiband OFDM system simulated under these multipath conditions. The simulation work described in this paper is based on the WiMedia ECMA-368 standard, which has been deployed for practical implementation of low-cost, low-power UWB devices.

Increasing Lifetime of Target Tracking Wireless Sensor Networks

A model to identify the lifetime of a target tracking wireless sensor network is proposed. The model is a static cluster-based architecture with two aims: first, to increase the lifetime of the target tracking wireless sensor network; second, to enable good localization results with low energy consumption for each sensor in the network. The model consists of heterogeneous sensors, and each sensing member node in a cluster uses two operation modes: active mode and sleep mode. The performance results show that the proposed architecture consumes less energy and achieves a longer lifetime than centralized and dynamic clustering architectures for target tracking sensor networks.

A Selective Markovianity Approach for Image Segmentation

A new Markovianity approach is introduced in this paper. This approach reduces the response time of the classic Markov Random Fields approach. First, one region is determined by a clustering technique. Then, this region is excluded from the study. The remaining pixels form the study zone and are selected for a Markovianity segmentation task. With the Selective Markovianity approach, the segmentation process is faster than the classic one.

Towards Clustering of Web-based Document Structures

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important given the number of documents available online. The task of clustering web-based document structures therefore has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than the usual directed rooted trees, e.g., DOM-Trees. As an important preprocessing step, we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach in Web Usage Mining as future work.
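The agglomerative step can be sketched as follows. The sketch assumes average linkage and assumes that the similarity values have already been converted to distances (for example, 1 minus similarity); the abstract specifies neither, and the tree similarity measure d itself is not reproduced here. The function name and matrix are hypothetical.

```python
def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    Returns a list of clusters, each a list of item indices.
    Assumes 1 <= n_clusters <= len(dist).
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical distances between four hypertext structures.
dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
groups = agglomerative(dist, 2)
```

This naive version recomputes every linkage in each round, which is adequate for a small matrix but too slow for large collections.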

Clustering Protein Sequences with Tailored General Regression Model Technique

Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relations are valuable in rule discovery for given data, such as "if value X goes up 1, value Y will go down 3". Classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups in which sequences have a strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to handle the problem of clustering linear sequences. GRMT gives a measure, GR*, that tells the degree of linearity of multiple sequences without having to compare each pair of them.

Memory Leak Detection in Distributed System

Because of memory leaks, valuable system memory is often wasted and denied to other processes, which degrades computational performance. If an application's memory usage exceeds the virtual memory size, it can lead to a system crash. Current memory leak detection techniques for clusters are reactive and display the memory leak information after the process has executed (they detect a memory leak only after it occurs). This paper presents a Dynamic Memory Monitoring Agent (DMMA) technique. The DMMA framework performs dynamic memory leak detection, detecting a leak while the application is in its execution phase. When DMMA identifies a memory leak in any process in the cluster, it informs the end users so that they can take corrective action, and it also submits the affected process to a healthy node in the system, thus providing reliable service to the user. DMMA maintains information about the memory consumption of executing processes; based on this information and on critical states, DMMA can improve the reliability and efficacy of cluster computing.

Using Morphological and Microsatellite (SSR) Markers to Assess the Genetic Diversity in Alfalfa (Medicago sativa L.)

Utilization of diverse germplasm is needed to enhance the genetic diversity of cultivars. The objective of this study was to evaluate the genetic relationships of 98 alfalfa germplasm accessions using morphological traits and SSR markers. Of the 98 tested populations, 81 were local populations originating in Europe and 17 were introduced from the USA, Australia, New Zealand and Canada. Three primers generated 67 polymorphic bands. The average polymorphic information content (PIC) was very high (> 0.90) across all three primer combinations used. Cluster analysis using the Unweighted Pair Group Method with Arithmetic Means (UPGMA) and Jaccard's coefficient grouped the accessions into 2 major clusters with 4 sub-clusters, with no correlation between genetic and morphological diversity. The SSR analysis clearly indicated that, even with three polymorphic primers, a reliable estimate of genetic diversity could be obtained.
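Jaccard's coefficient, which feeds the UPGMA clustering above, can be sketched for band presence data as follows. The accession names and band sets are hypothetical and are not taken from the study; the UPGMA step itself is not shown.

```python
def jaccard(a, b):
    """Jaccard coefficient: bands shared by both / bands present in either."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical band scores: the set of bands observed in each accession.
bands = {
    "acc1": {1, 2, 3, 5},
    "acc2": {1, 2, 3, 6},
    "acc3": {4, 7, 8},
}
names = sorted(bands)
# Distance matrix (1 - similarity) of the kind a UPGMA routine would take as input.
distance = {
    (x, y): 1.0 - jaccard(bands[x], bands[y]) for x in names for y in names
}
```

Because the coefficient ignores bands absent from both accessions, shared absences do not inflate similarity, which is one common reason for choosing it with presence/absence marker data.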

A Study on Early Prediction of Fault Proneness in Software Modules using Genetic Algorithm

Fault-proneness of a software module is the probability that the module contains faults. Different techniques have been proposed to predict the fault-proneness of modules, including statistical methods, machine learning techniques, neural network techniques and clustering techniques. The aim of the proposed study is to explore whether metrics available in the early lifecycle (i.e. requirement metrics), metrics available in the late lifecycle (i.e. code metrics), and the combination of the two can be used to identify fault-prone modules using a Genetic Algorithm technique. This approach has been tested with real-time defect datasets of NASA software projects written in the C programming language. The results show that the fusion of requirement and code metrics is the best prediction model for detecting faults, compared with the commonly used code-based model.

Info-participation of the Disabled Using the Mixed Preference Data in Improving Their Travel Quality

Today, the preferences and participation of TD groups such as the elderly and the disabled are still lacking in transportation planning decisions, and their reactions to certain types of policies are not well known. Thus, a clear methodology is needed. This study aims to develop a method for extracting the preferences of the disabled, to be used at the policy-making stage and to guide future estimations. The method combines cluster analysis and data filtering, using data from Arao city (Japan). It proceeds as follows: the TD group is defined with the cluster analysis tool; their travel preferences are tabulated from the household surveys by policy variable-impact pairs, by zones and by trip purposes; and the final outcome is the preference probabilities of the disabled. The preferences vary by trip purpose. For work trips, they are accessibility and transit system quality policies, with the accompanying impacts of modal shifts towards public mode use, decreasing travel costs and an increase in trip rate. For social trips, they are the same accessibility and transit system policies leading to the same mode shift impact, together with the travel quality policy area leading to an increase in trip rate. These results indicate which policies to focus on and can be used in scenario generation in models, or for any other planning purpose, as a decision support tool.

Water Quality and Freshwater Fish Diversity at Khao Luang National Park, Thailand

Water quality and freshwater fish diversity at nine waterfalls in Khao Luang National Park, Thailand, were examined. The streams were shallow and fast flowing, with clear water and rocky and sandy substrate. The mean water quality of the waterfalls was as follows: pH 7.50, air temperature 24.27 °C, water temperature 26.37 °C, dissolved oxygen 7.88 mg/l, hardness 4.44-21.33 mg/l, alkalinity 3.55-11.88 mg/l (as CaCO3). Twenty fish species belonging to nine families were found in the park. A cluster analysis of water quality divided the waterfalls into two groups, A and B. Group A was composed of two waterfalls (i.e. Aie Kaew and Wangmaipak) that flowed to the Gulf of Thailand side. Group B was composed of seven waterfalls (i.e. Promlok, Kalom, Nuafa, Suankun, Soidaw, Suanhai, and Thapae) that flowed to the Andaman Sea side (Fig. 2). The Cyprinids represented the major species in all the waterfalls, comprising 45%.

Smart Sustainable Cities: An Integrated Planning Approach towards Sustainable Urban Energy Systems, India

Cities represent at once a challenge and an opportunity for climate change policy. Cities are where most energy services are needed, because urbanization is closely linked to high population densities and a concentration of economic activities and production (urban energy demand). Consequently, it is critical to explain the role of cities within the world's energy systems and its relation to the climate change issue. With more than half of the world's population already living in urban areas, and that percentage expected to rise to 75 per cent by 2050, it is clear that the path to sustainable development must pass through cities. Cities expanding in size and population pose increased challenges to the environment, of which energy is part as a natural resource, and to the quality of life. Most cities have already understood the importance of sustainability, both at their local scale and in terms of their contribution to sustainability at higher geographical scales. This requires perceiving a city as a complex and dynamic ecosystem, an open system or cluster of systems, in which energy and other natural resources are transformed to satisfy the needs of the different urban activities. In fact, buildings and transportation generally represent most of a city's direct energy demand, i.e., between 60 per cent and 80 per cent of overall consumption. Buildings, both residential and services, are usually influenced by the local physical and social conditions. In terms of transport, energy demand is also strongly linked with the specific characteristics of a city (urban mobility). The concept of a "smart city" builds on statistics as seven key axes of a city's success in moving towards a common platform (brain nerve) of sustainable urban energy systems. With the aforesaid knowledge, the authors suggest a framework for the role of cities as energy actors in smart city management.
The authors discuss the potential elements needed for energy in smart cities and identify potential energy actions and relevant barriers. Furthermore, three levels of city smartness are distinguished in cities' actions to overcome market/institutional failures with a local approach. The authors attempt to conceive and implement concepts of city smartness by adopting the city or local government as the nerve center through an integrated planning approach. Finally, the paper concludes with recommendations for organizing Smart Sustainable Cities to bring positive change to urban India.

Cross-Cultural Socio-Economic Status Attainment between Muslim and Santal Couple in Rural Bangladesh

This study compared socio-economic status attainment between Muslim and Santal couples in rural Bangladesh. We hypothesized that the socio-economic status attainment (occupation, education and income) of the Muslim couples was higher than that of the Santal couples. To examine the hypothesis, 288 couples (145 Muslim and 143 Santal), selected by cluster random sampling from Kalna village, Bangladesh, were individually interviewed using a semi-structured questionnaire. The results of the Pearson Chi-Square test suggest that there were significant differences in socio-economic status attainment between the couples of the two communities. In addition, Pearson correlation coefficients suggest that there were significant associations between the socio-economic statuses attained by the couples of the two communities in rural Bangladesh. Further cross-cultural study should be conducted on how inter-community relations in the rural social structure of Bangladesh influence the differences in the couples' socio-economic status attainment.

Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran

The goal of this paper is to segment countries based on the value of exports from Iran during the 14 years ending in 2005. To measure the dissimilarity among the export baskets of different countries, we define a Dissimilarity Export Basket (DEB) function and use it as the distance function in the K-means algorithm. The DEB function is defined based on the concepts of association rules and the export value of commodity groups. In this paper, a clustering quality function and the clusters' intra-class inertia are defined to, respectively, calculate the optimum number of clusters and compare the performance of DEB against Euclidean distance. We also study the effect of importance weights in the DEB function on clustering quality. Lastly, once segmentation is complete, a designated RFM model is used to analyze the relative profitability of each cluster.

Prediction of Reusability of Object Oriented Software Systems using Clustering Approach

In the literature, there are metrics for identifying the quality of reusable components, but a framework that uses these metrics to precisely predict the reusability of software components still needs to be worked out. If identified in the design phase or even in the coding phase, these reusability metrics can help reduce rework by improving the quality of reuse of the software component, and hence improve productivity through a probable increase in the reuse level. Since the CK metric suite is the most widely used set of metrics for extracting structural features of object-oriented (OO) software, this study uses the tuned CK metric suite, i.e. WMC, DIT, NOC, CBO and LCOM, to obtain the structural analysis of OO-based software components. An algorithm is proposed in which the inputs are given to a K-Means clustering system in the form of tuned values of the OO software component, and a decision tree is formed for 10-fold cross validation of the data to evaluate the component in terms of its linguistic reusability value. The developed reusability model produced high-precision results, as desired.