Abstract: The number of documents being created is growing at an
increasing pace, yet most of them cover already known topics and
few introduce new concepts. This has started a new era in the
information retrieval (IR) discipline with its own requirements:
digging into topics and concepts and finding subtopics or relations
between topics. Until now, IR research has focused on retrieving
documents about a general topic or clustering documents under generic
subjects. However, these conventional approaches cannot go deep into
the content of documents, which makes it difficult for people to reach
the documents they are searching for. We therefore need new ways of
mining document sets in which the critical point is to know much
about the contents of the documents. As a solution, we propose
enhancing LSI, one of the proven IR techniques, by supporting its
vector space with n-gram forms of words. The positive results we
obtained are shown in two application areas of the IR domain:
querying a document database and clustering the documents in the
document database.
Abstract: In this article, models based on quantitative analysis,
physical geometry and regression analysis are established using the
analytic hierarchy process, fuzzy cluster analysis, fuzzy
photographic and data fitting. The reasons for the variety of leaf
shapes among different species and for the differences between leaf
shapes on the same tree are worked out using software such as EViews,
VB and Matlab. We also successfully estimate the leaf mass of a tree
and its correlation with the tree profile.
Abstract: Clustering techniques have received attention in many areas, including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points that are close to one another. The K-means algorithm is one of the most widely used clustering techniques. However, K-means has two shortcomings: it depends on the initial state and converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many studies have addressed the local optima problem in clustering. This paper presents an efficient hybrid evolutionary optimization algorithm, called PSO-ACO, that combines Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) to optimally cluster N objects into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with that of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.
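The dependence of K-means on its initial state can be seen on a tiny example. The sketch below is a plain Lloyd-style K-means, not the PSO-ACO hybrid, whose details the abstract does not give. The data set and the two initializations are hypothetical, chosen so that one start reaches the best partition and the other stops in a local optimum.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means from given initial centroids; returns (centroids, sse)."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[j].append(p)
        # Update step: centroid = mean of members (unchanged if cluster is empty).
        new = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    sse = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, sse


# Four corners of a 4 x 1 rectangle (hypothetical data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, sse_good = kmeans(points, [(0, 0), (4, 0)])  # splits left / right
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])   # stuck splitting bottom / top
print(sse_good, sse_bad)
```

Both runs converge, but the second one stays in a partition with a much larger sum of squared errors. This is the kind of local optimum that global search heuristics such as PSO or ACO aim to escape.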
Abstract: The lack of an inherent "natural" dissimilarity measure
between objects in a categorical dataset presents special
difficulties in clustering analysis. However, each categorical
attribute of a given dataset provides natural probability and
information in the sense of Shannon. In this paper, we propose a
novel method that heuristically converts categorical attributes to
numerical values by exploiting this associated information. We
conduct an experimental study with a real-life categorical dataset.
The experiment demonstrates the effectiveness of our approach.
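The abstract does not spell out the conversion heuristic. One simple possibility, shown here purely as an assumption, is to replace each category value with its Shannon self-information, -log2(p), where p is the relative frequency of that value within its attribute. The function name and sample column are hypothetical.

```python
import math
from collections import Counter


def self_information_encoding(column):
    """Map each category to -log2(relative frequency) within the column.

    Assumption: this is an illustrative encoding, not the paper's method.
    """
    counts = Counter(column)
    n = len(column)
    return {value: -math.log2(count / n) for value, count in counts.items()}


column = ["red", "red", "blue", "green"]  # hypothetical attribute values
codes = self_information_encoding(column)
numeric = [codes[v] for v in column]

# Shannon entropy of the attribute is the mean self-information.
entropy = sum(numeric) / len(numeric)
print(codes, entropy)
```

Note that values with equal frequency receive the same number under this encoding, so a practical method would need an extra step to tell them apart.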
Abstract: The purpose of the present study was to evaluate the
epidemiology of waterborne diarrhoea among children aged 6-36
months in Busia town, western Kenya. The study was carried out
between Feb. 2008 and Feb. 2010. Cases of diarrhoea reported in 385
households were linked to household water handling practices. A
mother with a child aged 6-36 months was also included in the
study. Diarrhoea prevalence among children aged 6-36 months was
16.7% in Busia town; Bwamani (19.6%) and Mayenje (10.6%), clustered
in Mayenje sub-location, reported the highest and the lowest
prevalence of diarrhoea, respectively. There was a positive
correlation between the prevalence of diarrhoea in children and the
level of the mother's education, 29.9% (n = 100). Diarrhoea cases
decreased from 35.5% (n = 102) to 4.8% (n = 16), corresponding to an
increase in age over the range 6-35 months on average. In conclusion,
the prevalence of diarrhoea in children aged 6-36 months was 16.7%
in Busia town. It was higher in children whose mothers were below 18
years of age and had a low level of education, and the rate decreased
as the children's age increased. Prevalence of diarrhoea in children
aged 6-36 months in households was higher in children aged 6-17 and
36 months and in those whose mothers were less educated and were
aged 18-24 years. The influence of human activities at the main
source of drinking water on the prevalence of diarrhoea in these
children was insignificant.
Abstract: Wireless Sensor Networks consist of small battery-powered
devices with limited energy resources. Once deployed, the
small sensor nodes are usually inaccessible to the user, and thus
replacement of the energy source is not feasible. Hence, one of the
most important issues that needs to be addressed in order to improve
the life span of the network is energy efficiency. Much research has
been done to overcome this limitation, and clustering is one of
the representative approaches. In clustering, the cluster heads
gather data from nodes and send them to the base station. In this
paper, we introduce a dynamic clustering algorithm based on a genetic
algorithm. This algorithm takes different parameters into
consideration to increase the network lifetime. To demonstrate the
efficiency of the proposed algorithm, we simulated it and compared it
with the LEACH algorithm using MATLAB.
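For context, the baseline LEACH protocol rotates the cluster-head role with a probabilistic threshold: in round r, a node that has not yet served as cluster head in the current cycle becomes one if a uniform random draw falls below T = P / (1 - P (r mod 1/P)), where P is the desired fraction of cluster heads. The sketch below covers only this baseline rule, not the proposed genetic algorithm, whose parameters the abstract does not list. The function and variable names are hypothetical.

```python
import random


def leach_threshold(p, r, eligible=True):
    """LEACH cluster-head threshold T(n) for round r.

    p: desired fraction of cluster heads (assumes 1/p is close to an integer).
    eligible: False if the node already served as head in the current cycle.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)  # number of rounds in one rotation cycle
    return p / (1 - p * (r % period))


def elect_heads(node_ids, p, r, already_served, rng):
    """Return the nodes that elect themselves cluster head in round r."""
    heads = []
    for n in node_ids:
        t = leach_threshold(p, r, eligible=n not in already_served)
        if rng.random() < t:
            heads.append(n)
    return heads


rng = random.Random(0)  # fixed seed so the illustration is repeatable
heads = elect_heads(range(20), p=0.25, r=0, already_served=set(), rng=rng)
print(heads)
```

The threshold rises towards 1 at the end of each cycle, so every node that has not yet served is forced to take a turn.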
Abstract: A modified Saleh-Valenzuela channel model has been
adapted for Ultra Wideband (UWB) systems. The suggested realistic
channel is assessed by its distribution of fading amplitudes and times
of arrival. Furthermore, the propagation characteristics have been
divided into four channel models, namely CM 1 to CM 4. They differ in
terms of cluster arrival rate, ray arrival rate within each cluster
and their respective constant decay rates. This paper describes the
performance of the multiband OFDM system simulated under these
multipath conditions. The simulation work described in this paper is
based on the WiMedia ECMA-368 standard, which has been deployed
for practical implementation of low-cost and low-power UWB
devices.
Abstract: A model to identify the lifetime of a target tracking
wireless sensor network is proposed. The model is a static
cluster-based architecture and has two aims. First, it is to
increase the lifetime of the target tracking wireless sensor network.
Second, it is to enable good localization results with low energy
consumption for each sensor in the network. The model consists of
heterogeneous sensors, and each sensing member node in a cluster
uses two operation modes: active mode and sleep mode. The
performance results illustrate that the proposed architecture consumes
less energy and achieves a longer lifetime than centralized and
dynamic clustering architectures for target tracking sensor networks.
Abstract: A new Markovianity approach is introduced in this
paper. This approach reduces the response time of the classic Markov
Random Fields approach. First, one region is determined by a
clustering technique. Then, this region is excluded from the study.
The remaining pixels form the study zone, and they are selected for a
Markovianity segmentation task. With the Selective Markovianity
approach, the segmentation process is faster than the classic one.
Abstract: Methods for organizing web data into groups, in order
to analyze web-based hypertext data and facilitate data availability,
are very important given the number of documents available
online. Accordingly, the task of clustering web-based document
structures has many applications, e.g., improving information
retrieval on the web, better understanding user navigation behavior,
improving the servicing of web users' requests, and increasing web
information accessibility. In this paper we investigate a new
approach for clustering web-based hypertexts on the basis of their
graph structures. The hypertexts are represented as so-called
generalized trees, which are more general than the usual directed
rooted trees, e.g., DOM trees. As an important preprocessing step, we
measure the structural similarity between the generalized trees on
the basis of a similarity measure d. Then we apply agglomerative
clustering to the resulting similarity matrix in order to create
clusters of hypertext graph patterns representing navigation
structures. In the present paper we run our approach on a data set of
hypertext structures and obtain good results in Web Structure Mining.
Furthermore, we outline the application of our approach in Web Usage
Mining as future work.
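The clustering step can be sketched as follows, assuming the similarity matrix has already been computed and holds values in [0, 1]. The sketch uses average linkage and a stopping threshold. Both are assumptions, since the abstract names neither the linkage criterion nor the stopping rule, and the matrix values are hypothetical.

```python
def agglomerative(sim, threshold):
    """Average-linkage agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two most similar clusters until the best average
    similarity drops below `threshold`. Returns clusters as lists of indices.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            (avg_sim(clusters[x], clusters[y]), (x, y))
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical pairwise similarities between four hypertext graphs.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
groups = agglomerative(sim, threshold=0.5)
print(groups)
```

Lowering the threshold to 0 makes the procedure continue until a single cluster remains, which corresponds to building the complete merge hierarchy.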
Abstract: Cluster analysis divides data into groups that are
meaningful, useful, or both. Analysis of biological data is creating a
new generation of epidemiologic, prognostic, diagnostic and
treatment modalities. Clustering of protein sequences is one of the
current research topics in the field of computer science. A linear
relation is valuable in rule discovery for given data, such as "if
value X goes up by 1, value Y will go down by 3". Classical linear
regression models the linear relation of two sequences perfectly.
However, if we need to cluster a large repository of protein sequences
into groups in which sequences have a strong linear relationship with
each other, it is prohibitively expensive to compare sequences one by
one. In this paper, we propose a new technique named General
Regression Model Technique Clustering Algorithm (GRMTCA) to
handle the problem of clustering linear sequences. GRMT
gives a measure, GR*, that tells the degree of linearity of multiple
sequences without having to compare each pair of them.
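The pairwise building block that the abstract calls classical linear regression can be sketched as an ordinary least-squares fit of one sequence on another. The example reproduces the rule "X goes up by 1, Y goes down by 3" with hypothetical data. It does not implement GR*, which the abstract does not define.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# Hypothetical sequences: every unit step in X lowers Y by 3.
x = [0, 1, 2, 3]
y = [10, 7, 4, 1]
slope, intercept, r2 = linear_fit(x, y)
print(slope, intercept, r2)
```

Comparing every pair of sequences this way costs one fit per pair, which grows quadratically with the number of sequences. That is the expense a multi-sequence linearity measure is meant to avoid.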
Abstract: Due to memory leaks, valuable system memory often gets
wasted and is denied to other processes, thereby degrading
computational performance. If an application's memory usage
exceeds the virtual memory size, it can lead to a system crash.
Current memory leak detection techniques for clusters are reactive
and display the memory leak information after the execution of the
process (they detect a memory leak only after it occurs).
This paper presents a Dynamic Memory Monitoring Agent
(DMMA) technique. The DMMA framework is a dynamic memory leak
detection technique that detects a memory leak while the application
is in its execution phase. When DMMA identifies a memory leak in any
process in the cluster, it informs the end users so that they can
take corrective actions, and it also submits the affected process to
a healthy node in the system, thus providing reliable service to the
user. DMMA maintains information about the memory consumption of
executing processes, and based on this information and critical
states, DMMA can improve the reliability and efficacy of cluster
computing.
Abstract: Utilization of diverse germplasm is needed to enhance
the genetic diversity of cultivars. The objective of this study was to
evaluate the genetic relationships of 98 alfalfa germplasm accessions
using morphological traits and SSR markers. Of the 98 tested
populations, 81 were local populations originating in Europe and 17
were introduced from the USA, Australia, New Zealand and Canada.
Three primers generated 67 polymorphic bands. The average
polymorphic information content (PIC) was very high (> 0.90) over all
three primer combinations used. Cluster analysis using the Unweighted
Pair Group Method with Arithmetic Means (UPGMA) and Jaccard's
coefficient grouped the accessions into 2 major clusters with 4
sub-clusters, with no correlation between genetic and morphological
diversity. The SSR analysis clearly indicated that, even with three
polymorphic primers, a reliable estimate of genetic diversity could
be obtained.
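Jaccard's coefficient for marker data is the number of bands present in both accessions divided by the number of bands present in at least one of them. A minimal sketch follows, with hypothetical 0/1 band profiles. The convention of returning 0.0 when neither accession shows any band is an assumption.

```python
def jaccard(a, b):
    """Jaccard similarity between two 0/1 band presence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0  # assumption: 0.0 if no bands


# Hypothetical band profiles for three accessions (1 = band present).
profiles = {
    "acc1": [1, 1, 0, 1, 0],
    "acc2": [1, 0, 0, 1, 1],
    "acc3": [0, 0, 1, 0, 0],
}

names = sorted(profiles)
matrix = [[jaccard(profiles[p], profiles[q]) for q in names] for p in names]
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix of this kind, or its complement 1 - J used as a distance, is the usual input to a UPGMA clustering step.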
Abstract: Fault-proneness of a software module is the
probability that the module contains faults. To predict the
fault-proneness of modules, different techniques have been proposed,
including statistical methods, machine learning techniques, neural
network techniques and clustering techniques. The aim of the proposed
study is to explore whether metrics available in the early lifecycle
(i.e. requirement metrics), metrics available in the late lifecycle
(i.e. code metrics), and the combination of the two can be used to
identify fault-prone modules using a Genetic Algorithm technique.
This approach has been tested with real-time defect datasets of NASA
software projects written in the C programming language. The results
show that the fusion of requirement and code metrics is the best
prediction model for detecting faults, compared with the commonly
used code-based model.
Abstract: Today, the preferences and participation of TD groups such as the elderly and the disabled are still lacking in transportation planning decision-making, and their reactions to certain types of policies are not well known. Thus, a clear methodology is needed. This study aimed to develop a method for extracting the preferences of the disabled, for use in the policy-making stage, that can also guide future estimations. The method combines cluster analysis and data filtering, using data from Arao city (Japan). It proceeds as follows: the TD group is defined with the cluster analysis tool; their travel preferences are tabulated from the household surveys by policy variable-impact pairs, by zones and by trip purposes; and the final outcome is the preference probabilities of the disabled. The preferences vary by trip purpose. For work trips, they are accessibility and transit system quality policies, with the accompanying impacts of modal shifts towards public mode use, decreasing travel costs and an increasing trip rate. For social trips, they are the same accessibility and transit system policies, leading to the same mode shift impact, together with the travel quality policy area, leading to a trip rate increase. These results indicate the policies to focus on and can be used for scenario generation in models, or for any other planning purpose, as a decision support tool.
Abstract: Water quality and freshwater fish diversity at nine
waterfalls in Khao Luang National Park, Thailand, were examined.
Streams were shallow and fast flowing, with clear water and rocky and
sandy substrate. The mean water quality values of the waterfalls at
Khao Luang National Park were as follows: pH 7.50, air temperature
24.27 °C, water temperature 26.37 °C, dissolved oxygen 7.88 mg/l,
hardness 4.44-21.33 mg/l, alkalinity 3.55-11.88 mg/l (as CaCO3).
Twenty fish species belonging to nine families were found at Khao
Luang National Park. A cluster analysis of water quality revealed
that the waterfalls at Khao Luang National Park were divided into
two groups, A and B. Group A comprised two waterfalls (i.e. Aie Kaew
and Wangmaipak) that flow to the Gulf of Thailand side. Group B
comprised seven waterfalls (i.e. Promlok, Kalom, Nuafa, Suankun,
Soidaw, Suanhai, and Thapae) that flow to the Andaman Sea side
(Fig. 2). The cyprinids represented the major species in all the
waterfalls, comprising 45%.
Abstract: Cities represent simultaneously a challenge and an
opportunity for climate change policy. Cities are the place where
most energy services are needed, because urbanization is closely
linked to high population densities and a concentration of economic
activities and production (urban energy demand). Consequently, it is
critical to explain the role of cities within the world's energy
systems and its correlation with the climate change issue. With more
than half of the world's population already living in urban areas, and
that percentage expected to rise to 75 per cent by 2050, it is clear
that the path to sustainable development must pass through cities.
Cities expanding in size and population pose increased challenges to
the environment, of which energy is part as a natural resource, and to
the quality of life. Nowadays, most cities have already understood the
importance of sustainability, both at their local scale and in terms
of their contribution to sustainability at higher geographical scales.
This requires perceiving a city as a complex and dynamic
ecosystem, an open system or cluster of systems, in which energy,
like other natural resources, is transformed to satisfy the
needs of the different urban activities. In fact, buildings and
transportation generally represent most of a city's direct energy
demand, i.e., between 60 per cent and 80 per cent of the overall
consumption. Buildings, both residential and services, are usually
influenced by the local physical and social conditions. In terms of
transport, the energy demand is also strongly linked with the specific
characteristics of a city (urban mobility). The concept of a "smart
city" builds on statistics as seven key axes of a city's success in
moving towards a common platform (brain nerve) of sustainable urban
energy systems.
With the aforesaid knowledge, the authors suggest a framework
for the role of cities as energy actors in smart city management.
The authors discuss the potential elements needed for energy
in smart cities and identify potential energy actions and
relevant barriers. Furthermore, three levels of city smartness in city
actions to overcome market/institutional failures with a local
approach are distinguished. The authors attempt to
conceive and implement concepts of city smartness by adopting the
city or local government as the nerve center through an integrated
planning approach. Finally, the paper concludes with recommendations
for the organization of Smart Sustainable Cities for positive change
in urban India.
Abstract: This study compared socio-economic status attainment between Muslim and Santal couples in rural Bangladesh. We hypothesized that the socio-economic status attainment (occupation, education and income) of the Muslim couples was higher than that of the Santal couples in rural Bangladesh. To examine the hypothesis, 288 couples (145 Muslim couples and 143 Santal couples) selected by cluster random sampling from Kalna village, Bangladesh, were individually interviewed using a semi-structured questionnaire. The results of the Pearson chi-square test suggest that there were significant differences in socio-economic status attainment between the couples of the two communities. In addition, Pearson correlation coefficients suggest that there were significant associations between the socio-economic statuses attained by the couples of the two communities in rural Bangladesh. Further cross-cultural study should be conducted on how inter-community relations in the rural social structure of Bangladesh influence the differences in the couples' socio-economic status attainment.
Abstract: The goal of this paper is to segment countries
based on the value of exports from Iran during the 14 years ending in
2005. To measure the dissimilarity among the export baskets of
different countries, we define the Dissimilarity Export Basket (DEB)
function and use this distance function in the K-means algorithm.
The DEB function is defined based on the concepts of association
rules and the value of export group-commodities. In this paper, a
clustering quality function and the clusters' intraclass inertia are
defined to, respectively, calculate the optimum number of clusters
and compare the functionality of DEB versus Euclidean distance. We
have also studied the effects of importance weights in the DEB
function to improve clustering quality. Lastly, when segmentation is
completed, a designated RFM model is used to analyze the relative
profitability of each cluster.
Abstract: In the literature, there are metrics for identifying the
quality of reusable components, but a framework that makes use of
these metrics to precisely predict the reusability of software
components still needs to be worked out. These reusability metrics,
if identified in the design phase or even in the coding phase, can
help reduce rework by improving the quality of reuse of the software
component and hence improve productivity through a probabilistic
increase in the reuse level. The CK metric suite is the most widely
used set of metrics for extracting the structural features of
object-oriented (OO) software, so in this study a tuned CK metric
suite, i.e. WMC, DIT, NOC, CBO and LCOM, is used to obtain the
structural analysis of OO-based software components. An algorithm is
proposed in which the inputs are given to a K-means clustering system
in the form of tuned values of the OO software component, and a
decision tree is formed for the 10-fold cross-validation of the data
to evaluate the component in terms of its linguistic reusability
value. The developed reusability model has produced high-precision
results, as desired.