Abstract: A Wireless Sensor Network (WSN) comprises sensor
nodes that are designed to sense the environment and transmit the
sensed data back to the base station via multi-hop routing, so that
physical phenomena can be reconstructed. Because the sensed data
exhibit significant, overlapping temporal and spatial redundancy, sensor
nodes need Redundancy Suppression Algorithms (RSAs). These lower
energy consumption by reducing the transmission of redundant data.
A conventional RSA is the threshold-based RSA, which uses a threshold
to suppress redundant data. Many temporal and spatial RSAs have been
proposed, but temporal-spatial RSAs are rarely proposed because it is
difficult to decide when to apply temporal or spatial suppression. In
this paper, we propose a novel temporal-spatial redundancy suppression
algorithm, the Codebook-based Redundancy Suppression Mechanism (CRSM).
CRSM uses vector quantization to generate a codebook, which makes a
temporal-spatial RSA easy to implement. CRSM achieves power saving and
reliability for WSNs, and it also makes the network lifetime
predictable. Simulation results show that CRSM extends the network
lifetime by at least 23% compared with other RSAs.
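The following rough sketch illustrates the two building blocks named above: a plain threshold-based temporal RSA and a vector-quantization codebook lookup. It is not CRSM itself. The function names, the "compare with the last transmitted value" rule and the example codebook are assumptions made for illustration. The abstract does not say how CRSM builds its codebook or how it combines temporal and spatial suppression.

```python
from typing import List, Optional, Sequence, Tuple


def threshold_suppress(readings: Sequence[float], threshold: float) -> List[Tuple[int, float]]:
    """Return the (time index, value) pairs a node would actually transmit.

    A reading is sent only if it differs from the last *transmitted* value
    by more than `threshold`; otherwise the sink keeps using the last value
    it received (temporal redundancy suppression).
    """
    sent: List[Tuple[int, float]] = []
    last: Optional[float] = None
    for t, value in enumerate(readings):
        if last is None or abs(value - last) > threshold:
            sent.append((t, value))
            last = value
    return sent


def nearest_codeword(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Vector quantization: index of the codeword closest to `vector`
    (squared Euclidean distance). Only this index would need to be sent."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])))


if __name__ == "__main__":
    print(threshold_suppress([20.0, 20.1, 20.3, 21.0, 21.05, 19.9], 0.5))
    print(nearest_codeword((24.0, 23.5), [(20.0, 20.0), (25.0, 25.0)]))
```

With a 0.5 threshold, only three of the six readings in the example would be transmitted. A codebook lets a node send a short index in place of a whole vector of readings.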
Abstract: This paper presents a comparative analysis of a new
unsupervised PCA-based technique for steel plate texture segmentation
aimed at defect detection. The proposed scheme is called Variance
Based Component Analysis (VBCA). It uses PCA for feature extraction,
applies a feature reduction algorithm based on the variance of
eigenpictures, and classifies pixels as defective or normal. The
classic PCA approach clusters pixels with a clustering algorithm such
as K-means. VBCA instead uses thresholding and some post-processing
operations to label pixels as defective or normal. The experimental
results show that VBCA is 12.46% more accurate and 78.85% faster than
the classic PCA approach.
Abstract: Granular computing deals with the representation of information in the form of aggregates (granules), together with related methods for transforming and analyzing them for problem solving. This paper presents a granulation scheme based on clustering and Rough Set Theory, with a focus on the structured conceptualization of information. The proposed granulation technique is semi-supervised and incorporates both global and local information granulation. A tree structure is proposed to represent the results of the attribute-oriented granulation. Experiments with the proposed method on four labeled data sets show good results on classification problems.
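Rough Set Theory describes a concept by its lower approximation (granules entirely inside it) and its upper approximation (granules that overlap it). The minimal sketch below computes both. It assumes that granules are formed from objects with identical attribute tuples, which is the classic indiscernibility relation. In the paper, granules come from clustering, so the object names and attribute values here are hypothetical stand-ins, for example discretized cluster labels.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def approximations(objects: Dict[str, Tuple[Hashable, ...]],
                   target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Rough-set lower and upper approximations of the concept `target`.

    Objects with identical attribute tuples are indiscernible and form one
    granule (equivalence class).
    """
    granules: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for name, attrs in objects.items():
        granules[attrs].add(name)
    lower: Set[str] = set()
    upper: Set[str] = set()
    for granule in granules.values():
        if granule <= target:   # granule lies entirely inside the concept
            lower |= granule
        if granule & target:    # granule overlaps the concept
            upper |= granule
    return lower, upper


if __name__ == "__main__":
    objs = {"o1": ("hi", "red"), "o2": ("hi", "red"),
            "o3": ("lo", "red"), "o4": ("lo", "blue")}
    print(approximations(objs, {"o1", "o3"}))
```

In the example, o1 and o2 are indiscernible. The concept {o1, o3} is therefore only roughly definable: its boundary region is {o2}.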
Abstract: Segmentation of ultrasound images is challenging because of interference from speckle noise and the fuzziness of boundaries. In this paper, we propose a segmentation scheme that uses fuzzy c-means (FCM) clustering with both the intensity and the texture information of images to extract breast lesions in ultrasound images. First, the nonlinear structure tensor, which helps refine the edges detected from intensity, is used to extract speckle texture. Then, spatial FCM clustering is applied to the image feature space for segmentation. In experiments with simulated and clinical ultrasound images, spatial FCM clustering with both intensity and texture information gives more accurate results than conventional FCM or spatial FCM without texture information.
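The baseline that the proposed scheme extends can be sketched as the conventional FCM iteration on a single scalar feature, such as pixel intensity. This sketch omits the spatial term and the structure-tensor texture feature of the proposed method. The fuzzifier m = 2, the iteration count and the initial centers are assumptions.

```python
from typing import List, Sequence, Tuple


def fcm_1d(xs: Sequence[float], centers: Sequence[float], m: float = 2.0,
           iters: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Conventional fuzzy c-means on scalar features (e.g. pixel intensities).

    Returns the final cluster centers and the membership matrix u[i][k]
    (degree to which sample k belongs to cluster i).
    """
    centers = list(centers)
    c, n = len(centers), len(xs)
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        for k, x in enumerate(xs):
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # Sample coincides with a center: give it crisp membership.
                hit = d.index(0.0)
                for i in range(c):
                    u[i][k] = 1.0 if i == hit else 0.0
                continue
            for i in range(c):
                u[i][k] = 1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Center update: weighted mean with weights u_ik^m
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            centers[i] = sum(wk * x for wk, x in zip(w, xs)) / sum(w)
    return centers, u


if __name__ == "__main__":
    cs, _ = fcm_1d([10, 12, 11, 200, 210, 205], [0.0, 255.0])
    print(cs)
```

Spatial FCM variants additionally weight each pixel's membership by the memberships of its neighbors. That is how they reduce the effect of isolated speckle.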
Abstract: Clustering is the process of identifying homogeneous
groups of objects, called clusters, and is an interesting topic in
data mining. Objects in the same group or class share similar
characteristics. This paper discusses a robust clustering process for
image data with two dimension reduction approaches: two dimensional
principal component analysis (2DPCA) and principal component analysis
(PCA). Dimension reduction is a standard way to handle
high-dimensional data. It transforms high-dimensional data into a
lower-dimensional space with limited loss of information. One of the
most common forms of dimensionality reduction is PCA. 2DPCA is often
described as a variant of PCA in which image matrices are treated
directly as 2D matrices. They do not need to be transformed into
vectors, so the image covariance matrix can be constructed directly
from the original image matrices. However, the classical covariance
matrix being decomposed is very sensitive to outlying observations.
The objective of this paper is to compare the performance of the
robust minimizing vector variance (MVV) estimator in 2DPCA and in PCA
for clustering an arbitrary image data set when outliers are hidden in
the data. The robustness aspects of the simulation and an illustration
of image clustering are discussed at the end of the paper.
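The block below gives the commonly used 2DPCA formulation for comparison with ordinary PCA on vectorized images. The notation is an assumption and may differ from the paper's. A robust variant such as the MVV-based one would replace the classical mean and covariance with robust estimates.

```latex
% 2DPCA: A_1, ..., A_M are m x n image matrices, \bar{A} is their mean
G_t = \frac{1}{M} \sum_{j=1}^{M} (A_j - \bar{A})^{\top} (A_j - \bar{A})
      \quad (n \times n)

% Projection onto the d leading eigenvectors X = [x_1, ..., x_d] of G_t
Y_j = A_j X \quad (m \times d)

% PCA on vectorized images a_j = \mathrm{vec}(A_j) needs an (mn x mn) matrix
C = \frac{1}{M} \sum_{j=1}^{M} (a_j - \bar{a})(a_j - \bar{a})^{\top}
```

The size difference matters for images. For 64 x 64 images, G_t is 64 x 64, whereas C is 4096 x 4096.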
Abstract: An appropriate system for evaluating students'
educational development is key to achieving predefined educational
goals. This is corroborated by the many related papers in recent
years that try to prove or disprove the necessity and adequacy of
student assessment. Some of these studies tried to increase the
precision of question weights in scientific examinations. However, all
of them adjust the initial question weights, while the accuracy and
precision of those initial weights remain in question. Therefore, to
increase the precision of assessing students' educational development,
the present study proposes a new method for determining the initial
question weights. The method considers question factors such as
difficulty, importance and complexity, and it combines PROMETHEE with
the fuzzy analytic network process. A data mining approach is used to
improve the model's inputs. The results of the case study show that
the proposed model improves performance and precision.
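PROMETHEE can rank questions once criterion weights are available. The minimal PROMETHEE II sketch below computes net outranking flows with the "usual" preference function (P(d) = 1 if d > 0, else 0). It assumes that all criteria are to be maximized and that the weights are supplied directly. In the proposed model, the weights would come from the fuzzy analytic network process. The question names, scores and weights are hypothetical.

```python
from typing import Dict, List


def promethee_ii(scores: Dict[str, List[float]], weights: List[float]) -> Dict[str, float]:
    """Net outranking flow phi(a) = phi+(a) - phi-(a) for each alternative.

    All criteria are treated as "larger is better" and use the usual
    preference function P(d) = 1 if d > 0 else 0.
    """
    names = list(scores)
    n = len(names)
    total_w = sum(weights)

    def pi(a: str, b: str) -> float:
        # Weighted share of criteria on which a is strictly better than b.
        return sum(w for w, xa, xb in zip(weights, scores[a], scores[b]) if xa > xb) / total_w

    net: Dict[str, float] = {}
    for a in names:
        plus = sum(pi(a, b) for b in names if b != a) / (n - 1)   # leaving flow
        minus = sum(pi(b, a) for b in names if b != a) / (n - 1)  # entering flow
        net[a] = plus - minus
    return net


if __name__ == "__main__":
    # Hypothetical criteria: difficulty, importance, complexity
    q = {"Q1": [3, 5, 2], "Q2": [4, 3, 4], "Q3": [2, 2, 1]}
    print(promethee_ii(q, [0.3, 0.5, 0.2]))
```

Net flows always sum to zero. A higher flow means a question outranks the others more strongly, and that flow could then inform its initial weight.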
Abstract: Over the past decade, biclustering has become a popular data mining technique. It is used in biological data analysis and also in other applications involving high-dimensional two-way datasets, such as text mining and market data analysis. Biclustering clusters both the rows and the columns of a dataset simultaneously, whereas traditional clustering clusters either the rows or the columns. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. The Firefly Algorithm (FA) is a recently proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of a discrete version of FA (DFA) on the task of mining coherent, large-volume biclusters from web usage data. Experiments were conducted on two web usage datasets from a public dataset repository. The performance of DFA was compared with that of another population-based metaheuristic, binary Particle Swarm Optimization (PSO). The results demonstrate the usefulness of DFA for the biclustering problem.
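The abstract does not state how bicluster coherence is scored. A widely used measure is Cheng and Church's mean squared residue (MSR), sketched below as an assumption about the kind of fitness a DFA or PSO search might minimize, typically while rewarding a large volume |I|·|J|.

```python
from typing import List, Sequence


def mean_squared_residue(data: List[List[float]], rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue of the submatrix data[rows][cols].

    0 means the bicluster is perfectly coherent (additive row/column model).
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    n_r, n_c = len(sub), len(sub[0])
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    # Residue of each cell after removing row, column and overall effects
    return sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
               for i in range(n_r) for j in range(n_c)) / (n_r * n_c)


if __name__ == "__main__":
    m = [[1, 2, 3, 9], [2, 3, 4, 0], [5, 6, 7, 4]]
    print(mean_squared_residue(m, [0, 1, 2], [0, 1, 2]))  # coherent block
    print(mean_squared_residue(m, [0, 1], [2, 3]))        # incoherent block
```

For web usage data, rows might be users and columns pages. A low-MSR bicluster is then a group of users with a consistent browsing pattern over a group of pages.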
Abstract: This paper focuses on the data-driven generation
of fuzzy IF...THEN rules. The resulting fuzzy rule base can be
used to build a classifier or a prediction model, or to form a
decision support system. Among the wide range of possible
approaches, decision tree and association rule based algorithms
are reviewed. Two new approaches are then presented, based on
a priori fuzzy clustering-based partitioning of the continuous
input variables. An application study is also presented, in which
the developed methods are tested on the well-known Wisconsin
Breast Cancer classification problem.
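The sketch below shows how a fuzzy IF...THEN rule base of this kind can act as a classifier. It uses triangular membership functions, the minimum as the AND operator and single-winner inference. These are common choices but only assumptions here. The two hand-written rules and their "low"/"high" partitions of a 1 to 10 feature scale are hypothetical. In the paper, the rules would be generated from data.

```python
from typing import Callable, List, Sequence, Tuple

MF = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> MF:
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


Rule = Tuple[Sequence[MF], str]  # (one membership function per input, class label)


def classify(x: Sequence[float], rules: List[Rule]) -> str:
    """Single-winner inference: the rule with the highest firing strength
    (minimum of its antecedent memberships) determines the class."""
    best_label, best_strength = "", -1.0
    for antecedents, label in rules:
        strength = min(mu(v) for mu, v in zip(antecedents, x))
        if strength > best_strength:
            best_label, best_strength = label, strength
    return best_label


if __name__ == "__main__":
    low, high = triangular(-8, 1, 10), triangular(1, 10, 19)
    rules: List[Rule] = [((low, low), "benign"), ((high, high), "malignant")]
    print(classify((2, 3), rules), classify((9, 8), rules))
```

A clustering-based partitioning, as in the proposed approaches, would replace the fixed `low`/`high` sets with membership functions derived from fuzzy cluster prototypes.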
Abstract: As hardware technology advances, the cost of data storage
keeps decreasing and ever larger volumes of data are being stored.
There is therefore an urgent need for new techniques and tools that
can intelligently and automatically help us transform this data into
useful knowledge. Various data mining techniques have been developed
to help handle such large databases [7]. Data mining is also finding a
role in biotechnology. Pedigree refers to the ancestry of a crop
variety. Genetic diversity is the variation in the genetic composition
of individuals within or among species, and it depends on the pedigree
information of the varieties. Parents at lower hierarchic levels carry
more weight for predicting genetic diversity than those at upper
hierarchic levels, and the weight decreases as the level increases.
For crossbreeding, the two varieties should be as genetically diverse
as possible, so that the useful characters of both are incorporated
into the newly developed variety. This paper discusses searching for
and analyzing different possible pairs of varieties. The pairs are
selected on the basis of morphological characters, climatic
conditions and nutrients, with the aim of finding the optimal pair for
producing the required crossbreed variety. An algorithm was developed
to determine the genetic diversity between the selected wheat
varieties, and a cluster analysis technique is used to retrieve the
results.
Abstract: In Fe-3%Si sheets of grade Hi-B, with AlN and MnS
as inhibitors, the Goss grains that grow abnormally are not larger
than the average grain size of the primary matrix. In this
heterogeneous microstructure, the size factor is not a necessary
condition for secondary recrystallization. The onset of abnormal
growth of the small Goss grains appears to be related to a particular
behavior of their grain boundaries, to the local texture and to the
distribution of the inhibitors. The presence and evolution of
oriented clusters give the small Goss grains a favorable neighborhood
in which to grow. The modified Monte Carlo approach applied here
considers the local environment of each grain. The growth of a grain
depends on its real spatial position, so the matrix heterogeneity is
taken into account. The grain growth conditions are considered in the
global matrix and in different matrices corresponding to A component
clusters. The grain growth behaviour is studied by introducing energy
only; energy and mobility; and energy, mobility and precipitates.
Abstract: The MEMORI system automatically detects and recognizes
rotated and/or rescaled versions of database objects within
digital color images with cluttered backgrounds. It does this with
a region grouping algorithm guided by heuristic rules, whose
parameters concern some geometrical properties and the recognition
score of the database objects. This paper focuses on the
strategies implemented in MEMORI for estimating the heuristic
rule parameters. Because this estimation is automatic, the system is
a self-configuring and highly user-friendly tool.
Abstract: This paper presents a new growing neural network for
cluster analysis and market segmentation, which optimizes the size
and structure of clusters by iteratively checking them for multivariate
normality. We combine the recently published SGNN approach [8]
with the basic principle underlying the Gaussian-means algorithm
[13] and the Mardia test for multivariate normality [18, 19]. The new
approach distinguishes itself from existing ones by its holistic design and
its great autonomy regarding the clustering process as a whole. Its
performance is demonstrated by means of synthetic 2D data and by
real lifestyle survey data usable for market segmentation.
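For reference, the block below gives the standard definitions of Mardia's multivariate skewness and kurtosis and their usual large-sample approximations under normality. The abstract does not say how the network combines these statistics into its decision to split or keep a cluster.

```latex
% x_1, ..., x_n \in \mathbb{R}^p,\ \bar{x} = sample mean,
% S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}
g_{ij} = (x_i - \bar{x})^{\top} S^{-1} (x_j - \bar{x})

b_{1,p} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij}^{3}
\qquad
b_{2,p} = \frac{1}{n} \sum_{i=1}^{n} g_{ii}^{2}

% Under multivariate normality, approximately:
\frac{n\, b_{1,p}}{6} \sim \chi^2_{p(p+1)(p+2)/6},
\qquad
b_{2,p} \sim \mathcal{N}\!\left(p(p+2),\ \frac{8p(p+2)}{n}\right)
```

For the 2D synthetic data mentioned above (p = 2), the skewness statistic has 4 degrees of freedom, and the kurtosis is expected to be close to 8 for a Gaussian cluster.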
Abstract: We develop a three-step fuzzy logic-based algorithm for clustering categorical attributes, and we apply it to analyze cultural data. In the first step the algorithm employs an entropy-based clustering scheme, which initializes the cluster centers. In the second step we apply the fuzzy c-modes algorithm to obtain a fuzzy partition of the data set, and the third step introduces a novel cluster validity index, which decides the final number of clusters.
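The second step can be sketched as the two core updates of the fuzzy k-modes/c-modes family, in the form usually attributed to Huang and Ng. The first update assigns memberships from the simple matching dissimilarity to each mode. The second picks, per attribute, the category with the largest summed membership^α. The entropy-based initialization and the new validity index are not shown. The fuzziness exponent α = 1.5 and the categorical example data are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Obj = Tuple[str, ...]


def mismatch(x: Obj, z: Obj) -> int:
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, z))


def memberships(x: Obj, modes: Sequence[Obj], alpha: float = 1.5) -> List[float]:
    """Fuzzy membership of object x in each cluster (alpha > 1)."""
    d = [mismatch(x, z) for z in modes]
    if 0 in d:  # object equals a mode: crisp membership in that cluster
        hit = d.index(0)
        return [1.0 if l == hit else 0.0 for l in range(len(modes))]
    return [1.0 / sum((d[l] / d[h]) ** (1.0 / (alpha - 1.0)) for h in range(len(modes)))
            for l in range(len(modes))]


def update_mode(data: Sequence[Obj], weights: Sequence[float], alpha: float = 1.5) -> Obj:
    """New mode of one cluster: per attribute, the category with the largest
    sum of membership**alpha over the objects that have that category."""
    mode = []
    for j in range(len(data[0])):
        score: Dict[str, float] = defaultdict(float)
        for x, w in zip(data, weights):
            score[x[j]] += w ** alpha
        mode.append(max(score, key=score.get))
    return tuple(mode)


if __name__ == "__main__":
    modes = [("a", "x"), ("b", "y")]
    print(memberships(("a", "y"), modes))  # equally far from both modes
    print(update_mode([("a", "x"), ("a", "y"), ("b", "y")], [1.0, 0.5, 0.0]))
```

Alternating these two updates until the memberships stop changing gives the fuzzy partition. A validity index would then compare partitions obtained for different numbers of clusters.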
Abstract: In this paper, we explore three basic approaches for
extracting a region of interest (ROI) from still images, and different
methods under each of them. The results obtained for each of the
proposed methods are shown, and we demonstrate where each method
outperforms the others. We also discuss two main problems in ROI
extraction, the channel selection problem and the saliency reversal
problem, and how well the various methods address them. The basic
approaches are 1) saliency-based, 2) wavelet-based and
3) clustering-based. The saliency-based approach performs well on
images containing objects of high saturation and brightness. The
wavelet-based approach performs well on natural scene images that
contain regions of distinct textures. The mean shift clustering
approach partitions the image into regions according to the density
distribution of pixel intensities. The experimental results show that
each technique performs at a different acceptable level for different
types of images.
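The clustering-based approach above partitions pixels by density. The minimal sketch below shows the idea with mean shift on 1D grayscale intensities, assuming a flat kernel. Each value repeatedly moves to the mean of its neighbors within `bandwidth` until it settles on a density mode, and pixels that reach the same mode form one region. The bandwidth and tolerance values are assumptions. Practical implementations usually work on joint spatial and color features.

```python
from typing import List, Sequence


def mean_shift_modes(values: Sequence[float], bandwidth: float,
                     tol: float = 1e-3, max_iter: int = 100) -> List[float]:
    """Flat-kernel mean shift: return the density mode each value converges to."""
    modes = []
    for v in values:
        x = float(v)
        for _ in range(max_iter):
            # The window is never empty: x starts at a sample and each mean
            # stays within `bandwidth` of at least one sample in the window.
            window = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


if __name__ == "__main__":
    pixels = [10, 11, 12, 100, 101, 102]
    print(mean_shift_modes(pixels, bandwidth=5.0))
```

No number of regions is given in advance. The bandwidth controls how many modes, and hence regions, appear.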
Abstract: Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and new concepts may keep evolving. Irrelevant attributes can be termed noisy attributes, and such attributes make working with data streams even harder. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, because clustering is an unsupervised data mining task that does not require labeled data. Density-based and partitioning clustering are combined for outlier detection. Partitioning clustering is also used to assign adaptive weights to attributes according to their relevance. Weighted attributes help reduce or remove the effect of noisy attributes. To meet the challenges of streaming data, the proposed scheme is incremental and adapts to concept evolution. Experimental results on synthetic and real-world data sets show that our approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, including as the percentage of outliers increases.
Abstract: In this paper, we focus on event extraction from Tamil
news articles. The system uses a scoring scheme for extracting and
grouping event-specific sentences. Using this scoring scheme,
event-specific clustering is performed across multiple documents.
Events are extracted from each document using a scoring scheme based
on a feature score and a condition score. In the same way,
event-specific sentences from multiple documents are clustered using
this scoring scheme. The proposed system builds an Event Template
based on a user-specified query. The templates are filled with
event-specific details, such as person, location and timeline,
extracted from the formed clusters. The proposed system applies these
methods to Tamil news articles that have been enconverted into UNL
graphs using a Tamil-to-UNL enconverter. The main aim of this work is
to generate an event-based template.
Abstract: The liveability of a city refers to the quality of life in an
area that contributes to a safe, healthy and enjoyable place. This
paper discusses the role of street activities in making Kuala Lumpur
a liveable city. It also examines residents' happiness level with the
city's street activities. The study was conducted among residents of
Kuala Lumpur. A mixed-methods technique is used, with quantitative
data as the main data, supported by qualitative data. Data were
collected through questionnaires, observation and an interview
session with a sample of Kuala Lumpur residents. The sampling
technique is multistage cluster sampling. The findings reveal that
there is still no significant relationship between residents' length
of stay in Kuala Lumpur and their happiness level with the street
activities in the city.
Abstract: The proliferation of user-generated content (UGC) results in huge opportunities to explore event patterns. However, existing event recommendation systems primarily focus on advanced information technology users. Little work has been done to address novice and low-literacy users. The next billion users providing and consuming UGC are likely to include communities from developing countries who are ready to use affordable technologies for subsistence goals. Therefore, we propose a design framework for providing event recommendations to address the needs of such users. Grounded in information integration theory (IIT), our framework advocates that effective event recommendation is supported by systems capable of (1) reliable information gathering through structured user input, (2) accurate sense making through spatial-temporal analytics, and (3) intuitive information dissemination through interactive visualization techniques. A mobile pest management application is developed as an instantiation of the design framework. Our preliminary study suggests a set of design principles for novice and low-literacy users.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns in gene expression analysis where different genes
are correlated under a subset of conditions. Association rule mining
is an efficient way to achieve biclustering, as in the BIMODULE
algorithm. However, it is sensitive to the values of its input
parameters and to the discretization procedure used in the
preprocessing step. Moreover, when noise is present, classical
association rule miners discover multiple small fragments of the true
bicluster but miss the true bicluster itself. This paper formally
presents a generalized noise-tolerant bicluster model, termed
μBicluster. We also introduce an iterative algorithm based on this
model, termed BIDENS, which can discover a set of k possibly
overlapping biclusters simultaneously. Our model uses a more flexible
method of partitioning the dimensions to preserve meaningful and
significant biclusters. The proposed algorithm can discover biclusters
that are hard for BIMODULE to discover. Experiments on yeast and human
gene expression data and several artificial datasets show that our
algorithm offers substantial improvements over several previously
proposed biclustering algorithms.
Abstract: An on-demand routing protocol for wireless ad hoc
networks is one that searches for and attempts to discover a route to
some destination node only when a sending node originates a data
packet addressed to that node. To avoid performing such a route
discovery before each data packet is sent, these routing protocols
must cache previously discovered routes. This paper analyzes the
effect of intelligent caching in a non-clustered network, using
on-demand routing protocols in wireless ad hoc networks. The analysis
is based on the Dynamic Source Routing protocol (DSR), which operates
entirely on demand. DSR uses the cache in every node to save the paths
learnt during the route discovery procedure. In this implementation,
these paths are cached only at intermediate nodes, and the paths in
these caches are used when required. This technique allows more of
the learnt routes to be stored, without erasing existing cache entries
to make room for each newly learnt route. Simulation results on DSR
show that this technique drastically increases the memory available
for caching discovered routes without affecting the performance of
the DSR routing protocol in any way, except for a small increase in
end-to-end delay.
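One property of a DSR path cache is that a single learnt source route gives an intermediate node routes to every node downstream of it on that path. The toy sketch below shows this. It is not the caching policy analyzed in the paper, whose details are not given here. The class and method names are hypothetical, links are assumed to be unidirectional (so reverse routes are not derived), and an existing entry is replaced only by a shorter route.

```python
from typing import Dict, List, Optional


class PathCache:
    """Toy DSR-style route cache kept at a single node (hypothetical)."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.routes: Dict[str, List[str]] = {}

    def learn(self, path: List[str]) -> None:
        """Cache the routes implied by a source route passing through `owner`.

        Every node after `owner` on the path becomes reachable through the
        corresponding sub-path, so one learnt path yields several routes.
        An existing entry is kept unless the new route is shorter.
        """
        if self.owner not in path:
            return
        start = path.index(self.owner)
        for k in range(start + 1, len(path)):
            dest, route = path[k], path[start:k + 1]
            old = self.routes.get(dest)
            if old is None or len(route) < len(old):
                self.routes[dest] = route

    def lookup(self, dest: str) -> Optional[List[str]]:
        """Cached route from `owner` to `dest`, or None (route discovery needed)."""
        return self.routes.get(dest)


if __name__ == "__main__":
    cache = PathCache("B")
    cache.learn(["S", "A", "B", "C", "D"])  # route forwarded through B
    print(cache.lookup("D"), cache.lookup("C"), cache.lookup("S"))
```

A cache hit at an intermediate node lets it answer a route request without flooding the request further. That is why caching at intermediate nodes can reduce discovery overhead.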