Abstract: In blended learning environments, the Internet can be combined with other technologies. The aim of this research was to design, introduce and validate a model to support synchronous and asynchronous activities by managing content domains in an Adaptive Hypermedia System (AHS). The application is based on information recovery techniques, clustering algorithms and adaptation rules to adjust the user's model to contents and objects of study. This system was applied to blended learning in higher education. The research strategy used was the case study method. Empirical studies were carried out on courses at two universities to validate the model. The results of this research show that the model had a positive effect on the learning process. The students indicated that the synchronous and asynchronous scenario is a good option, as it involves a combination of work with the lecturer and the AHS. In addition, they gave positive ratings to the system and stated that the contents were adapted to each user profile.
Abstract: Data clustering is an important data exploration technique
with many applications in data mining. We present an enhanced
version of the well known single link clustering algorithm. We will
refer to this algorithm as DCBOR. The proposed algorithm alleviates
the chain effect by removing the outliers from the given dataset.
So this algorithm provides outlier detection and data clustering
simultaneously. This algorithm does not need to update the distance
matrix, since the algorithm depends on merging the most k-nearest
objects in one step and the cluster continues grow as long as possible
under specified condition. So the algorithm consists of two phases;
at the first phase, it removes the outliers from the input dataset. At
the second phase, it performs the clustering process. This algorithm
discovers clusters of different shapes, sizes, densities and requires
only one input parameter; this parameter represents a threshold for
outlier points. The value of the input parameter is ranging from 0 to
1. The algorithm supports the user in determining an appropriate
value for it. We have tested this algorithm on different datasets
contain outlier and connecting clusters by chain of density points,
and the algorithm discovers the correct clusters. The results of
our experiments demonstrate the effectiveness and the efficiency of
DCBOR.
Abstract: The System Identification problem looks for a
suitably parameterized model, representing a given process. The
parameters of the model are adjusted to optimize a performance
function based on error between the given process output and
identified process output. The linear system identification field is
well established with many classical approaches whereas most of
those methods cannot be applied for nonlinear systems. The problem
becomes tougher if the system is completely unknown with only the
output time series is available. It has been reported that the
capability of Artificial Neural Network to approximate all linear and
nonlinear input-output maps makes it predominantly suitable for the
identification of nonlinear systems, where only the output time series
is available. [1][2][4][5]. The work reported here is an attempt to
implement few of the well known algorithms in the context of
modeling of nonlinear systems, and to make a performance
comparison to establish the relative merits and demerits.
Abstract: Image coding based on clustering provides immediate
access to targeted features of interest in a high quality decoded
image. This approach is useful for intelligent devices, as well as for
multimedia content-based description standards. The result of image
clustering cannot be precise in some positions especially on pixels
with edge information which produce ambiguity among the clusters.
Even with a good enhancement operator based on PDE, the quality of
the decoded image will highly depend on the clustering process. In
this paper, we introduce an ambiguity cluster in image coding to
represent pixels with vagueness properties. The presence of such
cluster allows preserving some details inherent to edges as well for
uncertain pixels. It will also be very useful during the decoding phase
in which an anisotropic diffusion operator, such as Perona-Malik,
enhances the quality of the restored image. This work also offers a
comparative study to demonstrate the effectiveness of a fuzzy
clustering technique in detecting the ambiguity cluster without losing
lot of the essential image information. Several experiments have been
carried out to demonstrate the usefulness of ambiguity concept in
image compression. The coding results and the performance of the
proposed algorithms are discussed in terms of the peak signal-tonoise
ratio and the quantity of ambiguous pixels.
Abstract: By taking advantage of both k-NN which is highly
accurate and K-means cluster which is able to reduce the time of classification, we can introduce Cluster-k-Nearest Neighbor as "variable k"-NN dealing with the centroid or mean point of all subclasses generated by clustering algorithm. In general the algorithm of K-means cluster is not stable, in term of accuracy, for that reason we develop another algorithm for clustering our space which gives a higher accuracy than K-means cluster, less
subclass number, stability and bounded time of classification with respect to the variable data size. We find between 96% and 99.7 % of accuracy in the lassification of 6 different types of Time series by using K-means cluster algorithm and we find 99.7% by using the new clustering algorithm.
Abstract: This paper presents a highly efficient algorithm for detecting and tracking humans and objects in video surveillance sequences. Mean shift clustering is applied on backgrounddifferenced image sequences. For efficiency, all calculations are performed on integral images. Novel corresponding exponential integral kernels are introduced to allow the application of nonuniform kernels for clustering, which dramatically increases robustness without giving up the efficiency of the integral data structures. Experimental results demonstrating the power of this approach are presented.
Abstract: In order to accelerate the similarity search in highdimensional database, we propose a new hierarchical indexing method. It is composed of offline and online phases. Our contribution concerns both phases. In the offline phase, after gathering the whole of the data in clusters and constructing a hierarchical index, the main originality of our contribution consists to develop a method to construct bounding forms of clusters to avoid overlapping. For the online phase, our idea improves considerably performances of similarity search. However, for this second phase, we have also developed an adapted search algorithm. Our method baptized NOHIS (Non-Overlapping Hierarchical Index Structure) use the Principal Direction Divisive Partitioning (PDDP) as algorithm of clustering. The principle of the PDDP is to divide data recursively into two sub-clusters; division is done by using the hyper-plane orthogonal to the principal direction derived from the covariance matrix and passing through the centroid of the cluster to divide. Data of each two sub-clusters obtained are including by a minimum bounding rectangle (MBR). The two MBRs are directed according to the principal direction. Consequently, the nonoverlapping between the two forms is assured. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and SRtree in processing k-nearest neighbors.
Abstract: In this paper, Land Marks for Unique Addressing( LMUA) algorithm is develped to generate unique ID for each and every node which leads to the formation of overlapping/Non overlapping clusters based on unique ID. To overcome the draw back of the developed LMUA algorithm, the concept of clustering is introduced. Based on the clustering concept a Land Marks for Unique Addressing and Clustering(LMUAC) Algorithm is developed to construct strictly non-overlapping clusters and classify those nodes in to Cluster Heads, Member Nodes, Gate way nodes and generating the Hierarchical code for the cluster heads to operate in the level one hierarchy for wireless communication switching. The expansion of the existing network can be performed or not without modifying the cost of adding the clusterhead is shown. The developed algorithm shows one way of efficiently constructing the
Abstract: Biclustering aims at identifying several biclusters that
reveal potential local patterns from a microarray matrix. A bicluster is
a sub-matrix of the microarray consisting of only a subset of genes
co-regulates in a subset of conditions. In this study, we extend the
motif of subspace clustering to present a K-biclusters clustering (KBC)
algorithm for the microarray biclustering issue. Besides minimizing
the dissimilarities between genes and bicluster centers within all
biclusters, the objective function of the KBC algorithm additionally
takes into account how to minimize the residues within all biclusters
based on the mean square residue model. In addition, the objective
function also maximizes the entropy of conditions to stimulate more
conditions to contribute the identification of biclusters. The KBC
algorithm adopts the K-means type clustering process to efficiently
make the partition of K biclusters be optimized. A set of experiments
on a practical microarray dataset are demonstrated to show the
performance of the proposed KBC algorithm.
Abstract: Automatic segmentation of skin lesions is the first step
towards the automated analysis of malignant melanoma. Although
numerous segmentation methods have been developed, few studies
have focused on determining the most effective color space for
melanoma application. This paper proposes an automatic segmentation
algorithm based on color space analysis and clustering-based histogram
thresholding, a process which is able to determine the optimal
color channel for detecting the borders in dermoscopy images. The
algorithm is tested on a set of 30 high resolution dermoscopy images.
A comprehensive evaluation of the results is provided, where borders
manually drawn by four dermatologists, are compared to automated
borders detected by the proposed algorithm, applying three previously
used metrics of accuracy, sensitivity, and specificity and a new metric
of similarity. By performing ROC analysis and ranking the metrics,
it is demonstrated that the best results are obtained with the X and
XoYoR color channels, resulting in an accuracy of approximately
97%. The proposed method is also compared with two state-of-theart
skin lesion segmentation methods.
Abstract: Data mining uses a variety of techniques each of which
is useful for some particular task. It is important to have a deep
understanding of each technique and be able to perform sophisticated
analysis. In this article we describe a tool built to simulate a variation
of the Kohonen network to perform unsupervised clustering and
support the entire data mining process up to results visualization. A
graphical representation helps the user to find out a strategy to
optimize classification by adding, moving or delete a neuron in order
to change the number of classes. The tool is able to automatically
suggest a strategy to optimize the number of classes optimization, but
also support both tree classifications and semi-lattice organizations of
the classes to give to the users the possibility of passing from one
class to the ones with which it has some aspects in common.
Examples of using tree and semi-lattice classifications are given to
illustrate advantages and problems. The tool is applied to classify
macroeconomic data that report the most developed countries- import
and export. It is possible to classify the countries based on their
economic behaviour and use the tool to characterize the commercial
behaviour of a country in a selected class from the analysis of
positive and negative features that contribute to classes formation.
Possible interrelationships between the classes and their meaning are
also discussed.
Abstract: This paper introduces a measure of similarity between
two clusterings of the same dataset produced by two different
algorithms, or even the same algorithm (K-means, for instance, with
different initializations usually produce different results in clustering
the same dataset). We then apply the measure to calculate the
similarity between pairs of clusterings, with special interest directed
at comparing the similarity between various machine clusterings and
human clustering of datasets. The similarity measure thus can be used
to identify the best (in terms of most similar to human) clustering
algorithm for a specific problem at hand. Experimental results
pertaining to the text categorization problem of a Portuguese corpus
(wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The
significance and other potential applications of the proposed measure
are discussed.
Abstract: Currently, one of the main directions is developing of
development based on the clustering of economic operations of
Kazakhstan, providing for the organization and concentration of
production capacity in one region or the most optimal system. In the
modern economic literature clustering is regarded as one of the most
effective tools to ensure competitive businesses, and improve their
business itself.
Abstract: Wavelet neural networks (WNNs) have emerged as a vital alternative to the vastly studied multilayer perceptrons (MLPs) since its first implementation. In this paper, we applied various clustering algorithms, namely, K-means (KM), Fuzzy C-means (FCM), symmetry-based K-means (SBKM), symmetry-based Fuzzy C-means (SBFCM) and modified point symmetry-based K-means (MPKM) clustering algorithms in choosing the translation parameter of a WNN. These modified WNNs are further applied to the heterogeneous cancer classification using benchmark microarray data and were compared against the conventional WNN with random initialization method. Experimental results showed that a WNN classifier with the MPKM algorithm is more precise than the conventional WNN as well as the WNNs with other clustering algorithms.
Abstract: Understanding the cell's large-scale organization is an interesting task in computational biology. Thus, protein-protein interactions can reveal important organization and function of the cell. Here, we investigated the correspondence between protein interactions and function for the yeast. We obtained the correlations among the set of proteins. Then these correlations are clustered using both the hierarchical and biclustering methods. The detailed analyses of proteins in each cluster were carried out by making use of their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. On the other hand, in hierarchical clustering, the dominancy of one functional class is observed. In the light of the clustering data, we have verified some interactions which were not identified as core interactions in DIP and also, we have characterized some functionally unknown proteins according to the interaction data and functional correlation. In brief, from interaction data to function, some correlated results are noticed about the relationship between interaction and function which might give clues about the organization of the proteins, also to predict new interactions and to characterize functions of unknown proteins.
Abstract: Artificial Bee Colony (ABC) algorithm is a relatively new swarm intelligence technique for clustering. It produces higher
quality clusters compared to other population-based algorithms but with poor energy efficiency, cluster quality consistency and typically slower in convergence speed. Inspired by energy saving foraging behavior of natural honey bees this paper presents a Quality and Quantity Aware Artificial Bee Colony (Q2ABC) algorithm to improve quality of cluster identification, energy efficiency and convergence speed of the original ABC. To evaluate the performance of Q2ABC algorithm, experiments were conducted on a suite of ten benchmark UCI datasets. The results demonstrate Q2ABC outperformed ABC and K-means algorithm in the quality of clusters delivered.
Abstract: Annotation of a protein sequence is pivotal for the understanding of its function. Accuracy of manual annotation provided by curators is still questionable by having lesser evidence strength and yet a hard task and time consuming. A number of computational methods including tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult to be setup by the bioscientists, or depend on time intensive and blind sequence similarity search like Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems had been identified to achieve this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that can be easy to assess and process. Thus, these files can be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of semantically similar Gene Ontology terms to a given query. The details of macro and micro problems involved and their solutions including objective of this study are described. This paper also describes the protein sequence annotation and the Gene Ontology. The methodology of this study and Gene Ontology based protein sequence annotation tool namely extended UTMGO is presented. Furthermore, its basic version which is a Gene Ontology browser that is based on semantic similarity search is also introduced.
Abstract: This paper describes a probabilistic method for
three-dimensional object recognition using a shared pool of surface
signatures. This technique uses flatness, orientation, and convexity
signatures that encode the surface of a free-form object into three
discriminative vectors, and then creates a shared pool of data by
clustering the signatures using a distance function. This method
applies the Bayes-s rule for recognition process, and it is extensible
to a large collection of three-dimensional objects.
Abstract: In this paper, a model for an information retrieval
system is proposed which takes into account that knowledge about
documents and information need of users are dynamic. Two
methods are combined, one qualitative or symbolic and the other
quantitative or numeric, which are deemed suitable for many
clustering contexts, data analysis, concept exploring and
knowledge discovery. These two methods may be classified as
inductive learning techniques. In this model, they are introduced to
build “long term" knowledge about past queries and concepts in a
collection of documents. The “long term" knowledge can guide
and assist the user to formulate an initial query and can be
exploited in the process of retrieving relevant information. The
different kinds of knowledge are organized in different points of
view. This may be considered an enrichment of the exploration
level which is coherent with the concept of document/query
structure.
Abstract: This paper focuses on reducing the power consumption
of wireless sensor networks. Therefore, a communication protocol
named LEACH (Low-Energy Adaptive Clustering Hierarchy) is modified.
We extend LEACHs stochastic cluster-head selection algorithm
by a modifying the probability of each node to become cluster-head
based on its required energy to transmit to the sink. We present
an efficient energy aware routing algorithm for the wireless sensor
networks. Our contribution consists in rotation selection of clusterheads
considering the remoteness of the nodes to the sink, and then,
the network nodes residual energy. This choice allows a best distribution
of the transmission energy in the network. The cluster-heads
selection algorithm is completely decentralized. Simulation results
show that the energy is significantly reduced compared with the
previous clustering based routing algorithm for the sensor networks.