Abstract: Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.
Abstract: Isolated and confined environments (ICE) present several challenges which may adversely affect human’s psychology and physiology. Submariners in Sub-Surface Ballistic Nuclear (SSBN) mission exposed to these environmental constraints must be able to perform complex tasks as part of their normal duties, as well as during crisis periods when emergency actions are required or imminent. The operational and environmental constraints they face contribute to challenge human adaptability. The impact of such a constrained environment has yet to be explored. Establishing a knowledge framework is a determining factor, particularly in view of the next long space travels. Ensuring that the crews are maintained in optimal operational conditions is a real challenge because the success of the mission depends on them. This study focused on the evaluation of the impact of stress on mental health and sensory degradation of submariners during a mission on SSBN using cardiac biosignal (heart rate variability, HRV) clustering. This is a pragmatic exploratory study of a prospective cohort included 19 submariner volunteers. HRV was recorded at baseline to classify by clustering the submariners according to their stress level based on parasympathetic (Pa) activity. Impacts of high Pa (HPa) versus low Pa (LPa) level at baseline were assessed on emotional state and sensory perception (interoception and exteroception) as a cardiac biosignal during the patrol and at a recovery time one month after. Whatever the time, no significant difference was found in mental health between groups. There are significant differences in the interoceptive, exteroceptive and physiological functioning during the patrol and at recovery time. To sum up, compared to the LPa group, the HPa maintains a higher level in psychosensory functioning during the patrol and at recovery but exhibits a decrease in Pa level. The HPa group has less adaptable HRV characteristics, less unpredictability and flexibility of cardiac biosignals while the LPa group increases them during the patrol and at recovery time. This dissociation between psychosensory and physiological adaptation suggests two treatment modalities for ICE environments. To our best knowledge, our results are the first to highlight the impact of physiological differences in the HRV profile on the adaptability of submariners. Further studies are needed to evaluate the negative emotional and cognitive effects of ICEs based on the cardiac profile. Artificial intelligence offers a promising future for maintaining high level of operational conditions. These future perspectives will not only allow submariners to be better prepared, but also to design feasible countermeasures that will help support analog environments that bring us closer to a trip to Mars.
Abstract: With the development of intelligent vehicle systems, a high-precision road map is increasingly needed in many aspects. The automatic lane lines extraction and modeling are the most essential steps for the generation of a precise lane-level road map. In this paper, an automatic lane-level road map generation system is proposed. To extract the road markings on the ground, the multi-region Otsu thresholding method is applied, which calculates the intensity value of laser data that maximizes the variance between background and road markings. The extracted road marking points are then projected to the raster image and clustered using a two-stage clustering algorithm. Lane lines are subsequently recognized from these clusters by the shape features of their minimum bounding rectangle. To ensure the storage efficiency of the map, the lane lines are approximated to cubic polynomial curves using a Bayesian estimation approach. The proposed lane-level road map generation system has been tested on urban and expressway conditions in Hefei, China. The experimental results on the datasets show that our method can achieve excellent extraction and clustering effect, and the fitted lines can reach a high position accuracy with an error of less than 10 cm.
Abstract: The present study aimed to evaluate the understanding of the students in Tehran universities (Iran) about the numerical representation of the average rate of change based on the Structure of Observed Learning Outcomes (SOLO). In the present descriptive-survey research, the statistical population included undergraduate students (basic sciences and engineering) in the universities of Tehran. The samples were 604 students selected by random multi-stage clustering. The measurement tool was a task whose face and content validity was confirmed by math and mathematics education professors. Using Cronbach's Alpha criterion, the reliability coefficient of the task was obtained 0.95, which verified its reliability. The collected data were analyzed by descriptive statistics and inferential statistics (chi-squared and independent t-tests) under SPSS-24 software. According to the SOLO model in the prestructural, unistructural, and multistructural levels, basic science students had a higher percentage of understanding than that of engineering students, although the outcome was inverse at the relational level. However, there was no significant difference in the average understanding of both groups. The results indicated that students failed to have a proper understanding of the numerical representation of the average rate of change, in addition to missconceptions when using physics formulas in solving the problem. In addition, multiple solutions were derived along with their dominant methods during the qualitative analysis. The current research proposed to focus on the context problems with approximate calculations and numerical representation, using software and connection common relations between math and physics in the teaching process of teachers and professors.
Abstract: Data mining technique used in the field of clustering is a subject of active research and assists in biological pattern recognition and extraction of new knowledge from raw data. Clustering means the act of partitioning an unlabeled dataset into groups of similar objects. Each group, called a cluster, consists of objects that are similar between themselves and dissimilar to objects of other groups. Several clustering methods are based on partitional clustering. This category attempts to directly decompose the dataset into a set of disjoint clusters leading to an integer number of clusters that optimizes a given criterion function. The criterion function may emphasize a local or a global structure of the data, and its optimization is an iterative relocation procedure. The K-Means algorithm is one of the most widely used partitional clustering techniques. Since K-Means is extremely sensitive to the initial choice of centers and a poor choice of centers may lead to a local optimum that is quite inferior to the global optimum, we propose a strategy to initiate K-Means centers. The improved K-Means algorithm is compared with the original K-Means, and the results prove how the efficiency has been significantly improved.
Abstract: Deep Venous Thrombosis (DVT) occurs when a
thrombus is formed within a deep vein (most often in the legs). This
disease can be deadly if a part or the whole thrombus reaches the
lung and causes a Pulmonary Embolism (PE). This disorder, often
asymptomatic, has multifactorial causes: immobilization, surgery,
pregnancy, age, cancers, and genetic variations. Our project aims to
relate the thrombus epidemiology (origins, patient predispositions,
PE) to its structure using ultrasound images. Ultrasonography and
elastography were collected using Toshiba Aplio 500 at Brest
Hospital. This manuscript compares two classification approaches:
spectral clustering and scattering operator. The former is based on
the graph and matrix theories while the latter cascades wavelet
convolutions with nonlinear modulus and averaging operators.
Abstract: Grid of computing nodes has emerged as a
representative means of connecting distributed computers or
resources scattered all over the world for the purpose of computing
and distributed storage. Since fault tolerance becomes complex due
to the availability of resources in decentralized grid environment,
it can be used in connection with replication in data grids. The
objective of our work is to present fault tolerance in data grids
with data replication-driven model based on clustering. The
performance of the protocol is evaluated with Omnet++ simulator.
The computational results show the efficiency of our protocol in
terms of recovery time and the number of process in rollbacks.
Abstract: Clustering is an intensive research for some years
because of its multifaceted applications, such as biology, information
retrieval, medicine, business and so on. The expectation maximization
(EM) is a kind of algorithm framework in clustering methods, one
of the ten algorithms of machine learning. Traditionally, optimization
of objective function has been the standard approach in EM. Hence,
research has investigated the utility of evolutionary computing and
related techniques in the regard. Chemical Reaction Optimization
(CRO) is a recently established method. So the property embedded
in CRO is used to solve optimization problems. This paper presents
an algorithm framework (EM-CRO) with modified CRO operators
based on EM cluster problems. The hybrid algorithm is mainly
to solve the problem of initial value sensitivity of the objective
function optimization clustering algorithm. Our experiments mainly
take the EM classic algorithm:k-means and fuzzy k-means as an
example, through the CRO algorithm to optimize its initial value, get
K-means-CRO and FKM-CRO algorithm. The experimental results
of them show that there is improved efficiency for solving objective
function optimization clustering problems.
Abstract: Conceiving and developing routing protocols for
wireless sensor networks requires considerations on constraints such
as network lifetime and energy consumption. In this paper, we propose
a hybrid hierarchical routing protocol named HHRP combining both
clustering mechanism and multipath optimization taking into account
residual energy and RSSI measures. HHRP consists of classifying
dynamically nodes into clusters where coordinators nodes with extra
privileges are able to manipulate messages, aggregate data and ensure
transmission between nodes according to TDMA and CDMA
schedules. The reconfiguration of the network is carried out
dynamically based on a threshold value which is associated with the
number of nodes belonging to the smallest cluster. To show the
effectiveness of the proposed approach HHRP, a comparative study
with LEACH protocol is illustrated in simulations.
Abstract: The Information Retrieval community is facing the problem of effective representation of Web search results. When we organize web search results into clusters it becomes easy to the users to quickly browse through search results. The traditional search engines organize search results into clusters for ambiguous queries, representing each cluster for each meaning of the query. The clusters are obtained according to the topical similarity of the retrieved search results, but it is possible for results to be totally dissimilar and still correspond to the same meaning of the query. People search is also one of the most common tasks on the Web nowadays, but when a particular person’s name is queried the search engines return web pages which are related to different persons who have the same queried name. By placing the burden on the user of disambiguating and collecting pages relevant to a particular person, in this paper, we have developed an approach that clusters web pages based on the association of the web pages to the different people and clusters that are based on generic entity search.
Abstract: Particle reinforced Metal Matrix Composite (MMC) succeeds in synergizing the metallic matrix with ceramic particle reinforcements to result in improved strength, particularly at elevated temperatures, but adversely it affects the ductility of the matrix because of agglomeration and porosity. The present study investigates the outcome of tensile properties in a cast and hot forged composite reinforced simultaneously with coarse and fine particles. Nano-sized alumina particles have been generated by milling mixture of aluminum and manganese dioxide powders. Milled particles after drying are added to molten metal and the resulting slurry is cast. The microstructure of the composites shows good distribution of both the size categories of particles without significant clustering. The presence of nanoparticles along with coarser particles in a composite improves both strength and ductility considerably. Delay in debonding of coarser particles to higher stress is due to reduced mismatch in extension caused by increased strain hardening in presence of the nanoparticles. However, higher addition of powder mix beyond a limit results in deterioration of mechanical properties, possibly due to clustering of nanoparticles. The porosity in cast composite generally increases with the increasing addition of powder mix as observed during process and on forging it has got reduced. The base alloy and nanocomposites show improvement in flow stress which could be attributed to lowering of porosity and grain refinement as a consequence of forging.
Abstract: A DNA microarray technology is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. It is handled by clustering which reveals the natural structures and identifying the interesting patterns in the underlying data. In this paper, gene based clustering in gene expression data is proposed using Cuckoo Search with Differential Evolution (CS-DE). The experiment results are analyzed with gene expression benchmark datasets. The results show that CS-DE outperforms CS in benchmark datasets. To find the validation of the clustering results, this work is tested with one internal and one external cluster validation indexes.
Abstract: Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.
Abstract: Texture is an important characteristic in real and
synthetic scenes. Texture analysis plays a critical role in inspecting
surfaces and provides important techniques in a variety of
applications. Although several descriptors have been presented to
extract texture features, the development of object recognition is still a
difficult task due to the complex aspects of texture. Recently, many
robust and scaling-invariant image features such as SIFT, SURF and
ORB have been successfully used in image retrieval and object
recognition. In this paper, we have tried to compare the performance
for texture classification using these feature descriptors with k-means
clustering. Different classifiers including K-NN, Naive Bayes, Back
Propagation Neural Network , Decision Tree and Kstar were applied in
three texture image sets - UIUCTex, KTH-TIPS and Brodatz,
respectively. Experimental results reveal SIFTS as the best average
accuracy rate holder in UIUCTex, KTH-TIPS and SURF is
advantaged in Brodatz texture set. BP neuro network works best in the
test set classification among all used classifiers.
Abstract: Clustering is a process of grouping objects and data
into groups of clusters to ensure that data objects from the same
cluster are identical to each other. Clustering algorithms in one of the
area in data mining and it can be classified into partition, hierarchical,
density based and grid based. Therefore, in this paper we do survey
and review four major hierarchical clustering algorithms called
CURE, ROCK, CHAMELEON and BIRCH. The obtained state of
the art of these algorithms will help in eliminating the current
problems as well as deriving more robust and scalable algorithms for
clustering.
Abstract: Web mining is to discover and extract useful
Information. Different users may have different search goals when
they search by giving queries and submitting it to a search engine.
The inference and analysis of user search goals can be very useful for
providing an experience result for a user search query. In this project,
we propose a novel approach to infer user search goals by analyzing
search web logs. First, we propose a novel approach to infer user
search goals by analyzing search engine query logs, the feedback
sessions are constructed from user click-through logs and it
efficiently reflect the information needed for users. Second we
propose a preprocessing technique to clean the unnecessary data’s
from web log file (feedback session). Third we propose a technique
to generate pseudo-documents to representation of feedback sessions
for clustering. Finally we implement k-medoids clustering algorithm
to discover different user search goals and to provide a more optimal
result for a search query based on feedback sessions for the user.
Abstract: The capability of exploiting the electronic charge and
spin properties simultaneously in a single material has made diluted
magnetic semiconductors (DMS) remarkable in the field of
spintronics. We report the designing of DMS based on zinc-blend
ZnO doped with Cr impurity. The full potential linearized augmented
plane wave plus local orbital FP-L(APW+lo) method in density
functional theory (DFT) has been adapted to carry out these
investigations. For treatment of exchange and correlation energy,
generalized gradient approximations have been used. Introducing Cr
atoms in the matrix of ZnO has induced strong magnetic moment
with ferromagnetic ordering at stable ground state. Cr:ZnO was found
to favor the short range magnetic interaction that
reflect tendency of Cr clustering. The electronic structure of ZnO is
strongly influenced in the presence of Cr impurity atoms where
impurity bands appear in the band gap.
Abstract: In this work, we begin with the presentation of the
Tθ family of usual similarity measures concerning multidimensional
binary data. Subsequently, some properties of these measures are
proposed. Finally the impact of the use of different inter-elements
measures on the results of the Agglomerative Hierarchical Clustering
Methods is studied.
Abstract: Quantification of cardiac function is performed by
calculating blood volume and ejection fraction in routine clinical
practice. However, these works have been performed by manual
contouring, which requires computational costs and varies on the
observer. In this paper, an automatic left ventricle segmentation
algorithm on cardiac magnetic resonance images (MRI) is presented.
Using knowledge on cardiac MRI, a K-mean clustering technique is
applied to segment blood region on a coil-sensitivity corrected image.
Then, a graph searching technique is used to correct segmentation
errors from coil distortion and noises. Finally, blood volume and
ejection fraction are calculated. Using cardiac MRI from 15 subjects,
the presented algorithm is tested and compared with manual
contouring by experts to show outstanding performance.
Abstract: The North-eastern part of India, which receives
heavier rainfall than other parts of the subcontinent, is of great
concern now-a-days with regard to climate change. High intensity
rainfall for short duration and longer dry spell, occurring due to
impact of climate change, affects river morphology too. In the present
study, an attempt is made to delineate the North-eastern region of
India into some homogeneous clusters based on the Fuzzy Clustering
concept and to compare the resulting clusters obtained by using
conventional methods and nonconventional methods of clustering.
The concept of clustering is adapted in view of the fact that, impact
of climate change can be studied in a homogeneous region without
much variation, which can be helpful in studies related to water
resources planning and management. 10 IMD (Indian Meteorological
Department) stations, situated in various regions of the North-east,
have been selected for making the clusters. The results of the Fuzzy
C-Means (FCM) analysis show different clustering patterns for
different conditions. From the analysis and comparison it can be
concluded that nonconventional method of using GCM data is
somehow giving better results than the others. However, further
analysis can be done by taking daily data instead of monthly means to
reduce the effect of standardization.