Abstract: Research on data exploration and visualization has gained
popularity as a way to analyze large-scale scientific data. In this
paper, we focus on the exploration and visualization of scientific
simulation data, and define a spatial V-Optimal histogram for
data summarization. We propose histogram construction algorithms
based on a general binary hierarchical partitioning as well as
a more specific one, the l-grid partitioning. For effective data
summarization and efficient data visualization in scientific data
analysis, we propose an optimal algorithm as well as a heuristic
algorithm for histogram construction. To verify the effectiveness and
efficiency of the proposed methods, we conduct experiments on
massive evacuation simulation data.
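As a point of reference for the V-Optimal criterion named above, the following is a minimal sketch of the classical one-dimensional V-Optimal histogram, which chooses contiguous buckets that minimize the total sum of squared errors (SSE) around each bucket mean. It is not the spatial, hierarchically partitioned construction proposed in the paper, whose details the abstract does not give; the function name `v_optimal_histogram` and its parameters are hypothetical.

```python
from itertools import accumulate


def v_optimal_histogram(values, num_buckets):
    """Classical 1-D V-Optimal histogram via dynamic programming.

    Returns (total_sse, buckets), where buckets is a list of
    half-open index ranges (start, end) over `values`.
    Illustrative only: this is the textbook 1-D case, not the
    spatial variant described in the abstract.
    """
    n = len(values)
    if n == 0 or num_buckets < 1:
        raise ValueError("need at least one value and one bucket")
    b = min(num_buckets, n)

    # Prefix sums of x and x^2 give the SSE of any range in O(1).
    s1 = [0.0] + list(accumulate(values))
    s2 = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Sum of squared deviations of values[i:j] from their mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimum SSE for values[:j] split into exactly k buckets.
    cost = [[inf] * (n + 1) for _ in range(b + 1)]
    cut = [[0] * (n + 1) for _ in range(b + 1)]
    cost[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i

    # Walk the stored cut points backwards to recover the buckets.
    buckets = []
    j = n
    for k in range(b, 0, -1):
        i = cut[k][j]
        buckets.append((i, j))
        j = i
    buckets.reverse()
    return cost[b][n], buckets
```

This exact dynamic program takes O(n^2 * B) time for n values and B buckets, which is why heuristic alternatives are attractive for massive data.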
Abstract: In this paper we discuss the development of an Augmented Reality (AR)-based scientific visualization system prototype that supports identification, localization, and 3D visualization of oil leakage sensor datasets. Sensors generate a significant amount of multivariate data during normal and leak situations. We have therefore developed a data model to manage such data effectively and to enhance the computational support needed for effective data exploration. A challenge of this approach is to reduce the data inefficiency caused by the disparate, repeated, inconsistent, and missing attributes of most available sensor datasets. To handle this challenge, this paper aims to develop an AR-based scientific visualization interface that automatically identifies, localizes, and visualizes all data relevant to a selected region of interest (ROI) along the virtual pipeline network. The system architecture support and the interface requirements needed for such visualizations are also discussed.
Abstract: Data clustering is an important data exploration
technique with many applications in data mining. The k-means
algorithm is well known for its efficiency in clustering large data
sets. However, this algorithm is suitable only for spherical clusters
of similar sizes and densities. The quality of the resulting clusters
decreases when the data set contains spherical clusters with large
variance in size. In this paper, we introduce an effective procedure
to overcome this problem. The proposed method is based on shifting
the center of the large cluster toward the small cluster and recomputing
the membership of the small cluster's points. The experimental
results show that the proposed algorithm produces satisfactory
results.
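For context, the standard k-means (Lloyd) iteration that this abstract builds on can be sketched as follows. This is plain k-means only, not the proposed center-shifting procedure, whose details the abstract does not specify; the function name `kmeans` and its parameters are hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length tuples.

    Returns (centers, labels). Illustrative baseline only; it does
    not include the center-shifting step proposed in the abstract.
    """
    rng = random.Random(seed)
    # Initialize centers with k distinct sample points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels
```

Because every point is assigned purely by distance to the nearest center, points on the edge of a large cluster can end up attached to a nearby small cluster, which is the weakness the abstract sets out to address.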
Abstract: In data mining, fuzzy clustering algorithms have
demonstrated advantages over crisp clustering algorithms in dealing
with the challenges posed by large collections of vague and uncertain
natural data. This paper reviews the concepts of fuzzy logic and fuzzy
clustering. The classical fuzzy c-means algorithm is presented and its
limitations are highlighted. Based on a study of the fuzzy c-means
algorithm and its extensions, we propose a modification to the c-means
algorithm that overcomes its limitations in calculating the
new cluster centers and in finding the membership values for
natural data. The efficiency of the modified method is
demonstrated on real data collected for Bhutan's Gross National
Happiness (GNH) program.
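The two steps the abstract singles out, computing membership values and recomputing cluster centers, can be illustrated with a minimal sketch of the classical fuzzy c-means iteration. This is the textbook algorithm, not the modification proposed in the paper; the function name `fuzzy_c_means`, the explicit initial `centers` argument, and the default fuzzifier `m=2.0` are assumptions made for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-6):
    """Classical fuzzy c-means on a list of equal-length tuples.

    `centers` holds the initial cluster centers; `m` > 1 is the
    fuzzifier. Returns (centers, u), where u[i][j] is the membership
    of point j in cluster i. Illustrative sketch only.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    centers = [list(ctr) for ctr in centers]
    c, n, dim = len(centers), len(points), len(points[0])
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)),
        # normalized so each point's memberships sum to 1.
        for j, p in enumerate(points):
            d = [sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5
                 for ctr in centers]
            zeros = [i for i, di in enumerate(d) if di == 0.0]
            if zeros:
                # The point sits exactly on one or more centers:
                # share full membership among those clusters.
                for i in range(c):
                    u[i][j] = 1.0 / len(zeros) if i in zeros else 0.0
            else:
                inv = [di ** (-2.0 / (m - 1.0)) for di in d]
                total = sum(inv)
                for i in range(c):
                    u[i][j] = inv[i] / total
        # Center update: mean of the points weighted by u_ij^m.
        shift = 0.0
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            wsum = sum(w)
            if wsum > 0.0:
                new = [sum(wj * p[t] for wj, p in zip(w, points)) / wsum
                       for t in range(dim)]
                shift = max(shift,
                            max(abs(a - b) for a, b in zip(new, centers[i])))
                centers[i] = new
        # Stop when no center coordinate moved more than tol.
        if shift < tol:
            break
    return centers, u
```

Unlike crisp k-means, every point contributes to every center, with a weight that falls off with distance, so the result is a soft partition rather than a hard assignment.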
Abstract: Data clustering is an important data exploration technique
with many applications in data mining. We present an enhanced
version of the well-known single-link clustering algorithm, which we
refer to as DCBOR. The proposed algorithm alleviates
the chain effect by removing the outliers from the given dataset,
so it provides outlier detection and data clustering
simultaneously. The algorithm does not need to update the distance
matrix, since it merges the k nearest
objects in one step and each cluster continues to grow as long as possible
under a specified condition. The algorithm consists of two phases:
in the first phase, it removes the outliers from the input dataset; in
the second phase, it performs the clustering. The algorithm
discovers clusters of different shapes, sizes, and densities and requires
only one input parameter, which represents a threshold for
outlier points. The value of this parameter ranges from 0 to
1, and the algorithm supports the user in determining an appropriate
value for it. We have tested the algorithm on different datasets
that contain outliers and clusters connected by chains of dense points,
and the algorithm discovers the correct clusters. The results of
our experiments demonstrate the effectiveness and efficiency of
DCBOR.
Abstract: Longitudinal data typically exhibit
changes over time, nonlinear growth patterns, between-subject
variability, and within-subject errors with heteroscedasticity and
dependence. Exploring such data is therefore more complicated than
exploring cross-sectional data. The purpose of this paper is to organize
and integrate various visual-graphical techniques for exploring longitudinal data.
By applying the proposed methods, investigators can
answer research questions that include characterizing or describing the
growth patterns at both the group and individual levels, identifying the time
points where important changes occur, identifying unusual subjects, selecting
suitable statistical models, and suggesting possible within-error
variance structures.