Abstract: In this study, the fuzzy integrated logical forecasting (FILF) method is extended to multivariate systems by using a vector autoregressive model. The fuzzy time series forecasting (FTSF) method was recently introduced by Song and Chissom [1]-[2] and was later improved by Chen. Unlike the existing literature, the proposed model is compared not only with previous FTS models but also with conventional time series methods such as the classical vector autoregressive model. The cluster optimization is based on the C-means clustering method. An empirical study is performed on the prediction of chartering rates for a group of dry bulk cargo ships. The root mean squared error (RMSE) metric is used to compare the results of the methods, and the proposed method outperforms both the traditional FTS methods and the classical time series methods.
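The RMSE comparison described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `rmse`, the two forecast series, and all numbers are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical observations and forecasts from two competing models.
observed = [10.0, 12.0, 11.0, 13.0]
forecast_a = [10.5, 11.5, 11.0, 12.0]
forecast_b = [9.0, 13.5, 10.0, 14.5]

# The model with the lower RMSE tracks these observations more closely.
better = "A" if rmse(observed, forecast_a) < rmse(observed, forecast_b) else "B"
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.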
Abstract: Scale Invariant Feature Transform (SIFT) has been
widely applied, but extracting SIFT features is complicated and
time-consuming. In this paper, to meet the demands of real-time
applications, SIFT is parallelized and optimized on a cluster system;
the resulting implementation is named pSIFT. Redundant storage and
communication are used for boundary data to improve performance, and
before the feature descriptors are represented, data reallocation is
adopted to keep the load balanced in pSIFT. Experimental results show
that pSIFT achieves good speedup and scalability.
Abstract: The quality of short-term load forecasting (STLF) can improve the efficiency of planning and operation of electric utilities. Artificial Neural Networks (ANNs) are employed for nonlinear short-term load forecasting owing to their powerful nonlinear mapping capabilities. At present, there is no systematic methodology for the optimal design and training of an artificial neural network, and one often has to resort to a trial-and-error approach. This paper describes the process of developing large three-layer feed-forward neural networks for short-term load forecasting and then presents a heuristic search algorithm for performing an important task of this process, i.e. optimal network structure design. Particle Swarm Optimization (PSO) is used to develop the optimum large neural network structure and connecting weights for the one-day-ahead electric load forecasting problem. PSO is a novel random optimization method based on swarm intelligence, with a strong global optimization ability. Employing PSO algorithms in the design and training of ANNs allows the ANN architecture and parameters to be easily optimized. The proposed method is applied to STLF for the local utility. Data are clustered according to the differences in their characteristics. Special days are extracted from the normal training sets and handled separately. In this way, a solution is provided for all load types, including working days, weekends, and special days. The experimental results show that the proposed method optimized by PSO can speed up the learning of the network and improve the forecasting precision compared with the conventional Back Propagation (BP) method. Moreover, it is not only simple to calculate but also practical and effective. It also provides a greater degree of accuracy in many cases and consistently gives lower percentage errors for the STLF problem than the BP method. Thus, it can be applied to automatically design an optimal load forecaster based on historical data.
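To make the PSO search concrete, here is a minimal global-best PSO sketch. It is not the authors' implementation. The toy `sphere` objective stands in for a network's forecasting error as a function of its weights, and the function names, swarm size, and coefficient values are all illustrative assumptions.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_val[i])
    global_best, global_val = personal_best[g][:], personal_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp the coordinate to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val

def sphere(x):
    """Toy objective (assumption): stands in for a forecasting-error function."""
    return sum(v * v for v in x)

best_position, best_value = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

In a forecasting setting, each particle's position would encode candidate network weights (and possibly structure), and `objective` would return the training error for that candidate.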
Abstract: In this paper, a periodic surveillance scheme is
proposed for any convex region using mobile wireless sensor
nodes. A sensor network typically consists of a fixed number of
sensor nodes which report measurements of sensed data, such as
temperature, pressure, and humidity, from their immediate proximity
(the area within their sensing range). For the purpose of sensing an
area of interest, an adequate number of fixed sensor
nodes is required to cover the entire region of interest. This implies
that the number of fixed sensor nodes required to cover a given
area depends on the sensing range of the sensors as well as on the
deployment strategy employed. It is assumed that the sensors are
mobile within the region of surveillance and can be mounted on
moving bodies such as robots or vehicles. Therefore, in our
scheme, the surveillance time period determines the number of
sensor nodes required to be deployed in the region of interest.
The proposed scheme comprises three algorithms, namely
Hexagonalization, Clustering, and Scheduling. The first algorithm
partitions the coverage area into fixed-size hexagons that
approximate the sensing range (cell) of an individual sensor node.
The clustering algorithm groups the cells into clusters, each of
which will be covered by a single sensor node. The last algorithm
determines a schedule for each sensor to serve its respective cluster.
Each sensor node traverses all the cells belonging to the cluster
assigned to it by oscillating between the first and the last cell for
the duration of its lifetime. Simulation results show that our
scheme provides full coverage within a given period of time using
few sensors with minimum movement, less power consumption,
and relatively less infrastructure cost.
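The oscillating traversal described in this abstract can be sketched as a simple generator. This is an illustration only: the function name `oscillating_schedule` and the cell labels are hypothetical, and the ordering of cells inside a cluster is assumed to be given by the clustering step.

```python
from itertools import islice

def oscillating_schedule(cells):
    """Yield cells endlessly, sweeping first -> last -> first -> ..."""
    if not cells:
        return
    if len(cells) == 1:
        while True:
            yield cells[0]
    # One full period: a forward sweep followed by a backward sweep that
    # skips both endpoints so they are not visited twice in a row.
    period = list(cells) + list(cells[-2:0:-1])
    while True:
        yield from period

cluster = ["c0", "c1", "c2", "c3"]  # hypothetical cell labels
first_visits = list(islice(oscillating_schedule(cluster), 8))
```

Under the added assumption that each move-and-sense step takes one time unit, every cell in a cluster of n cells (n ≥ 2) is visited at least once in any window of 2(n − 1) consecutive steps. This is one way to see why a shorter surveillance period calls for smaller clusters and therefore more sensor nodes.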
Abstract: One main drawback of intrusion detection systems is their
inability to detect new attacks which do not have known
signatures. In this paper we discuss an intrusion detection method
that uses independent component analysis (ICA) based feature
selection heuristics and rough fuzzy clustering of the data. ICA is
used to separate the independent components (ICs) from the monitored
variables. Rough sets are used to reduce the amount of data and remove
redundancy, and fuzzy methods allow objects to belong to several
clusters simultaneously, with different degrees of membership. Our
approach allows us not only to recognize known attacks but also to
detect activity that may be the result of a new, unknown attack.
Experimental results are reported on the Knowledge Discovery and Data
Mining (KDD Cup 1999) dataset.
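The idea that an object can belong to several clusters with different degrees of membership can be illustrated with the standard fuzzy c-means membership formula. This sketch does not reproduce the rough fuzzy method of the abstract. The function name, the fuzzifier value `m=2.0`, and the sample points are assumptions.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point in each cluster."""
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with the point).
    if any(d == 0.0 for d in dists):
        zero_count = sum(1 for d in dists if d == 0.0)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
memberships = fuzzy_memberships((1.0, 0.0), centers)
```

A point far from every center receives no dominant membership. A detector could treat that as a hint of previously unseen activity, although the abstract does not state how the authors flag unknown attacks.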
Abstract: Virtualization and high performance computing have been discussed from a performance perspective in recent publications. We present and discuss a flexible and efficient approach to the management of virtual clusters. A virtual machine management tool is extended to function as a fabric for cluster deployment and management. We show how features such as saving the state of a running cluster can be used to avoid disruption. We also compare our approach to the traditional methods of cluster deployment and present benchmarks which illustrate the efficiency of our approach.
Abstract: As the amount of data stored in storage subsystems
increases tremendously, data protection techniques have become more
important than ever for providing data availability and reliability. In this
paper, we present a file system-based data protection scheme (WOWSnap)
that has been implemented using a WORM (Write-Once-Read-Many)
scheme. In WOWSnap, once WORM files have been created, only
privileged read requests to them are allowed, in order to protect data against
any intentional or accidental intrusions. Furthermore, each WORM file
is associated with a protection cycle, a time period during which
the file should be securely protected. Once its protection cycle
has expired, a WORM file is automatically moved to the
general-purpose data section without any user intervention. This
prevents the WORM data section from being consumed by
unnecessary files. We evaluated the performance of WOWSnap on
a Linux cluster.
Abstract: A wireless sensor network is a multi-hop, self-configuring
wireless network consisting of sensor nodes. The deployment of
wireless sensor networks in many application areas, e.g., aggregation
services, requires self-organization of the network nodes into clusters.
An efficient way to enhance the lifetime of the system is to partition the
network into distinct clusters with a high-energy node as cluster head.
Different node clustering techniques have appeared in the literature,
and they roughly fall into two families: those based on the
construction of a dominating set and those based solely on
energy considerations. Energy-optimized cluster formation for a set
of randomly scattered wireless sensors is presented. Sensors within a
cluster are expected to communicate with the cluster head only. The
energy constraint and limited computing resources of the sensor nodes
present the major challenges in gathering the data. In this paper we
propose a framework to study how partially correlated data affect the
performance of clustering algorithms. The total energy consumption
and network lifetime can be analyzed by combining random geometry
techniques and rate distortion theory. We also present the relation
between compression distortion and data correlation.
Abstract: Text similarity measurement is a fundamental issue in
many textual applications such as document clustering, classification,
summarization and question answering. However, prevailing approaches
based on the Vector Space Model (VSM) more or less suffer
from the limitation of Bag of Words (BOW), which ignores the semantic
relationships among words. Enriching document representation
with background knowledge from Wikipedia has proven to be an effective
way to solve this problem, but most existing methods still
cannot avoid similar flaws of BOW in a new vector space. In this
paper, we propose a novel text similarity measurement which goes
beyond VSM and can find semantic affinity between documents.
Specifically, it is a unified graph model that exploits Wikipedia as
background knowledge and synthesizes both document representation
and similarity computation. The experimental results on two different
datasets show that our approach significantly improves on VSM-based
methods in both text clustering and classification.
Abstract: Genome profiling (GP), a genotype-based technology that exploits random PCR and temperature gradient gel electrophoresis, has been successful in the identification/classification of organisms. In this technology, spiddos (species identification dots) and PaSS (pattern similarity score) are employed to measure the closeness (or distance) between genomes. Based on the closeness (PaSS), we can build phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer simulation, which also provides the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for the identification/classification of organisms.
Abstract: Clustering categorical data is more complicated than
clustering numerical data because of its special properties. Scalability
and memory constraints are challenging problems in clustering large
data sets. This paper presents an incremental algorithm to cluster
categorical data. The frequencies of attribute values contribute much to
clustering similar categorical objects. In this paper we propose new
similarity measures based on the frequencies of attribute values and
their cardinalities. The proposed measures and the algorithm are
evaluated on data sets from the UCI data repository. Results
show that the proposed method generates better clusters than the
existing one.
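A frequency-based, single-pass clustering of categorical objects might look like the sketch below. The abstract does not define its similarity measures, so the measure used here (the mean relative frequency of an object's attribute values within a cluster) is an illustrative assumption, not the authors' measure. The `Cluster` class, the `threshold` parameter, and the sample data are likewise hypothetical.

```python
from collections import Counter

class Cluster:
    """Keeps per-attribute value frequencies for the objects it contains."""

    def __init__(self, n_attrs):
        self.size = 0
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, obj):
        self.size += 1
        for counter, value in zip(self.counts, obj):
            counter[value] += 1

    def similarity(self, obj):
        """Mean relative frequency of the object's values in this cluster."""
        return sum(c[v] / self.size for c, v in zip(self.counts, obj)) / len(obj)

def incremental_cluster(objects, threshold=0.4):
    """Assign each object, in one pass, to its most similar cluster or a new one."""
    clusters, labels = [], []
    for obj in objects:
        scores = [c.similarity(obj) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            clusters.append(Cluster(len(obj)))
            best = len(clusters) - 1
        clusters[best].add(obj)
        labels.append(best)
    return labels

data = [("red", "small"), ("red", "medium"), ("blue", "large"),
        ("red", "small"), ("blue", "large")]
labels = incremental_cluster(data)
```

Each object is read once and only per-cluster frequency tables are kept in memory, which is the property that makes an incremental scheme attractive for large data sets.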
Abstract: In this paper we present the PC cluster built at R.V.
College of Engineering (with great help from the Department of
Computer Science and Electrical Engineering). The structure of the
cluster is described and the performance is evaluated by rendering of
complex 3D Persistence of Vision (POV) images by the Ray-Tracing
algorithm. Here, we propose a novel method to render such
images, distributedly on a low cost scalable.
Abstract: An important structuring mechanism for knowledge bases is building clusters based on the content of their knowledge objects. The objects are clustered based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity. Clustering can also facilitate taxonomy formation, that is, the organization of observations into a hierarchy of classes that group similar events together. A hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of detail, and to focus our attention on the interesting aspects only. One such efficient and easy-to-understand system is the Hierarchical Production Rules (HPRs) system. An HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If <condition> Generality Specificity. HPR systems are capable of handling taxonomical structures inherent in knowledge about the real world. In this paper, a set of related HPRs is called a cluster and is represented by an HPR-tree. This paper discusses an algorithm based on a cumulative learning scenario for the dynamic structuring of clusters. The proposed scheme incrementally incorporates new knowledge into the set of clusters from the previous episodes and also maintains a summary of the clusters as a Synopsis to be used in future episodes. Examples are given to demonstrate the behaviour of the proposed scheme. The suggested incremental structuring of clusters would be useful in mining data streams.
Abstract: The success of an electronic system in a System-on-Chip is highly dependent on the efficiency of its interconnection network, which is constructed from routers and channels (the routers move data across the channels between nodes). Since neither classical bus-based nor point-to-point architectures can provide scalable solutions and satisfy the tight power and performance requirements of future applications, the Network-on-Chip (NoC) approach has recently been proposed as a promising solution. Indeed, in contrast to the traditional solutions, the NoC approach can provide large bandwidth with moderate area overhead. The selected topology of the component interconnects plays a prime role in the performance of the NoC architecture, as do the routing and switching techniques that can be used. In this paper, we present two generic NoC architectures that can be customized to the specific communication needs of an application in order to reduce the area with minimal degradation of the latency of the system. An experimental study is performed to compare these structures with basic NoC topologies represented by 2D mesh, Butterfly-Fat Tree (BFT) and SPIN. It is shown that the Cluster mesh (CMesh) and MinRoot schemes achieve significant improvements in network latency and energy consumption with only negligible area overhead and complexity over existing architectures. In fact, compared with the basic NoC topologies, the CMesh and MinRoot schemes provide substantial savings in area as well, because they require fewer routers. The simulation results show that the CMesh and MinRoot networks outperform MESH, BFT and SPIN in the main performance metrics.
Abstract: Nowadays, meteorological and environmental data are widely used, for example in plant variety selection systems. Selecting the plant variety for a planted area is of utmost importance for all crops, including sugarcane, which has many varieties. A variety planted without proper selection may not be adapted to the climate or soil conditions of the planted area. Poor growth, bloom drop, poor fruit, and low prices can result from varieties that were not recommended for the planted area. This paper presents a plant variety selection system for planted areas in Thailand that uses meteorological and environmental data together with decision tree techniques. With this software, developed as an environmental data analysis tool, results can be analyzed more easily and quickly. Our software is a front end for WEKA that provides fundamental data mining functions such as classification, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with result export. In addition, our software can export results to the Google Maps API in order to display them and plot plant icons effectively.
Abstract: An automated wood recognition system is designed to
classify tropical wood species. The wood features are extracted based
on two feature extractors: Basic Grey Level Aura Matrix (BGLAM)
technique and statistical properties of pores distribution (SPPD)
technique. Due to the nonlinearity of the tropical wood species
separation boundaries, a pre-classification stage is proposed which
consists of K-means clustering and kernel discriminant analysis (KDA).
Finally, a Linear Discriminant Analysis (LDA) classifier and a K-Nearest
Neighbour (KNN) classifier are implemented for comparison purposes.
The study involves comparison of the system with and without
pre-classification using the KNN and LDA classifiers. The results
show that the inclusion of the pre-classification stage has improved
the accuracy of both the LDA and KNN classifiers by more than
12%.
Abstract: I/O workload is a critical factor in
analyzing I/O patterns and file system performance. However, tracing I/O
operations on the fly in a distributed parallel file system is non-trivial due
to the collection overhead and the large volume of data. In this paper, we
design and implement a parallel file system logging method for high
performance computing using a shared memory-based multi-layer
scheme. It minimizes the overhead by reducing the logging operation
response time and provides an efficient post-processing scheme through
shared memory. A separate logging server can collect sequential logs
from multiple clients in a cluster through packet communication.
Implementation and evaluation results show the low overhead and high
scalability of this architecture for high performance parallel logging
analysis.
Abstract: This study proposes a new recommender system based on collaborative folksonomy. The purpose of the proposed system is to recommend Internet resources (such as books, articles, documents, pictures, audio and video) to users. The proposed method includes four steps: creating the user profile based on tags, grouping similar users into clusters using agglomerative hierarchical clustering, finding similar resources based on the user's past collections by using content-based filtering, and recommending similar items to the target user. This study examines the system's performance on a dataset collected from "del.icio.us", which is a famous social bookmarking website. Experimental results show that the proposed hybrid recommender system, which combines tag-based collaborative filtering and content-based filtering, is promising and effective for folksonomy-based bookmarking websites.
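The first two steps of this pipeline (tag-based user profiles, then grouping similar users) can be sketched as below. The abstract does not name its similarity measure, so Jaccard similarity over tag sets is an assumption made for illustration. The user names and tags are hypothetical, and only the first merge of an agglomerative clustering is shown.

```python
from itertools import combinations

def jaccard(tags_a, tags_b):
    """Jaccard similarity between two tag collections (0.0 if both are empty)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tag-based user profiles.
profiles = {
    "ann": {"python", "ml", "data"},
    "bob": {"python", "ml", "web"},
    "cat": {"cooking", "travel"},
}

# First agglomerative step: merge the two most similar users.
closest_pair = max(
    combinations(sorted(profiles), 2),
    key=lambda pair: jaccard(profiles[pair[0]], profiles[pair[1]]),
)
```

A full agglomerative procedure would repeat this merge on the resulting clusters, using a linkage rule to define similarity between groups, until a stopping criterion is met.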
Abstract: Software maintenance is an extremely important activity in the software development life cycle. It involves a lot of human effort, cost and time. Software maintenance may be further subdivided into different activities such as fault prediction, fault detection, fault prevention, and fault correction. This topic has gained substantial attention due to sophisticated and complex applications, commercial hardware, clustered architectures and artificial intelligence. In this paper we survey the work done in the field of software maintenance. Software fault prediction has been studied in the context of fault-prone modules, self-healing systems, developer information, maintenance models, etc. Still, many issues in the field of fault severity, such as modeling and weighting the impact of different kinds of faults in various types of software systems, need to be explored.
Abstract: A new technique of topological multi-scale analysis is
introduced. By performing a clustering recursively to build a
hierarchy, and analyzing the co-scale and intra-scale similarities, an
Iterated Function System can be extracted from any data set. The study
of fractals shows that this method is efficient at extracting
self-similarities, and can find elegant solutions to the inverse problem of
building fractals. The theoretical aspects and practical
implementations are discussed, together with examples of analyses of
simple fractals.