Abstract: In-place sorting algorithms play an important role in many fields such as very large database systems, data warehouses, data mining, etc. Such algorithms maximize the size of data that can be processed in main memory without input/output operations. In this paper, a novel in-place sorting algorithm is presented. The algorithm comprises two phases; rearranging the input unsorted array in place, resulting segments that are ordered relative to each other but whose elements are yet to be sorted. The first phase requires linear time, while, in the second phase, elements of each segment are sorted inplace in the order of z log (z), where z is the size of the segment, and O(1) auxiliary storage. The algorithm performs, in the worst case, for an array of size n, an O(n log z) element comparisons and O(n log z) element moves. Further, no auxiliary arithmetic operations with indices are required. Besides these theoretical achievements of this algorithm, it is of practical interest, because of its simplicity. Experimental results also show that it outperforms other in-place sorting algorithms. Finally, the analysis of time and space complexity, and required number of moves are presented, along with the auxiliary storage requirements of the proposed algorithm.
Abstract: This paper presents a novel genetic algorithm, termed
the Optimum Individual Monogenetic Algorithm (OIMGA) and
describes its hardware implementation. As the monogenetic strategy
retains only the optimum individual, the memory requirement is
dramatically reduced and no crossover circuitry is needed, thereby
ensuring the requisite silicon area is kept to a minimum.
Consequently, depending on application requirements, OIMGA
allows the investigation of solutions that warrant either larger GA
populations or individuals of greater length. The results given in this
paper demonstrate that both the performance of OIMGA and its
convergence time are superior to those of existing hardware GA
implementations. Local convergence is achieved in OIMGA by
retaining elite individuals, while population diversity is ensured by
continually searching for the best individuals in fresh regions of the
search space.
Abstract: All Text processing systems allow their users to
search a pattern of string from a given text. String matching is
fundamental to database and text processing applications. Every text
editor must contain a mechanism to search the current document for
arbitrary strings. Spelling checkers scan an input text for words in the
dictionary and reject any strings that do not match. We store our
information in data bases so that later on we can retrieve the same
and this retrieval can be done by using various string matching
algorithms. This paper is describing a new string matching algorithm
for various applications. A new algorithm has been designed with the
help of Rabin Karp Matcher, to improve string matching process.
Abstract: As the web continues to grow exponentially, the idea
of crawling the entire web on a regular basis becomes less and less
feasible, so the need to include information on specific domain,
domain-specific search engines was proposed. As more information
becomes available on the World Wide Web, it becomes more difficult
to provide effective search tools for information access. Today,
people access web information through two main kinds of search
interfaces: Browsers (clicking and following hyperlinks) and Query
Engines (queries in the form of a set of keywords showing the topic
of interest) [2]. Better support is needed for expressing one's
information need and returning high quality search results by web
search tools. There appears to be a need for systems that do reasoning
under uncertainty and are flexible enough to recover from the
contradictions, inconsistencies, and irregularities that such reasoning
involves. In a multi-view problem, the features of the domain can be
partitioned into disjoint subsets (views) that are sufficient to learn the
target concept. Semi-supervised, multi-view algorithms, which
reduce the amount of labeled data required for learning, rely on the
assumptions that the views are compatible and uncorrelated. This
paper describes the use of semi-structured machine learning approach
with Active learning for the “Domain Specific Search Engines". A
domain-specific search engine is “An information access system that
allows access to all the information on the web that is relevant to a
particular domain. The proposed work shows that with the help of
this approach relevant data can be extracted with the minimum
queries fired by the user. It requires small number of labeled data and
pool of unlabelled data on which the learning algorithm is applied to
extract the required data.
Abstract: Data clustering is an important data exploration technique
with many applications in data mining. We present an enhanced
version of the well known single link clustering algorithm. We will
refer to this algorithm as DCBOR. The proposed algorithm alleviates
the chain effect by removing the outliers from the given dataset.
So this algorithm provides outlier detection and data clustering
simultaneously. This algorithm does not need to update the distance
matrix, since the algorithm depends on merging the most k-nearest
objects in one step and the cluster continues grow as long as possible
under specified condition. So the algorithm consists of two phases;
at the first phase, it removes the outliers from the input dataset. At
the second phase, it performs the clustering process. This algorithm
discovers clusters of different shapes, sizes, densities and requires
only one input parameter; this parameter represents a threshold for
outlier points. The value of the input parameter is ranging from 0 to
1. The algorithm supports the user in determining an appropriate
value for it. We have tested this algorithm on different datasets
contain outlier and connecting clusters by chain of density points,
and the algorithm discovers the correct clusters. The results of
our experiments demonstrate the effectiveness and the efficiency of
DCBOR.
Abstract: Graph coloring is an important problem in computer
science and many algorithms are known for obtaining reasonably
good solutions in polynomial time. One method of comparing
different algorithms is to test them on a set of standard graphs where
the optimal solution is already known. This investigation analyzes a
set of 50 well known graph coloring instances according to a set of
complexity measures. These instances come from a variety of
sources some representing actual applications of graph coloring
(register allocation) and others (mycieleski and leighton graphs) that
are theoretically designed to be difficult to solve. The size of the
graphs ranged from ranged from a low of 11 variables to a high of
864 variables. The method used to solve the coloring problem was
the square of the adjacency (i.e., correlation) matrix. The results
show that the most difficult graphs to solve were the leighton and the
queen graphs. Complexity measures such as density, mobility,
deviation from uniform color class size and number of block
diagonal zeros are calculated for each graph. The results showed that
the most difficult problems have low mobility (in the range of .2-.5)
and relatively little deviation from uniform color class size.
Abstract: Fine-grained data replication over the Internet allows duplication of frequently accessed data objects, as opposed to entire sites, to certain locations so as to improve the performance of largescale content distribution systems. In a distributed system, agents representing their sites try to maximize their own benefit since they are driven by different goals such as to minimize their communication costs, latency, etc. In this paper, we will use game theoretical techniques and in particular auctions to identify a bidding mechanism that encapsulates the selfishness of the agents, while having a controlling hand over them. In essence, the proposed game theory based mechanism is the study of what happens when independent agents act selfishly and how to control them to maximize the overall performance. A bidding mechanism asks how one can design systems so that agents- selfish behavior results in the desired system-wide goals. Experimental results reveal that this mechanism provides excellent solution quality, while maintaining fast execution time. The comparisons are recorded against some well known techniques such as greedy, branch and bound, game theoretical auctions and genetic algorithms.
Abstract: Traffic management in an urban area is highly facilitated by the knowledge of the traffic conditions in every street or highway involved in the vehicular mobility system. Aim of the paper is to propose a neuro-fuzzy approach able to compute the main parameters of a traffic system, i.e., car density, velocity and flow, by using the images collected by the web-cams located at the crossroads of the traffic network. The performances of this approach encourage its application when the traffic system is far from the saturation. A fuzzy model is also outlined to evaluate when it is suitable to use more accurate, even if more time consuming, algorithms for measuring traffic conditions near to saturation.
Abstract: Despite the fact that Arabic language is currently one
of the most common languages worldwide, there has been only a
little research on Arabic speech recognition relative to other
languages such as English and Japanese. Generally, digital speech
processing and voice recognition algorithms are of special
importance for designing efficient, accurate, as well as fast automatic
speech recognition systems. However, the speech recognition process
carried out in this paper is divided into three stages as follows: firstly,
the signal is preprocessed to reduce noise effects. After that, the
signal is digitized and hearingized. Consequently, the voice activity
regions are segmented using voice activity detection (VAD)
algorithm. Secondly, features are extracted from the speech signal
using Mel-frequency cepstral coefficients (MFCC) algorithm.
Moreover, delta and acceleration (delta-delta) coefficients have been
added for the reason of improving the recognition accuracy. Finally,
each test word-s features are compared to the training database using
dynamic time warping (DTW) algorithm. Utilizing the best set up
made for all affected parameters to the aforementioned techniques,
the proposed system achieved a recognition rate of about 98.5%
which outperformed other HMM and ANN-based approaches
available in the literature.
Abstract: A computational platform is presented in this
contribution. It has been designed as a virtual laboratory to be used
for exploring optimization algorithms in biological problems. This
platform is built on a blackboard-based agent architecture. As a test
case, the version of the platform presented here is devoted to the
study of protein folding, initially with a bead-like description of the
chain and with the widely used model of hydrophobic and polar
residues (HP model). Some details of the platform design are
presented along with its capabilities and also are revised some
explorations of the protein folding problems with different types of
discrete space. It is also shown the capability of the platform to
incorporate specific tools for the structural analysis of the runs in
order to understand and improve the optimization process.
Accordingly, the results obtained demonstrate that the ensemble of
computational tools into a single platform is worthwhile by itself,
since experiments developed on it can be designed to fulfill different
levels of information in a self-consistent fashion. By now, it is being
explored how an experiment design can be useful to create a
computational agent to be included within the platform. These
inclusions of designed agents –or software pieces– are useful for the
better accomplishment of the tasks to be developed by the platform.
Clearly, while the number of agents increases the new version of the
virtual laboratory thus enhances in robustness and functionality.
Abstract: The System Identification problem looks for a
suitably parameterized model, representing a given process. The
parameters of the model are adjusted to optimize a performance
function based on error between the given process output and
identified process output. The linear system identification field is
well established with many classical approaches whereas most of
those methods cannot be applied for nonlinear systems. The problem
becomes tougher if the system is completely unknown with only the
output time series is available. It has been reported that the
capability of Artificial Neural Network to approximate all linear and
nonlinear input-output maps makes it predominantly suitable for the
identification of nonlinear systems, where only the output time series
is available. [1][2][4][5]. The work reported here is an attempt to
implement few of the well known algorithms in the context of
modeling of nonlinear systems, and to make a performance
comparison to establish the relative merits and demerits.
Abstract: In this contribution, the use of a new genetic operator is proposed. The main advantage of using this operator is that it is able to assist the evolution procedure to converge faster towards the optimal solution of a problem. This new genetic operator is called ''intuition'' operator. Generally speaking, one can claim that this operator is a way to include any heuristic or any other local knowledge, concerning the problem, that cannot be embedded in the fitness function. Simulation results show that the use of this operator increases significantly the performance of the classic Genetic Algorithm by increasing the convergence speed of its population.
Abstract: Pattern matching based on regular tree grammars have been widely used in many areas of computer science. In this paper, we propose a pattern matcher within the framework of code generation, based on a generic and a formalized approach. According to this approach, parsers for regular tree grammars are adapted to a general pattern matching solution, rather than adapting the pattern matching according to their parsing behavior. Hence, we first formalize the construction of the pattern matches respective to input trees drawn from a regular tree grammar in a form of the so-called match trees. Then, we adopt a recently developed generic parser and tightly couple its parsing behavior with such construction. In addition to its generality, the resulting pattern matcher is characterized by its soundness and efficient implementation. This is demonstrated by the proposed theory and by the derived algorithms for its implementation. A comparison with similar and well-known approaches, such as the ones based on tree automata and LR parsers, has shown that our pattern matcher can be applied to a broader class of grammars, and achieves better approximation of pattern matches in one pass. Furthermore, its use as a machine code selector is characterized by a minimized overhead, due to the balanced distribution of the cost computations into static ones, during parser generation time, and into dynamic ones, during parsing time.
Abstract: CloudSim is a useful tool to simulate the cloud
environment. It shows the service availability, the power consumption,
and the network traffic of services on the cloud environment.
Moreover, it supports to calculate a network communication delay
through a network topology data easily. CloudSim allows inputting a
file of topology data, but it does not provide any generating process.
Thus, it needs the file of topology data generated from some other
tools. The BRITE is typical network topology generator. Also, it
supports various type of topology generating algorithms. If CloudSim
can include the BRITE, network simulation for clouds is easier than
existing version. This paper shows the potential of connection between
BRITE and CloudSim. Also, it proposes the direction to link between
them.
Abstract: By the application of an improved back-propagation
neural network (BPNN), a model of current densities for a solid oxide
fuel cell (SOFC) with 10 layers is established in this study. To build
the learning data of BPNN, Taguchi orthogonal array is applied to
arrange the conditions of operating parameters, which totally 7 factors
act as the inputs of BPNN. Also, the average current densities
achieved by numerical method acts as the outputs of BPNN.
Comparing with the direct solution, the learning errors for all learning
data are smaller than 0.117%, and the predicting errors for 27
forecasting cases are less than 0.231%. The results show that the
presented model effectively builds a mathematical algorithm to predict
performance of a SOFC stack immediately in real time.
Also, the calculating algorithms are applied to proceed with the
optimization of the average current density for a SOFC stack. The
operating performance window of a SOFC stack is found to be
between 41137.11 and 53907.89. Furthermore, an inverse predicting
model of operating parameters of a SOFC stack is developed here by
the calculating algorithms of the improved BPNN, which is proved to
effectively predict operating parameters to achieve a desired
performance output of a SOFC stack.
Abstract: Task of object localization is one of the major
challenges in creating intelligent transportation. Unfortunately, in
densely built-up urban areas, localization based on GPS only
produces a large error, or simply becomes impossible. New
opportunities arise for the localization due to the rapidly emerging
concept of a wireless ad-hoc network. Such network, allows
estimating potential distance between these objects measuring
received signal level and construct a graph of distances in which
nodes are the localization objects, and edges - estimates of the
distances between pairs of nodes. Due to the known coordinates of
individual nodes (anchors), it is possible to determine the location of
all (or part) of the remaining nodes of the graph. Moreover, road
map, available in digital format can provide localization routines
with valuable additional information to narrow node location search.
However, despite abundance of well-known algorithms for solving
the problem of localization and significant research efforts, there are
still many issues that currently are addressed only partially. In this
paper, we propose localization approach based on the graph mapped
distances on the digital road map data basis. In fact, problem is
reduced to distance graph embedding into the graph representing area
geo location data. It makes possible to localize objects, in some cases
even if only one reference point is available. We propose simple
embedding algorithm and sample implementation as spatial queries
over sensor network data stored in spatial database, allowing
employing effectively spatial indexing, optimized spatial search
routines and geometry functions.
Abstract: In most rule-induction algorithms, the only operator used against nominal attributes is the equality operator =. In this paper, we first propose the use of the inequality operator, ≠, in addition to the equality operator, to increase the expressiveness of induced rules. Then, we present a new method, Binary Coding, which can be used along with an arbitrary rule-induction algorithm to make use of the inequality operator without any need to change the algorithm. Experimental results suggest that the Binary Coding method is promising enough for further investigation, especially in cases where the minimum number of rules is desirable.
Abstract: Topology Optimization is a defined as the method of
determining optimal distribution of material for the assumed design
space with functionality, loads and boundary conditions [1].
Topology optimization can be used to optimize shape for the
purposes of weight reduction, minimizing material requirements or
selecting cost effective materials [2]. Topology optimization has been
implemented through the use of finite element methods for the
analysis, and optimization techniques based on the method of moving
asymptotes, genetic algorithms, optimality criteria method, level sets
and topological derivatives. Case study of Typical “Fuselage design"
is considered for this paper to explain the benefits of Topology
Optimization in the design cycle. A cylindrical shell is assumed as
the design space and aerospace standard pay loads were applied on
the fuselage with wing attachments as constraints. Then topological
optimization is done using Finite Element (FE) based software. This
optimization results in the structural concept design which satisfies
all the design constraints using minimum material.
Abstract: Sequential pattern mining is a challenging task in data mining area with large applications. One among those applications is mining patterns from weblog. Recent times, weblog is highly dynamic and some of them may become absolute over time. In addition, users may frequently change the threshold value during the data mining process until acquiring required output or mining interesting rules. Some of the recently proposed algorithms for mining weblog, build the tree with two scans and always consume large time and space. In this paper, we build Revised PLWAP with Non-frequent Items (RePLNI-tree) with single scan for all items. While mining sequential patterns, the links related to the nonfrequent items are not considered. Hence, it is not required to delete or maintain the information of nodes while revising the tree for mining updated transactions. The algorithm supports both incremental and interactive mining. It is not required to re-compute the patterns each time, while weblog is updated or minimum support changed. The performance of the proposed tree is better, even the size of incremental database is more than 50% of existing one. For evaluation purpose, we have used the benchmark weblog dataset and found that the performance of proposed tree is encouraging compared to some of the recently proposed approaches.
Abstract: Sequences of execution of algorithms in an interactive
manner using multimedia tools are employed in this paper. It helps to
realize the concept of fundamentals of algorithms such as searching
and sorting method in a simple manner. Visualization gains more
attention than theoretical study and it is an easy way of learning
process. We propose methods for finding runtime sequence of each
algorithm in an interactive way and aims to overcome the drawbacks
of the existing character systems. System illustrates each and every
step clearly using text and animation. Comparisons of its time
complexity have been carried out and results show that our approach
provides better perceptive of algorithms.