Abstract: A packet analyzer is a tool for debugging sensor
network systems and is convenient for developers. In this paper, we
introduce a new packet analyzer based on an embedded system. The
proposed packet analyzer is compatible with IEEE 802.15.4, a wireless
communication standard well suited to sensor networks, and supports
remote control through a server-client scheme based on the Ethernet
interface. To confirm the operation of the
packet analyzer, we have developed two types of sensor nodes based
on PIC4620 and ATmega128L microprocessors and tested the
functions of the proposed packet analyzer by obtaining the packets
from the sensor nodes.
Abstract: Learning from labeled and unlabeled data has
received considerable attention in the machine learning
community due to its potential for reducing the need for expensive
labeled data. In this work we present a new method for combining
labeled and unlabeled data based on classifier ensembles. The model
we propose assumes that each classifier in the ensemble observes the
input through a different set of features. Classifiers are initially trained
using some labeled samples. The trained classifiers then learn further
by labeling the unknown patterns using a teaching signal that is
generated from the decision of the classifier ensemble, i.e. the
classifiers self-supervise each other. Experiments on a set of object
images are presented. Our experiments investigate different classifier
models, fusion techniques, training-set sizes and input features.
Experimental results reveal that the proposed self-supervised ensemble
learning approach reduces classification error over both the single
classifier and the traditional ensemble classifier approaches.
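The self-supervision loop described above can be sketched as follows. This is a minimal illustration: the nearest-centroid base classifier and the majority-vote fusion are hypothetical stand-ins for the classifier models and fusing techniques the abstract says were investigated.

```python
import numpy as np

class NearestCentroid:
    """Toy base classifier: predicts the class of the nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def self_supervised_ensemble(X_lab, y_lab, X_unlab, feature_views, rounds=3):
    """Each member observes its own feature subset; the ensemble's majority
    vote on the unlabeled data is the teaching signal used for retraining."""
    members = [NearestCentroid().fit(X_lab[:, v], y_lab) for v in feature_views]
    for _ in range(rounds):
        votes = np.stack([m.predict(X_unlab[:, v])
                          for m, v in zip(members, feature_views)])
        # majority vote = ensemble decision used as pseudo-labels
        pseudo = np.array([np.bincount(col).argmax() for col in votes.T])
        X_aug = np.vstack([X_lab, X_unlab])
        y_aug = np.concatenate([y_lab, pseudo])
        members = [NearestCentroid().fit(X_aug[:, v], y_aug) for v in feature_views]
    return members, pseudo
```

The key design point is that the teaching signal comes from the fused decision, not from any single member, so members with different feature views correct each other.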
Abstract: A serious problem on the WWW is finding reliable
information. Not everything found on the Web is true and the
Semantic Web does not change that in any way. The problem will be
even more crucial for the Semantic Web, where agents will be
integrating and using information from multiple sources. Hence, if an
incorrect premise is used due to a single faulty source, any
conclusions drawn may be in error. Thus, statements published on
the Semantic Web have to be seen as claims rather than as facts, and
there should be a way to decide which among many possibly
inconsistent sources is most reliable. In this work, we propose a trust
model for the Semantic Web. The proposed model is inspired by the
use of trust in human society. Trust is a type of social knowledge and
encodes evaluations about which agents can be taken as reliable
sources of information or services. Our proposed model allows
agents to decide which among different sources of information to
trust and thus act rationally on the Semantic Web.
Abstract: With the surge of stream processing applications, novel
techniques are required for the generation and analysis of association
rules in streams. Traditional rule mining solutions cannot handle
streams because they generally require multiple passes over the data
and do not guarantee results within a predictable, small time. Though
researchers have been proposing algorithms for the generation of rules
from streams, there has not been much focus on their analysis.
We propose association rule profiling, a user-centric process for
analyzing association rules and attaching suitable profiles to them
depending on their changing frequency behavior over a previous
snapshot of time in a data stream.
Association rule profiles provide insights into the changing nature
of associations and can be used to characterize the associations. We
discuss the importance of characteristics such as the predictability of
linkages present in the data and propose a metric to quantify it. We
also show how association rule profiles can aid in the generation of
user-specific, more understandable and actionable rules.
The framework is implemented as SUPAR: System for User-centric
Profiling of Association Rules in streaming data. The
proposed system offers the following capabilities:
i) Continuous monitoring of frequency of streaming item-sets
and detection of significant changes therein for association rule
profiling.
ii) Computation of metrics for quantifying predictability of
associations present in the data.
iii) User-centric control of the characterization process: the user
can control the framework through a) constraint specification and b)
elimination of non-interesting rules.
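Capability (i), monitoring item-set frequencies over windows of the stream and flagging significant changes, might be sketched as below. The `FrequencyMonitor` class, the tumbling-window scheme and the relative-change threshold are illustrative assumptions, not SUPAR's actual design.

```python
from collections import Counter

class FrequencyMonitor:
    """Tracks item-set frequencies over tumbling windows of a stream and
    flags item-sets whose support changes by at least `threshold`
    between consecutive windows: candidates for rule profiling."""
    def __init__(self, window_size, threshold):
        self.window_size = window_size
        self.threshold = threshold
        self.prev = None            # counts from the previous window
        self.current = Counter()    # counts in the window being filled
        self.seen = 0

    def observe(self, itemset):
        """Feed one transaction; returns a list of (item-set, support
        change) at each window boundary, None otherwise."""
        self.current[frozenset(itemset)] += 1
        self.seen += 1
        if self.seen == self.window_size:
            changes = self._changes()
            self.prev, self.current, self.seen = self.current, Counter(), 0
            return changes
        return None

    def _changes(self):
        if self.prev is None:
            return []               # first window: nothing to compare
        out = []
        for k in set(self.prev) | set(self.current):
            delta = (self.current[k] - self.prev[k]) / self.window_size
            if abs(delta) >= self.threshold:
                out.append((k, delta))
        return out
```

In a fuller system the flagged changes would feed the profiling step, which attaches a profile to each rule based on the history of such changes.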
Abstract: In this paper, we propose an improvement of the pattern
growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use an appropriate data structure for the Seq-Tree
framework and a separator database to reduce execution time and
memory usage. Thus, with I-PrefixSpan no in-memory database is stored after the index set is constructed. The experimental results
show that, using Java 2, this method improves the speed of PrefixSpan by up to almost two orders of magnitude and reduces memory usage by more than one order of magnitude.
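For context, the pattern-growth idea that PrefixSpan (and hence I-PrefixSpan) builds on can be sketched in a few lines. This is plain PrefixSpan over sequences of single items, a simplification; it does not include the Seq-Tree or separator-database machinery the abstract describes.

```python
from collections import Counter

def prefixspan(db, min_support):
    """Minimal pattern-growth miner over sequences of single items.
    Returns a dict mapping each frequent sequential pattern (a tuple)
    to its support count."""
    patterns = {}

    def grow(prefix, projected):
        # count items occurring in the projected (suffix) database,
        # at most once per sequence
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))
        for item, sup in counts.items():
            if sup < min_support:
                continue
            pat = prefix + (item,)
            patterns[pat] = sup
            # project the database: keep the suffix after the first
            # occurrence of `item` in each containing sequence
            new_proj = [seq[seq.index(item) + 1:]
                        for seq in projected if item in seq]
            grow(pat, new_proj)

    grow((), db)
    return patterns
```

I-PrefixSpan's contribution, per the abstract, is replacing the repeated in-memory projected databases above with an index set built once, which is where the reported time and memory savings come from.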
Abstract: Applying knowledge discovery techniques to unstructured
text is termed knowledge discovery in text (KDT), text data mining,
or text mining. The decision tree approach is among the most useful
for classification problems. With this technique, a tree is
constructed to model the classification process. There are two basic
steps in the technique: building the tree and applying the tree to the
database. This paper describes a proposed C5.0 classifier that
performs rulesets, cross-validation and boosting over the original
C5.0 in order to reduce the error ratio. The feasibility and the
benefits of the proposed approach are demonstrated on a medical
data set, hypothyroid. It is shown that the performance of a classifier
on the training cases from which it was constructed gives a poor
estimate of its accuracy; a better estimate is obtained by sampling or
by using a separate test file, since either way the classifier is
evaluated on cases that were not used to build it. If the cases in
hypothyroid.data and hypothyroid.test were shuffled and divided into
a new 2772-case training set and a 1000-case test set, C5.0 might
construct a different classifier with a lower or higher error rate on
the test cases. An important feature of See5 is its ability to generate
classifiers called rulesets; the ruleset has an error rate of 0.5% on
the test cases. The standard errors of the means provide an estimate
of the variability of results. One way to get a more reliable estimate
of predictive accuracy is f-fold cross-validation: the error rate of a
classifier produced from all the cases is estimated as the ratio of the
total number of errors on the hold-out cases to the total number of
cases. The Boost option with x trials instructs See5 to construct up to
x classifiers in this manner. Trials over numerous datasets, large and
small, show that on average 10-classifier boosting reduces the error
rate for test cases by about 25%.
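The f-fold cross-validation estimate described above can be sketched generically. Since See5/C5.0 itself is a commercial tool, `train_fn` and `predict_fn` below are hypothetical placeholders for any classifier's training and prediction routines.

```python
import random

def f_fold_cv_error(cases, labels, train_fn, predict_fn, f=10, seed=0):
    """f-fold cross-validation: every case is held out exactly once, and
    the error rate is the total number of errors on held-out cases
    divided by the total number of cases."""
    idx = list(range(len(cases)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::f] for i in range(f)]   # f disjoint hold-out sets
    errors = 0
    for hold in folds:
        hold_set = set(hold)
        train_idx = [i for i in idx if i not in hold_set]
        model = train_fn([cases[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        errors += sum(predict_fn(model, cases[i]) != labels[i] for i in hold)
    return errors / len(cases)
```

Because each case is classified by a model that never saw it during training, this estimate avoids the optimism of evaluating on the training cases themselves.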
Abstract: This work deals with aspects of support vector learning for large-scale data mining tasks. Based on a decomposition algorithm that can be run in serial and parallel mode, we introduce a data transformation that allows for the use of an expensive generalized kernel without additional costs. In order to speed up the decomposition algorithm, we analyze the problem of working set selection for large data sets and the influence of the working set sizes on the scalability of the parallel decomposition scheme. Our modifications and settings lead to improved support vector learning performance and thus allow the use of extensive parameter search methods to optimize classification accuracy.
Abstract: This paper proposes neural network weight and
topology optimization using genetic evolution and the
backpropagation training algorithm. The proposed crossover and
mutation operators aim to adapt the network architectures and
weights during the evolution process. Through a specific inheritance
procedure, the weights are transmitted from the parents to their
offspring, which allows re-exploitation of the already trained
networks and hence acceleration of the global convergence of the
algorithm. In the preprocessing phase, a new feature extraction
method is proposed based on Legendre moments with the maximum
entropy principle (MEP) as a selection criterion. This allows a global
reduction of the search space in the design of the networks. The
proposed method has been applied and tested on the well-known
MNIST database of handwritten digits.
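One possible reading of the inheritance procedure, sketched under the assumption of single-hidden-layer networks and a one-point cut over hidden units; the paper's actual operators may differ.

```python
import random
import numpy as np

def inherit_crossover(parent_a, parent_b, rng=None):
    """Per-hidden-unit crossover: the offspring inherits each hidden unit
    (its incoming and outgoing weights together) wholesale from one
    parent, so already-trained substructures survive into the child and
    can be refined further by backpropagation instead of being retrained
    from scratch. Parents are (W_in, W_out) pairs with shapes
    (h, n_in) and (n_out, h)."""
    rng = rng or random.Random(0)
    (Wa_in, Wa_out), (Wb_in, Wb_out) = parent_a, parent_b
    h = Wa_in.shape[0]
    cut = rng.randrange(1, h)       # crossover point over hidden units
    W_in = np.vstack([Wa_in[:cut], Wb_in[cut:]])
    W_out = np.hstack([Wa_out[:, :cut], Wb_out[:, cut:]])
    return W_in, W_out
```

Keeping a unit's incoming and outgoing weights together is the point of the design: splitting them across parents would destroy the trained feature the unit encodes.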
Abstract: Persian (Farsi) script is totally cursive, and each character is written in several different forms depending on its preceding and following characters in the word. These complexities make automatic handwriting recognition of Persian a very hard problem, and few contributions have tried to work it out. This paper presents a novel practical approach to online recognition of Persian handwriting based on representing inputs and patterns with very simple visual features and comparing these simple terms. The recognition approach was tested over a set of Persian words; the results were quite acceptable when the possible words were unknown, and almost all correct when the words were chosen from a prespecified list.
Abstract: The “PYRAMIDS” block cipher is a symmetric encryption algorithm with a block length of 64, 128, or 256 bits that accepts a variable key length of 128, 192, or 256 bits. The algorithm is an iterated cipher consisting of repeated applications of a simple round transformation with different operations and a different sequence in each round. The algorithm was previously implemented in software in Cµ code. In this paper, a hardware implementation of the algorithm using Field Programmable Gate Arrays (FPGA) is presented. We discuss the algorithm, the implemented micro-architecture, and the simulation and implementation results. Moreover, we present a detailed comparison with other implemented standard algorithms. In addition, we include the floor plan as well as the circuit diagrams of the various micro-architecture modules.
Abstract: Object-oriented database management systems, which
appeared around 1986, had still not achieved success five years after
their birth. One of the major difficulties is query optimization.
We propose in this paper a new approach that enriches the existing
query optimization techniques in object-oriented databases. Given
the success of query optimization in the relational model, our
approach draws inspiration from these optimization techniques and
enriches them so that they can support the new concepts introduced
by object databases.
Abstract: We suggest a novel method to incorporate long-term
redundancy (LTR) in time-domain signal compression
methods. The proposal is based on block-sorting and curve
simplification, and is illustrated on the ECG signal as a
post-processor for the FAN method. Test applications of the
resulting FAN+ method on the MIT-BIH database show a
substantial improvement of the compression ratio-distortion
behavior for a higher-quality reconstructed signal.
Abstract: In this paper we use the properties of the co-occurrence
matrix for finding parallel lines in binary pictures for fingerprint
identification. In our proposed algorithm, we reduce the noise by
filtering the fingerprint images and then convert the fingerprint
images to binary images using a proper threshold. Next, we divide
the binary images into regions having parallel lines in the same
direction. The lines in each region have a specific angle that can be
used for comparison. This method is simple, performs the
comparison step quickly and has good resistance in the presence of
noise.
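A minimal sketch of the co-occurrence measurement for a binary image: the 2x2 matrix for a given displacement concentrates its mass on the diagonal when pixel pairs along that displacement agree, as they do along parallel lines. The displacement convention and region handling here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def cooccurrence(img, dr, dc):
    """2x2 co-occurrence matrix of a binary image for a (row, col)
    displacement with dr, dc >= 0. M[i, j] counts pixel pairs where the
    first pixel has value i and the pixel at offset (dr, dc) has value
    j. Heavy diagonal mass means pixel pairs agree along that
    displacement, which happens inside a region of parallel lines
    oriented along it."""
    a = img[:img.shape[0] - dr, :img.shape[1] - dc]   # first pixels
    b = img[dr:, dc:]                                 # displaced pixels
    M = np.zeros((2, 2), dtype=int)
    for i in (0, 1):
        for j in (0, 1):
            M[i, j] = int(np.sum((a == i) & (b == j)))
    return M
```

Sweeping the displacement direction and comparing how diagonal each matrix is gives an estimate of the dominant line angle in a region, which is the quantity the abstract uses for comparison.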
Abstract: Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to healthcare practitioners at the right time and in the right manner. External evidence-based knowledge cannot be applied directly to the patient without adjusting it to the patient's health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system, and data mining techniques for finding the appropriate therapy for a given patient and a given disease. Through the integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy-to-use decision support platform, which supports the decision-making process of caregivers and clinical managers, is built. We present three case studies, which show that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, with great relevance for the practice and acceptance of evidence-based medicine.
Abstract: In terms of total online audience, newspapers are the most successful form of online content to date. The online audience for newspapers continues to demand higher-quality services, including personalized news services. News providers should be able to offer suitable users appropriate content. In this paper, a news article recommender system is suggested based on a user's preferences when he or she visits an Internet news site and reads the published articles. This system helps raise user satisfaction and increase customer loyalty toward the content provider.
Abstract: This paper presents a rule-based text-to-speech
(TTS) synthesis system for Standard Malay (SM), namely SMaTTS. The
proposed system uses a sinusoidal method and some pre-recorded
wave files to generate speech. The use of a phone
database significantly decreases the amount of computer memory
used, making the system very light and embeddable. The
overall system comprises two phases. The first is Natural Language
Processing (NLP), which consists of the high-level processing of text
analysis, phonetic analysis, text normalization and a morphophonemic
module. The module was designed specially for SM to overcome a
few problems in defining the rules of the SM orthography system before
text can be passed to the DSP module. The second phase is Digital
Signal Processing (DSP), which operates on the low-level process of
speech waveform generation. An intelligible and
adequately natural-sounding formant-based speech synthesis system
with a light and user-friendly Graphical User Interface (GUI) is
introduced. An SM phoneme set and an
inclusive phone database have been constructed carefully for
this phone-based speech synthesizer. By applying generative
phonology, comprehensive letter-to-sound (LTS) rules and a
pronunciation lexicon have been developed for SMaTTS. For the
evaluation tests, a Diagnostic Rhyme Test (DRT) word list was
compiled and several experiments were performed to evaluate
the quality of the synthesized speech by analyzing the Mean Opinion
Score (MOS) obtained. The overall performance of the system, as
well as the room for improvement, is thoroughly discussed.
Abstract: The present paper addresses problems in the simulation of anticipatory systems, namely those that use simulation models to aid anticipation. A certain analogy between the use of simulation and imagining will be applied to make the explication more comprehensible. The paper is completed by notes on problems and on some existing applications. The problems consist in the fact that simulation of the mentioned anticipatory systems is, in the end, simulation of simulating systems, i.e. computer models handling two or more modeled time axes that should be mapped to the real time flow in a non-descending manner. Languages oriented to objects, processes and blocks can be used to surmount the problems.
Abstract: Summarizing skills have been introduced into the English
syllabus in secondary schools in Malaysia to evaluate students' comprehension of a given text; producing the summary requires students to employ several strategies. This paper reports on our effort to develop a computer-based summarization assessment system
that detects the strategies used by students in producing their
summaries. Sentence decomposition of expert-written summaries is
used to analyze how experts produce their summary sentences. From
the analysis, we identified seven summarizing strategies and their
rules, which are then transformed into a set of heuristic rules for
determining the summarizing strategies. We developed an algorithm
based on the heuristic rules and performed experiments to
evaluate and support the proposed technique.
Abstract: This paper explores an application of an adaptive learning mechanism for robots based on the natural immune system. Most of the research carried out so far is based on either the innate or the adaptive characteristics of the immune system; we present a combination of these to achieve behavior arbitration wherein a robot learns to detect vulnerable areas of a track and adapts to the required speed over such portions. The test bed comprises two Lego robots deployed simultaneously on two predefined, nearly concentric tracks, with the outer robot capable of helping the inner one when it misaligns. The helper robot works in a damage-control mode by realigning itself to guide the other robot back onto its track. The panic-stricken robot records the conditions under which it was misaligned and learns to detect and adapt under similar conditions, thereby making the overall system immune to such failures.
Abstract: In this paper, we study the application of the Extreme
Learning Machine (ELM) algorithm for single-layer feedforward
neural networks to non-linear chaotic time series problems. In this
algorithm the input weights and the hidden layer biases are randomly
chosen. The ELM formulation leads to solving a system of linear
equations in terms of the unknown weights connecting the hidden
layer to the output layer. The solution of this general system of
linear equations is obtained using the Moore-Penrose generalized
pseudoinverse. To study the application of the method we
consider the time series generated by the Mackey-Glass delay
differential equation with different time delays, and the Santa Fe A
and UCR heartbeat-rate ECG time series. For the sigmoid,
sine and hardlim activation functions, the optimal values for the
memory order and the number of hidden neurons which give the
best prediction performance in terms of root mean square error are
determined. It is observed that the results obtained are in close
agreement with the exact solutions of the problems considered,
which clearly shows that ELM is a very promising alternative
method for time series prediction.
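The ELM training step reduces to one least-squares solve, which can be sketched as follows. Here tanh stands in for the sigmoid-family activation, and the network sizes and test series in the usage are illustrative, not the abstract's experimental setup.

```python
import numpy as np

def elm_train(X, y, n_hidden, rng=None):
    """ELM training: input weights and hidden biases are drawn at random
    and never updated; the output weights are the least-squares solution
    obtained via the Moore-Penrose pseudoinverse of the hidden-layer
    output matrix H."""
    rng = rng or np.random.default_rng(0)
    W = rng.normal(size=(X.shape[1], n_hidden))   # fixed random input weights
    b = rng.normal(size=n_hidden)                 # fixed random hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                  # output weights
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

For time series prediction, X holds delay-embedded windows of the series (the "memory order" of the abstract) and y holds the next sample; since no gradient descent is involved, the memory order and hidden layer size can be grid-searched cheaply.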