Abstract: Text categorization is the problem of classifying text
documents into a set of predefined classes. After a preprocessing
step, the documents are typically represented as large sparse vectors.
When training classifiers on large collections of documents, both the
time and memory restrictions can be quite prohibitive. This justifies
the application of feature selection methods to reduce the
dimensionality of the document-representation vector. In this paper,
we present three feature selection methods: Information Gain,
Support Vector Machine feature selection (called SVM_FS), and
Genetic Algorithm with SVM (called GA_SVM). We show that the
best results were obtained with the GA_SVM method for a relatively
small dimension of the feature vector.
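As a minimal illustration of the first of these criteria, the following Python sketch scores a binary term-presence feature by its Information Gain with respect to the class labels; the toy data and function name are our own, not the paper's code.

    import numpy as np

    def information_gain(x, y):
        # IG of a binary term-presence feature x w.r.t. class labels y:
        # H(Y) - H(Y|X), estimated from counts.
        def entropy(labels):
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log2(p))
        ig = entropy(y)
        for v in np.unique(x):
            mask = (x == v)
            ig -= mask.mean() * entropy(y[mask])
        return ig

    x = np.array([1, 1, 1, 0, 0, 0])   # term present in the first three documents
    y = np.array([0, 0, 1, 1, 1, 1])   # document classes
    print(information_gain(x, y))      # higher IG = more discriminative term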
Abstract: Along with the progress of our information society,
various risks are becoming increasingly common, causing multiple social problems. For this reason, risk communications for
establishing consensus among stakeholders who have different
priorities have become important. However, it is not always easy for the decision makers to agree on measures to reduce risks based on
opposing concepts, such as security, privacy and cost. Therefore, we previously developed and proposed the "Multiple Risk Communicator" (MRC), with the following functions: (1) support for the modeling role of the risk specialist, (2) an optimization engine, and (3) display of the computed results. In this paper, MRC program
version 1.0 is applied to the personal information leakage problem. The application process and validation of the results are discussed.
Abstract: A SCADA (Supervisory Control And Data
Acquisition) system is an industrial control and monitoring system for
national infrastructures. In the past, SCADA systems were used in
closed environments without consideration of security functionality.
As communication technology has developed, efforts have been made
to connect SCADA systems to open networks, so the security of
SCADA systems has become an issue. Key management for SCADA
systems has also been studied. However, existing key management
schemes for SCADA systems, such as SKE (key establishment for
SCADA systems) and SKMA (key management scheme for SCADA
systems), cannot support broadcast communication. To solve this
problem, an Advanced Key Management Architecture for Secure
SCADA Communication has been proposed by Choi et al. However,
their scheme requires a high computational cost for multicast
communication. In this paper, we propose an enhanced scheme that
reduces the computational cost of multicast communication while
considering the number of keys to be stored in a low-power
communication device (RTU).
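For orientation only, the sketch below shows a generic hash-based derivation of per-group broadcast keys from a single master key, so that a device need store only the master key plus its group identifiers; this illustrates the design space, not the scheme of Choi et al. nor the enhancement proposed here, and all names are hypothetical.

    import hashlib
    import os

    def derive_group_key(master_key: bytes, group_id: str, epoch: int) -> bytes:
        # Derive a per-group broadcast key from one master key, so a device
        # stores only the master key plus the identifiers of its groups.
        data = master_key + group_id.encode() + epoch.to_bytes(4, "big")
        return hashlib.sha256(data).digest()

    master = os.urandom(32)                              # held by the key server
    k_broadcast = derive_group_key(master, "broadcast", epoch=1)
    k_group = derive_group_key(master, "substation-7", epoch=1)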
Abstract: The growth of open networks has created interest in their
commercialisation. The establishment of an electronic business
mechanism must be accompanied by a digital (electronic) payment
system to transfer the value of transactions. Financial organizations
are required to offer a secure e-payment solution with a level of
security equivalent to that of conventional paper-based payment
transactions. PKI, which functions as a chain of trust in a security
architecture, can provide the security services of cryptography to
e-payments, making it possible to take advantage of a wider base of
customers and trading partners and of the reduction in transaction
costs achieved by the use of Internet channels. The paper addresses
the possibilities of PKI in relation to electronic payments and
suggests an implementation framework to be followed.
Abstract: In geometrical camera calibration, the objective is to
determine a set of camera parameters that describe the mapping
between 3D reference coordinates and 2D image coordinates. In this
paper, a technique for calibration and tracking based on both a
least-squares method and a correlation technique is presented,
developed as part of an augmented reality system. This approach is
fast and can be used in a real-time system.
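A minimal sketch of the least-squares step, assuming the standard direct linear transform (DLT) formulation of the 3D-to-2D mapping; the setup is ours, not the paper's exact pipeline.

    import numpy as np

    def estimate_projection(X3d, x2d):
        # Solve for the 3x4 projection matrix P minimizing the algebraic
        # least-squares error over n >= 6 world/image correspondences.
        A = []
        for (X, Y, Z), (u, v) in zip(X3d, x2d):
            A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
            A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
        # The solution is the right singular vector of the smallest singular value.
        _, _, Vt = np.linalg.svd(np.asarray(A))
        return Vt[-1].reshape(3, 4)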
Abstract: In the Equivalent Transformation (ET) computation
model, a program is constructed by the successive accumulation of
ET rules. A meta-computation method by which a correct ET rule is
generated has been proposed. Although the method covers a broad
range in the generation of ET rules, not all important ET rules are
necessarily generated. More ET rules can be generated by
supplementing the method with generation methods specialized
for important ET rules. A Specialization-by-Equation (Speq) rule is
one of those important rules. A Speq rule describes a procedure in
which two variables included in an atom conjunction are equalized
due to predicate constraints. In this paper, we propose an algorithm
that systematically and recursively generates Speq rules and discuss
its effectiveness in the synthesis of ET programs. A Speq rule is
generated based on the proof of a logical formula consisting of a
given atom set and a disequality. The proof is carried out by utilizing
some ET rules, and the rules ultimately obtained are used in
generating Speq rules.
Abstract: A new approach is adopted in this paper based
on Turk and Pentland's eigenface method. It was found that the
probability density function of the distance between the projection
vector of the input face image and the average projection vector of
the subject in the face database follows a Rayleigh distribution. In
order to decrease the false acceptance rate and increase the
recognition rate, the input face image is recognized using two
thresholds: an acceptance threshold and a rejection threshold. We
also find that the values of the two thresholds draw closer to each
other as the number of trials increases. During training, in order to
reduce the number of trials, the projection vectors for each subject
have been averaged. Recognition experiments using the proposed
algorithm show that the recognition rate reaches 92.875%, whilst the
average number of judgments is only 2.56.
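The two-threshold decision can be sketched as follows; the array shapes and threshold names are our assumptions, not the paper's implementation.

    import numpy as np

    def decide(face_vec, eigenfaces, mean_face, avg_projections, t_accept, t_reject):
        # Project the input face into eigenface space, then compare the distance
        # to each subject's averaged projection vector against two thresholds.
        w = eigenfaces.T @ (face_vec - mean_face)          # projection vector
        d = np.linalg.norm(avg_projections - w, axis=1)    # distance per subject
        best = int(np.argmin(d))
        if d[best] <= t_accept:
            return ("accept", best)       # recognized as subject `best`
        if d[best] >= t_reject:
            return ("reject", None)       # not in the face database
        return ("retry", None)            # ambiguous: another judgment is needed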
Abstract: This paper presents a new strategy of identification
and classification of pathological voices using the hybrid method
based on wavelet transform and neural networks. After speech
acquisition from a patient, the speech signal is analysed in order to
extract acoustic parameters such as pitch, formants, jitter, and
shimmer. The results obtained are compared with normal and
standard values by means of a programmable database. Sounds are
collected from normal people and from patients, and then classified
into two different categories. The speech database consists of several
pathological and normal voices collected from the national hospital
"Rabta-Tunis". The speech processing algorithm is conducted in a
supervised mode for discrimination between normal and pathological
voices and then for classification between neural and vocal
pathologies (Parkinson, Alzheimer, laryngeal, dyslexia...). Several
simulation results are presented as a function of the disease and
compared with the clinical diagnosis in order to obtain an objective
evaluation of the developed tool.
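Two of the acoustic parameters mentioned, jitter and shimmer, admit simple estimates from per-cycle measurements; the sketch below uses their common percentage definitions, which may differ in detail from the authors' extraction.

    import numpy as np

    def jitter_local(periods):
        # Local jitter (%): mean absolute difference between consecutive
        # pitch periods, relative to the mean period.
        periods = np.asarray(periods, dtype=float)
        return 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    def shimmer_local(amplitudes):
        # Local shimmer (%): mean absolute difference between consecutive
        # peak amplitudes, relative to the mean amplitude.
        amplitudes = np.asarray(amplitudes, dtype=float)
        return 100 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)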
Abstract: The I/O workload is a critical factor in analyzing I/O
patterns and file system performance. However, tracing I/O
operations on the fly in a distributed parallel file system is non-trivial
due to the collection overhead and the large volume of data. In this
paper, we design and implement a parallel file system logging method
for high-performance computing using a shared-memory-based
multi-layer scheme. It minimizes overhead by reducing the logging
operation response time and provides an efficient post-processing
scheme through shared memory. A separate logging server can collect
sequential logs from multiple clients in a cluster through packet
communication. Implementation and evaluation results show the low
overhead and high scalability of this architecture for high-performance
parallel logging analysis.
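The shared-memory idea can be sketched as a fixed-size record buffer that keeps the client-side logging path short; the record layout and names below are our assumptions, not the paper's format.

    import struct
    import time
    from multiprocessing import shared_memory

    REC = struct.Struct("<dII")        # timestamp, operation code, byte count
    SLOTS = 4096
    shm = shared_memory.SharedMemory(create=True, size=4 + SLOTS * REC.size)

    def log_op(opcode, nbytes):
        # Append one fixed-size record; the separate logging server would
        # attach to the same segment by name and drain the records.
        head = struct.unpack_from("<I", shm.buf, 0)[0]
        REC.pack_into(shm.buf, 4 + (head % SLOTS) * REC.size,
                      time.time(), opcode, nbytes)
        struct.pack_into("<I", shm.buf, 0, head + 1)

    log_op(opcode=1, nbytes=65536)     # e.g. record a 64 KiB write
    shm.close(); shm.unlink()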
Abstract: Rule Discovery is an important technique for mining
knowledge from large databases. Use of objective measures for
discovering interesting rules leads to another data mining problem,
although of reduced complexity. Data mining researchers have
studied subjective measures of interestingness to reduce the volume
of discovered rules to ultimately improve the overall efficiency of
the KDD process.
In this paper we study the novelty of the discovered rules as a
subjective measure of interestingness. We propose a hybrid approach
based on both objective and subjective measures to quantify novelty
of the discovered rules in terms of their deviations from the known
rules (knowledge). We analyze the types of deviation that can arise
between two rules and categorize the discovered rules according to
a user-specified threshold. We implement the proposed framework
and experiment with some public datasets. The experimental results
are promising.
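One plausible reading of the deviation-based categorization is sketched below; the overlap measure and threshold semantics are our own illustration, not the authors' exact quantification.

    def categorize(discovered, known_rules, threshold):
        # A discovered rule is 'novel' if its best overlap with the known
        # rules (shared antecedent/consequent items) falls below the
        # user-specified threshold.
        def overlap(r1, r2):
            a = len(r1["lhs"] & r2["lhs"]) / max(len(r1["lhs"] | r2["lhs"]), 1)
            c = len(r1["rhs"] & r2["rhs"]) / max(len(r1["rhs"] | r2["rhs"]), 1)
            return (a + c) / 2
        best = max((overlap(discovered, k) for k in known_rules), default=0.0)
        return "conforming" if best >= threshold else "novel"

    rule = {"lhs": {"milk", "bread"}, "rhs": {"butter"}}
    known = [{"lhs": {"milk"}, "rhs": {"butter"}}]
    print(categorize(rule, known, threshold=0.5))   # -> "conforming"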
Abstract: Knowledge-based e-mail systems focus on
incorporating knowledge management approach in order to enhance
the traditional e-mail systems. In this paper, we present a
knowledge-based e-mail system called KS-Mail, where people not
only send and receive e-mail conventionally but are also able to create a sense
of knowledge flow. We introduce semantic processing on the e-mail
contents by automatically assigning categories and providing links to
semantically related e-mails. This is done to enrich the knowledge
value of each e-mail as well as to ease the organization of the e-mails
and their contents. At the application level, we have also built
components like the service manager, evaluation engine and search
engine to handle the e-mail processes efficiently by providing the
means to share and reuse knowledge. For this purpose, we present the
KS-Mail architecture, and elaborate on the details of the e-mail
server and the application server. We present the ontology mapping
technique used to achieve the categorization of e-mail contents, as
well as the protocols that we have developed to handle the
transactions in the e-mail system. Finally, we discuss further the
implementation of the modules presented in the KS-Mail architecture.
Abstract: In this paper a deterministic polynomial-time
algorithm is presented for the Clique problem. The case is considered
as the problem of omitting the minimum number of vertices from the
input graph so that none of the zeros of the graph's adjacency
matrix (except the main diagonal entries) would remain in the
adjacency matrix of the resulting subgraph. The existence of a
deterministic polynomial-time algorithm for the Clique problem,
an NP-complete problem, would prove the equality of the
complexity classes P and NP.
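The formulation above amounts to checking that the adjacency submatrix induced by a vertex subset has no off-diagonal zeros; a small helper makes this concrete (the example graph is ours).

    import numpy as np

    def is_clique(adj, vertices):
        # A vertex subset is a clique iff its adjacency submatrix contains
        # no zeros outside the main diagonal.
        sub = adj[np.ix_(vertices, vertices)]
        off_diag = ~np.eye(len(vertices), dtype=bool)
        return bool(np.all(sub[off_diag] == 1))

    adj = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]])
    print(is_clique(adj, [0, 1, 2]))   # True: {0, 1, 2} forms a triangle
    print(is_clique(adj, [0, 1, 3]))   # False: vertices 0 and 3 are not adjacent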
Abstract: Automatic currency note recognition invariably
depends on the currency note characteristics of a particular country
and the extraction of features directly affects the recognition ability.
Sri Lanka has not been involved in research or implementation of
this kind. The proposed system "SLCRec" comes up with a solution
focusing on minimizing the false rejection of notes. Sri Lankan
currency notes undergo severe changes in image quality with usage.
Hence, a special linear transformation function is adopted to wipe out
noise patterns from the background without affecting the notes'
characteristic images and to make the images of interest reappear.
The transformation maps the original gray-scale range into a smaller
range of 0 to 125. Applying edge detection after the transformation
provides better robustness to noise and a fair representation of edges
for both new and old, damaged notes. A three-layer back-propagation
neural network is presented with the number of edges detected in row
order of the notes, and classification is performed over four classes of
interest: the 100, 500, 1000 and 2000 rupee notes. The
experiments showed good classification results and proved that the
proposed methodology has the capability of separating classes
properly in varying image conditions.
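The gray-range compression step can be sketched as a linear rescaling followed by edge detection; the Canny thresholds and file name below are our own guesses, not SLCRec's tuned values.

    import cv2
    import numpy as np

    def compress_gray(img):
        # Linear transformation of the full gray-scale range into [0, 125],
        # suppressing high-intensity background noise patterns.
        return (img.astype(np.float32) * (125.0 / 255.0)).astype(np.uint8)

    img = cv2.imread("note.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
    edges = cv2.Canny(compress_gray(img), 30, 90)        # guessed thresholds
    row_edge_counts = edges.sum(axis=1) // 255           # per-row edge counts (NN input)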
Abstract: In this paper, we present an innovative scheme of
blindly extracting message bits from an image distorted by an attack.
Support Vector Machine (SVM) is used to nonlinearly classify the
bits of the embedded message. Traditionally, a hard decoder is used
with the assumption that the underlying modeling of the Discrete
Cosine Transform (DCT) coefficients does not appreciably change.
In case of an attack, the distribution of the image coefficients is
heavily altered. The distributions of the sufficient statistics at the
receiving end corresponding to the antipodal signals overlap, and a
simple hard decoder fails to classify them properly. We consider the
retrieval of the antipodal message as a binary classification problem.
A machine learning technique such as SVM is used to retrieve the
message when a specific class of attacks is most probable. In order to
validate the SVM-based decoding scheme, we have taken Gaussian
noise as a test case. We generate a data set using 125 images and 25
different keys. The polynomial kernel of the SVM achieved 100
percent accuracy on the test data.
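The decoding step reduces to fitting a polynomial-kernel SVM on per-bit features; the sketch below uses scikit-learn with placeholder random features, since the paper's exact sufficient statistics are not reproduced here.

    import numpy as np
    from sklearn.svm import SVC

    X_train = np.random.randn(1000, 8)         # placeholder per-bit features
    y_train = np.random.randint(0, 2, 1000)    # embedded antipodal bits (0/1)

    clf = SVC(kernel="poly", degree=3)         # polynomial kernel, as in the paper
    clf.fit(X_train, y_train)
    bits = clf.predict(np.random.randn(10, 8)) # decoded message bits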
Abstract: This paper describes a new parallel sorting algorithm,
based on Odd-Even Mergesort, called division and concurrent mixes.
The main idea of the algorithm is that each processor first uses a
sequential algorithm to sort a part of the vector, and the processors
then work in pairs to merge two of these sorted sections into a larger
one that is also sorted; after several iterations, the vector is
completely sorted. The paper describes the implementation of the new
algorithm in a message-passing environment (such as MPI). It also
compares the experimental results obtained with the sequential
quicksort algorithm and with parallel implementations (also on MPI)
of quicksort and bitonic sort. The comparison was carried out on an
8-processor GNU/Linux cluster in which each node is a
single-processor PC.
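The pairwise merge phase can be sketched with mpi4py under our own simplifications (each rank keeps the lower or upper half after merging with its neighbour); this illustrates the odd-even pattern, not the authors' implementation.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local = np.sort(np.random.rand(1024))      # sequential sort of one section
    for phase in range(size):
        partner = rank + 1 if (rank + phase) % 2 == 0 else rank - 1
        if 0 <= partner < size:
            other = comm.sendrecv(local, dest=partner, source=partner)
            merged = np.sort(np.concatenate([local, other]))
            # The lower rank keeps the smaller half, the higher rank the larger.
            local = merged[:len(local)] if rank < partner else merged[len(local):]
    # After `size` phases the values are globally ordered across the ranks.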
Abstract: Data clustering is an important data exploration
technique with many applications in data mining. The k-means
algorithm is well known for its efficiency in clustering large data
sets. However, this algorithm is suitable for spherical-shaped clusters
of similar sizes and densities. The quality of the resulting clusters
decreases when the data set contains spherical-shaped clusters with a
large variance in size. In this paper, we introduce a competent
procedure to overcome this problem. The proposed method is based
on shifting the center of the large cluster toward the small cluster and
re-computing the membership of the small cluster's points. The
experimental results reveal that the proposed algorithm produces
satisfactory results.
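For context, a compact baseline k-means in NumPy is sketched below; the paper's contribution (shifting the large cluster's center toward the small one and recomputing memberships) would replace the plain update step marked in the comment.

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            # Assign each point to its nearest center.
            labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
            new_centers = []
            for j in range(k):
                pts = X[labels == j]
                new_centers.append(pts.mean(axis=0) if len(pts) else centers[j])
            centers = np.array(new_centers)
            # <- the proposed center-shifting correction would be applied here
        return labels, centers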
Abstract: The automatic construction of large, high-resolution
image vistas (mosaics) is an active area of research in the fields of
photogrammetry [1,2], computer vision [1,4], medical image
processing [4], computer graphics [3] and biometrics [8]. Image
stitching is one of the possible options to obtain image mosaics. Vista
creation in image processing is used to construct an image with a
larger field of view than could be obtained with a single
photograph. It refers to transforming and stitching multiple images
into a new aggregate image without any visible seam or distortion in
the overlapping areas. The vista creation process aligns two partial
images over each other and blends them together. Image mosaics
allow one to compensate for differences in viewing geometry. Thus,
they can be used to simplify tasks by simulating the condition in
which the scene is viewed from a fixed position with a single camera.
While obtaining partial images, geometric anomalies such as rotation
and scaling are bound to occur. To nullify the effect of rotation of
partial images on the vista creation process, we propose a
rotation-invariant vista creation algorithm in this paper. Rotation of
the partial image parts in the proposed method may introduce missing
regions in the vista. To correct this error, that is, to fill the missing
regions, we have applied an image inpainting method to the created
vista. This missing-view regeneration method also overcomes the
problem of missing views [31] in the vista due to cropping, irregular
boundaries of partial image parts, and errors in digitization [35]. It
regenerates the missing view of the vista using the information
present in the vista itself.
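A rotation-invariant alignment plus inpainting pipeline can be sketched with standard OpenCV building blocks; these are stand-ins chosen by us, not the paper's specific algorithm, and the file names are hypothetical.

    import cv2
    import numpy as np

    img1 = cv2.imread("part1.png")             # hypothetical partial images
    img2 = cv2.imread("part2.png")
    orb = cv2.ORB_create()                     # rotation-invariant features
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    warped = cv2.warpPerspective(img2, H, (img1.shape[1] * 2, img1.shape[0]))
    # Missing-view regeneration: fill the empty regions from the vista itself.
    mask = (warped.sum(axis=2) == 0).astype(np.uint8)
    vista = cv2.inpaint(warped, mask, 3, cv2.INPAINT_TELEA)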
Abstract: In this paper a one-dimensional Self-Organizing Map
(SOM) algorithm to perform feature selection is presented. The
algorithm is based on a first classification of the input dataset on a
similarity space. From this classification, a set of positive and
negative features is computed for each class. This set of features is
selected as the result of the procedure. The procedure is evaluated on an
in-house dataset from a Knowledge Discovery from Text (KDT)
application and on a set of publicly available datasets used in
international feature selection competitions. These datasets come
from KDT applications, drug discovery as well as other applications.
The knowledge of the correct classification available for the training
and validation datasets is used to optimize the parameters for positive
and negative feature extractions. The process becomes feasible for
large and sparse datasets, such as the ones obtained in KDT
applications, by using both compression techniques to store the
similarity matrix and speed-up techniques for the Kohonen algorithm
that take advantage of the sparsity of the input matrix. These
improvements, together with the use of the grid, make the application
of the methodology to massive datasets feasible.
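A minimal dense one-dimensional SOM is sketched below for orientation; the compressed, sparsity-aware version the paper describes differs in its storage and update details, and the parameter values here are our own.

    import numpy as np

    def som_1d(X, n_nodes, iters=1000, lr=0.5, sigma=2.0, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.random((n_nodes, X.shape[1]))
        for t in range(iters):
            x = X[rng.integers(len(X))]
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # best-matching unit
            decay = 1.0 - t / iters
            h = np.exp(-((np.arange(n_nodes) - bmu) ** 2)
                       / (2 * (sigma * decay) ** 2))
            W += (lr * decay) * h[:, None] * (x - W)         # neighborhood update
        return W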
Abstract: When binary decision diagrams are formed from
uniformly distributed Monte Carlo data for a large number of
variables, the complexity of the decision diagrams exhibits a
predictable relationship to the number of variables and minterms. In
the present work, a neural network model has been used to analyze the
pattern of shortest path length for a larger number of Monte Carlo data
points. The neural model shows a strong descriptive power for the
ISCAS benchmark data with an RMS error of 0.102 for the shortest
path length complexity. Therefore, the model can be considered as a
method of predicting path length complexities; this is expected to lead
to minimal time complexity for very large-scale integrated circuits
and the related computer-aided design tools that use binary decision
diagrams.
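Fitting such a neural model can be sketched with a small multilayer perceptron regressor; the data below is a random placeholder, not the ISCAS benchmark set.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    X = np.random.rand(200, 2)     # columns: number of variables, number of minterms
    y = np.random.rand(200)        # observed shortest-path-length complexity
    model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)
    rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))   # cf. the 0.102 reported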
Abstract: Measuring the complexity of software has been an
insoluble problem in software engineering. Complexity measures can
be used to predict critical information about testability, reliability,
and maintainability of software systems from automatic analysis of
the source code. During the past few years, many complexity
measures have been invented based on the emerging Cognitive
Informatics discipline. These software complexity measures,
including cognitive functional size, are based on the approach of
totaling the cognitive weights of basic control structures such as loops
and branches. This paper shows that the currently existing calculation
method can generate different results that are algebraically
equivalent. However, analysis of the combinatorial meanings of this
calculation method reveals a significant flaw in the measure, which
also explains why it does not satisfy Weyuker's properties. Based on
these findings, improvement directions, such as measure fusion and a
cumulative variable counting scheme, are suggested to enhance the
effectiveness of cognitive complexity measures.
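The calculation style in question (sequentially composed structures add, nested structures multiply) can be made concrete as follows; the weight table reflects commonly cited basic-control-structure values and should be treated as an assumption here.

    WEIGHT = {"sequence": 1, "branch": 2, "case": 3, "iteration": 3,
              "call": 2, "recursion": 3, "parallel": 4, "interrupt": 4}

    def cognitive_weight(node):
        # node = (kind, children): sequential children add, nesting multiplies.
        kind, children = node
        inner = sum(cognitive_weight(c) for c in children) if children else 1
        return WEIGHT[kind] * inner

    # A loop containing a branch: 3 * 2 = 6
    print(cognitive_weight(("iteration", [("branch", [])])))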