Abstract: Phishing, or stealing of sensitive information on the
web, has dealt a major blow to Internet Security in recent times. Most
of the existing anti-phishing solutions fail to handle the fuzziness
involved in phish detection, thus leading to a large number of false
positives. This fuzziness is attributed to the use of the highly flexible and,
at the same time, highly ambiguous HTML language. We introduce a
new perspective against phishing that tries to systematically prove
whether a given page is phished or not, using the corresponding
original page as the basis of the comparison. It analyzes the layout of
the pages under consideration to determine the percentage distortion
between them, indicative of any form of malicious alteration. The
system design represents an intelligent system, employing dynamic
assessment which accurately identifies brand new phishing attacks
and will prove effective in reducing the number of false positives.
This framework could potentially be used as a knowledge base, in
educating the internet users against phishing.
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as a classifier for various model orders.
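The low-frequency emphasis of the MFCC filter bank mentioned above can be illustrated with a minimal sketch. It assumes the commonly used mel-scale formula mel(f) = 2595 * log10(1 + f/700); the abstract does not state which variant the authors use, and the function names here are hypothetical. The proposed complementary filter bank is not reproduced.

```python
import math


def hz_to_mel(f_hz):
    # Common mel-scale convention (an assumption, not taken from the abstract).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of filters spaced uniformly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


centers = mel_center_frequencies(10, 0.0, 4000.0)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

Because the filters are equally spaced in mel, the gaps between neighbouring centers grow with frequency, so the low band is sampled more densely than the high band.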
Abstract: Corporate credit rating prediction using statistical and
artificial intelligence (AI) techniques has been one of the attractive
research topics in the literature. In recent years, multiclass
classification models such as artificial neural network (ANN) or
multiclass support vector machine (MSVM) have become
very appealing machine learning approaches due to their good
performance. However, most of them have only focused on classifying
samples into nominal categories, thus the unique characteristic of the
credit rating - ordinality - has been seldom considered in their
approaches. This study proposes new types of ANN and MSVM
classifiers, which are named OMANN and OMSVM respectively.
OMANN and OMSVM are designed to extend binary ANN or SVM
classifiers by applying the ordinal pairwise partitioning (OPP) strategy.
These models can handle ordinal multiple classes efficiently and
effectively. To validate the usefulness of these two models, we applied
them to the real-world bond rating case. We compared the results of
our models to those of conventional approaches. The experimental
results showed that our proposed models improve classification
accuracy in comparison to typical multiclass classification techniques
with reduced computational resources.
Abstract: This paper presents a Geographic Information System (GIS) approach for qualifying and monitoring broadband lines efficiently. The methodology used for interpolation is the Delaunay Triangulated Irregular Network (TIN). The method is applied in a case study at an ISP in Greece, monitoring 120,000 broadband lines.
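As a rough illustration of how a TIN interpolates, the sketch below estimates a value at a point inside a single triangle by barycentric (linear) weighting of the three vertex values. Building the Delaunay triangulation itself is not shown, and the function name and the sample values (standing in for some line-quality measure) are hypothetical, not taken from the paper.

```python
def interpolate_in_triangle(point, a, b, c):
    """Linearly interpolate a value at `point` inside triangle a-b-c.

    `point` is (x, y); each vertex is (x, y, value).
    """
    x, y = point
    (xa, ya, va), (xb, yb, vb), (xc, yc, vc) = a, b, c
    # Twice the signed area of the triangle; zero means degenerate.
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    # Barycentric weights of the point with respect to a, b and c.
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    wc = 1.0 - wa - wb
    return wa * va + wb * vb + wc * vc


# Hypothetical measurements at three line locations.
estimate = interpolate_in_triangle((1.0, 1.0), (0, 0, 10.0), (4, 0, 20.0), (0, 4, 30.0))
```

At a vertex the estimate equals that vertex's value, and it varies linearly across the triangle.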
Abstract: This paper presents an effective framework for Chinese syntactic parsing, which includes two parts. The first is a parsing framework, which is based on an improved bottom-up chart parsing algorithm and integrates the beam search strategy of the N-best algorithm and the heuristic function of the A* algorithm for pruning, so as to obtain multiple parse trees. The second is a novel evaluation model, which integrates contextual and partial lexical information into the traditional PCFG model and defines a new score function. Using this model, the tree with the highest score is selected as the best parse tree. Finally, the results of comparative experiments are given. Keywords: syntactic parsing, PCFG, pruning, evaluation model.
Abstract: Independent component analysis (ICA) is a computational method for finding underlying signals or components from multivariate statistical data. The ICA method has been successfully applied in many fields, e.g. in vision research, brain imaging, geological signals and telecommunications. In this paper, we apply the ICA method to an analysis of mass spectra of oligomeric species emerging from aluminium sulphate. Mass spectra are typically complex, because they are linear combinations of spectra from different types of oligomeric species. The results show that ICA can decompose the spectra into components that carry useful information. This information is essential in developing coagulation phases of water treatment processes.
Abstract: In this paper a new definition of the adjacency matrix of
simple graphs is presented, called the fuzzy adjacency matrix,
whose elements are of the form 0 and $\frac{1}{n}$, $n \in \mathbb{N}$,
which lie in the interval [0, 1]. Some characteristics of this matrix are then
presented with related examples. This form of matrix contains the complete
information of a graph.
Abstract: Transaction management is one of the most crucial requirements for enterprise application development, which often requires concurrent access to distributed data shared amongst multiple applications/nodes. Transactions guarantee the consistency of data records when multiple users or processes perform concurrent operations. The existing Fault Tolerance Infrastructure for Mobile Agents (FTIMA) provides fault-tolerant behavior in distributed transactions and uses a multi-agent system for distributed transactions and processing. In the existing FTIMA architecture, data flows through the network and contains personal, private or confidential information. In banking transactions, a minor change in the transaction can cause a great loss to the user. In this paper we have modified the FTIMA architecture to ensure that the user request reaches the destination server securely and without any change. We have used triple DES for encryption/decryption and the MD5 algorithm for validating the message.
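The message-validity check described above can be sketched with Python's standard `hashlib` module. This is only an illustration of comparing MD5 digests on the sender and receiver sides; the function names and the sample message are hypothetical, and the triple DES encryption step is omitted because the standard library does not provide it.

```python
import hashlib
import hmac


def md5_digest(message: bytes) -> str:
    # Hex digest that the sender would attach to the request.
    return hashlib.md5(message).hexdigest()


def is_unchanged(message: bytes, expected_digest: str) -> bool:
    # Receiver recomputes the digest and compares it with the attached one.
    return hmac.compare_digest(md5_digest(message), expected_digest)


request = b"transfer 100 from account A to account B"  # hypothetical payload
attached = md5_digest(request)
tampered = b"transfer 900 from account A to account B"
```

Here `is_unchanged(request, attached)` holds, while the altered payload fails the check.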
Abstract: Whole genome duplication (WGD) increased the
number of yeast Saccharomyces cerevisiae chromosomes from 8 to
16. Although the number of chromosomes in the genome of this
organism has been retained since the WGD, chromosomal rearrangement
events have caused an evolutionary distance between the current genome
and its ancestor. Studies under evolutionary-based approaches on
eukaryotic genomes have shown that the rearrangement distance is an
approximable problem. In the case of S. cerevisiae, we show that the
rearrangement distance can be obtained using a dedoubled adjacency
graph drawn for 55 large paired chromosomal regions originating
from the WGD. Then, we provide a program extracted from a C program
database to draw a dedoubled genome adjacency graph for S.
cerevisiae. From a bioinformatical perspective, using the duplicated
blocks of the current genome of S. cerevisiae, we infer that the genomic
organization of eukaryotes has the potential to provide valuable
detailed information about their ancestral genome.
Abstract: Power cables are vulnerable to failure due to aging or
defects that occur with the passage of time under continuous
operation and loading stresses. Partial discharge (PD) detection and
characterization provide information on the location, nature, form and
extent of the degradation. As a result, PD monitoring has become an
important part of condition based maintenance (CBM) programs among power
utilities. Online PD localization of defect sources
in a power cable system is possible using the time of flight method.
The information regarding the time difference between the main and
reflected pulses and cable length can help in locating the partial
discharge source along the cable length. However, if the length of
the cable is not known and the defect source is located at the extreme
ends of the cable or in the middle of the cable, then double ended
measurement is required to indicate the location of the PD source. Use of
multiple sensors can also help in discriminating cable PD from local/external
PD. This paper presents the experience and results from
online partial discharge measurements conducted in the laboratory
and the challenges in partial discharge source localization.
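The single-ended time of flight idea above can be sketched as follows. The sketch assumes the textbook relation in which the direct pulse travels a distance x to the measuring end while the reflected pulse travels to the far end and back (2L - x), so the time difference is dt = 2(L - x)/v and x = L - v*dt/2. The propagation velocity and all numeric values are hypothetical, and the paper's own procedure may differ.

```python
def locate_pd(cable_length_m, delta_t_s, velocity_m_per_s):
    """Distance of the PD source from the measuring end, in metres.

    delta_t_s is the delay between the main pulse and the pulse reflected
    from the far cable end (assumed measurement setup).
    """
    distance = cable_length_m - velocity_m_per_s * delta_t_s / 2.0
    if not 0.0 <= distance <= cable_length_m:
        raise ValueError("time difference is inconsistent with the cable length")
    return distance


# Hypothetical example: 1000 m cable, 1.6e8 m/s pulse velocity, 5 microsecond delay.
source_position = locate_pd(1000.0, 5e-6, 1.6e8)
```

With these assumed numbers the source lies about 600 m from the measuring end. A source near either end gives a delay close to the full round-trip time or close to zero, which is one reason single-ended readings become hard to interpret there.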
Abstract: Visually impaired people find it extremely difficult to
acquire basic and vital information necessary for their living.
Therefore, they are at a very high risk of being socially excluded as a
result of poor access to information. In recent years, several attempts
have been made in improving the communication methods for
visually impaired people which involve tactile sensation such as
finger Braille, manual alphabets and the print on palm method and
several other electronic devices. But, there are some problems which
arise in such methods such as lack of privacy and lack of
compatibility to computer environment. This paper describes a low
cost Braille hand glove for blind people using slot sensors and
vibration motors with the help of which they can read and write emails,
text messages and read e-books. This glove allows the person
to type characters based on different Braille combinations using six
slot sensors. Vibrations at six different positions on the glove,
matching the Braille code, allow them to read characters.
Abstract: Physical urban form is recognized to be the medium for
human transactions. It directly influences the travel demand of people
in a specific urban area and the amount of energy used for
transportation. Distorted, sprawling form often creates sustainability
problems in urban areas. It is declared in EU strategic planning
documents that compact urban form and mixed land use pattern must
be given the main focus to achieve better sustainability in urban
areas, but the methods to measure and compare these characteristics
are still not clear.
This paper presents simple methods to measure the spatial
characteristics of urban form by analyzing the location and
distribution of objects in an urban environment. The extended CA
(cellular automata) model is used to simulate urban development
scenarios.
Abstract: This paper focuses on wormhole attack detection in wireless sensor networks. The wormhole attack is particularly challenging to deal with, since the adversary does not need to compromise any nodes and can use laptops or other wireless devices to send packets over a low-latency channel. This paper introduces an easy and effective method to detect and locate wormholes: since beacon nodes are assumed to know their coordinates, the straight-line distance between each pair of them can be calculated and then compared with the corresponding hop distance, which in this paper equals hop count × the node's transmission range R. A dramatic difference may emerge because of an existing wormhole, and our detection mechanism is based on this. The approximate location of the wormhole can also be derived in further steps from this information. To the best of our knowledge, our method is much simpler than other wormhole detection schemes that also use beacon nodes, and it is more economical than schemes that place special requirements on each node (e.g., GPS receivers, tightly synchronized clocks or directional antennas). Simulation results show that the algorithm is successful in detecting and locating wormholes when the density of beacon nodes reaches 0.008 per m².
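The comparison described above can be sketched as follows. The decision rule is an assumption made for illustration: since each hop covers at most R, a hop distance (hop count × R) that is shorter than the straight-line distance between two beacons is physically implausible and is flagged. The abstract does not state the exact threshold, and the function and parameter names are hypothetical.

```python
import math


def suspicious_pairs(beacons, hop_counts, radio_range, tolerance=0.0):
    """Return beacon pairs whose hop distance is shorter than their straight-line distance.

    beacons: dict mapping beacon id to (x, y) coordinates.
    hop_counts: dict mapping (id_a, id_b) to the observed hop count.
    """
    flagged = []
    for (a, b), hops in hop_counts.items():
        (xa, ya), (xb, yb) = beacons[a], beacons[b]
        straight = math.hypot(xa - xb, ya - yb)
        hop_distance = hops * radio_range
        # Assumed rule: fewer hops than geometry allows suggests a wormhole shortcut.
        if hop_distance + tolerance < straight:
            flagged.append((a, b))
    return flagged


# Hypothetical layout: A and B are 100 m apart but only 2 hops of 20 m are reported.
beacons = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (30.0, 0.0)}
hop_counts = {("A", "B"): 2, ("A", "C"): 2}
flagged = suspicious_pairs(beacons, hop_counts, radio_range=20.0)
```

In this example only the pair A-B is flagged; A-C is consistent, because two hops of 20 m can cover 30 m.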
Abstract: This paper presents a system for discovering
association rules from collections of unstructured documents called
EART (Extract Association Rules from Text). The EART system
treats text only, not images or figures. EART discovers association
rules amongst keywords labeling the collection of textual documents.
The main characteristic of EART is that the system integrates XML
technology (to transform unstructured documents into structured
documents) with Information Retrieval scheme (TF-IDF) and Data
Mining technique for association rules extraction. EART depends on
word feature to extract association rules. It consists of four phases:
structure phase, index phase, text mining phase and visualization
phase. Our work depends on the analysis of the keywords in the
extracted association rules through the co-occurrence of the keywords
in one sentence in the original text and the existence of the keywords
in one sentence without co-occurrence. Experiments were applied to a
collection of scientific documents selected from MEDLINE that are
related to the outbreak of H5N1 avian influenza virus.
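The TF-IDF weighting mentioned in this abstract can be sketched as below. The sketch uses one common variant (term frequency normalised by document length, multiplied by the natural log of N over document frequency); the abstract does not say which variant EART uses, and the function name and toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for tokenised documents.

    documents: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical keyword lists for two documents.
docs = [["h5n1", "virus", "outbreak"], ["virus", "vaccine"]]
w = tf_idf(docs)
```

A term that appears in every document, such as "virus" here, receives weight zero, while terms specific to one document receive positive weights.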
Abstract: Bio-chips are used for experiments on genes and
contain various information such as genes, samples and so on. The
two-dimensional bio-chips, in which one axis represents genes and the
other represents samples, are widely used these days. Instead of
experimenting with real genes which cost lots of money and much
time to get the results, bio-chips are being used for biological
experiments. And extracting data from the bio-chips with high
accuracy and finding out the patterns or useful information from such
data is very important. Bio-chip analysis systems extract data from
various kinds of bio-chips and mine the data in order to get useful
information. One of the commonly used methods to mine the data is
classification. The algorithm that is used to classify the data can
vary depending on the data types, number characteristics and so
on. Considering that bio-chip data is extremely large, an algorithm that
imitates the ecosystem such as the ant algorithm is suitable to use as an
algorithm for classification. This paper focuses on finding the
classification rules from the bio-chip data using the Ant Colony
algorithm which imitates the ecosystem. The developed system takes
into consideration the accuracy of the discovered rules when it applies them
to the bio-chip data in order to predict the classes.
Abstract: Knowledge Discovery in Databases (KDD) is the
process of extracting previously unknown but useful and significant
information from large volumes of data. Data Mining is
a stage in the entire process of KDD which applies an algorithm to
extract interesting patterns. Usually, such algorithms generate huge
volume of patterns. These patterns have to be evaluated by using
interestingness measures to reflect the user requirements.
Interestingness is defined in two ways: (i) objective measures and
(ii) subjective measures. Objective measures such as support and
confidence extract meaningful patterns based on the structure of the
patterns, while subjective measures such as unexpectedness and
novelty reflect the user perspective. In this report, we briefly review the
more widely used and successful subjective measures and propose
a new subjective measure of interestingness, i.e. shocking.
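The two objective measures named above, support and confidence, can be sketched using their standard association-rule definitions: support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support(A and B) divided by support(A). The function names and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here {bread, milk} appears in two of four transactions, and two of the three transactions containing bread also contain milk.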
Abstract: A Self-Excited Induction Generator (SEIG) builds up voltage as it enters its magnetic saturation region. Due to non-linear magnetic characteristics, the performance analysis of the SEIG involves cumbersome mathematical computations. The dependence of air-gap voltage on saturated magnetizing reactance can only be established at rated frequency by conducting a laboratory test commonly known as the synchronous run test. However, there is no laboratory method to determine the saturated magnetizing reactance and air-gap voltage of the SEIG at varying speed, terminal capacitance and other loading conditions. For overall analysis of the SEIG, prior information on magnetizing reactance, generated frequency and air-gap voltage is required. Thus, analytical methods are the only alternative for determining these variables. The non-existence of a direct mathematical relationship between these variables for different terminal conditions has forced researchers to evolve new computational techniques. Artificial Neural Networks (ANNs) are very useful for solving such complex problems, as they do not require any a priori information about the system. In this paper, an attempt is made to use cascaded neural networks to first determine the generated frequency and magnetizing reactance with varying terminal conditions and then the air-gap voltage of the SEIG. The results obtained from the ANN model are used to evaluate the overall performance of the SEIG and are found to be in good agreement with experimental results. Hence, it is concluded that analysis of the SEIG can be carried out effectively using ANNs.
Abstract: Recently, an enhanced hexagon-based search (EHS)
algorithm was proposed to speed up the original hexagon-based search
(HS) by exploiting the group-distortion information of some evaluated
points. In this paper, a second version of the EHS is proposed with a
new point-oriented inner search technique which can further speed up
the HS in both large and small motion environments. Experimental
results show that the enhanced hexagon-based search version-2
(EHS2) is up to 34% faster than the HS with negligible PSNR
degradation.
Abstract: Web intelligence, if made personal, can fuel the process of building communications around the interests and preferences of each individual customer or prospect, by providing specific behavioral insights about each individual. To become fully efficient, Web intelligence must reach a high level of maturity, passing through a process that involves five steps: (1) Web site analysis; (2) Web site and advertising optimization; (3) Segment targeting; (4) Interactive marketing (online only); and (5) Interactive marketing (online and offline). Discussing these steps in detail, the paper uncovers the real gold mine that is personal-level Web intelligence.
Abstract: In this article, we introduce a new approach for
analyzing UML designs to detect the inconsistencies between
multiple state diagrams and sequence diagrams. The Super State
Analysis (SSA) identifies the inconsistencies in super states, single
step transitions, and sequences. Because SSA considers multiple
UML state diagrams, it discovers inconsistencies that cannot be
discovered when considering only a single UML state diagram. We
have introduced a transition set that captures relationship information
that is not specifiable in UML diagrams. The SSA model uses the
transition set to link transitions of multiple state diagrams together.
The analysis generates three different sets automatically. These sets
are compared to the provided sets to detect the inconsistencies. SSA
identifies five types of inconsistencies: impossible super states,
unreachable super states, illegal transitions, missing transitions, and
illegal sequences.