An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching

Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet Security in recent times. Most of the existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, thus leading to a large number of false positives. This fuzziness is attributed to the use of highly flexible and at the same time, highly ambiguous HTML language. We introduce a new perspective against phishing, that tries to systematically prove, whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, indicative of any form of malicious alteration. The system design represents an intelligent system, employing dynamic assessment which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base, in educating the internet users against phishing.

Improved Closed Set Text-Independent Speaker Identification by Combining MFCC with Evidence from Flipped Filter Banks

A state of the art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC) modeled on the human auditory system has been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, it captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker specific cues present in the higher frequency zone. Unlike high level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases namely YOHO (microphone speech) and POLYCOST (telephone speech) with Gaussian Mixture Models (GMM) as a Classifier for various model orders.

Corporate Credit Rating using Multiclass Classification Models with order Information

Corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has been one of the attractive research topics in the literature. In recent years, multiclass classification models such as artificial neural network (ANN) or multiclass support vector machine (MSVM) have become a very appealing machine learning approaches due to their good performance. However, most of them have only focused on classifying samples into nominal categories, thus the unique characteristic of the credit rating - ordinality - has been seldom considered in their approaches. This study proposes new types of ANN and MSVM classifiers, which are named OMANN and OMSVM respectively. OMANN and OMSVM are designed to extend binary ANN or SVM classifiers by applying ordinal pairwise partitioning (OPP) strategy. These models can handle ordinal multiple classes efficiently and effectively. To validate the usefulness of these two models, we applied them to the real-world bond rating case. We compared the results of our models to those of conventional approaches. The experimental results showed that our proposed models improve classification accuracy in comparison to typical multiclass classification techniques with the reduced computation resource.

Qualification and Provisioning of xDSL Broadband Lines using a GIS Approach

In this paper is presented a Geographic Information System (GIS) approach in order to qualify and monitor the broadband lines in efficient way. The methodology used for interpolation is the Delaunay Triangular Irregular Network (TIN). This method is applied for a case study in ISP Greece monitoring 120,000 broadband lines.

An Effective Framework for Chinese Syntactic Parsing

This paper presents an effective framework for Chinesesyntactic parsing, which includes two parts. The first one is a parsing framework, which is based on an improved bottom-up chart parsingalgorithm, and integrates the idea of the beam search strategy of N bestalgorithm and heuristic function of A* algorithm for pruning, then get multiple parsing trees. The second is a novel evaluation model, which integrates contextual and partial lexical information into traditional PCFG model and defines a new score function. Using this model, the tree with the highest score is found out as the best parsing tree. Finally,the contrasting experiment results are given. Keywords?syntactic parsing, PCFG, pruning, evaluation model.

Independent Component Analysis to Mass Spectra of Aluminium Sulphate

Independent component analysis (ICA) is a computational method for finding underlying signals or components from multivariate statistical data. The ICA method has been successfully applied in many fields, e.g. in vision research, brain imaging, geological signals and telecommunications. In this paper, we apply the ICA method to an analysis of mass spectra of oligomeric species emerged from aluminium sulphate. Mass spectra are typically complex, because they are linear combinations of spectra from different types of oligomeric species. The results show that ICA can decomposite the spectral components for useful information. This information is essential in developing coagulation phases of water treatment processes.

Fuzzy Adjacency Matrix in Graphs

In this paper a new definition of adjacency matrix in the simple graphs is presented that is called fuzzy adjacency matrix, so that elements of it are in the form of 0 and n N n 1 , ∈ that are in the interval [0, 1], and then some charactristics of this matrix are presented with the related examples . This form matrix has complete of information of a graph.

Ensuring Data Security and Consistency in FTIMA - A Fault Tolerant Infrastructure for Mobile Agents

Transaction management is one of the most crucial requirements for enterprise application development which often require concurrent access to distributed data shared amongst multiple application / nodes. Transactions guarantee the consistency of data records when multiple users or processes perform concurrent operations. Existing Fault Tolerance Infrastructure for Mobile Agents (FTIMA) provides a fault tolerant behavior in distributed transactions and uses multi-agent system for distributed transaction and processing. In the existing FTIMA architecture, data flows through the network and contains personal, private or confidential information. In banking transactions a minor change in the transaction can cause a great loss to the user. In this paper we have modified FTIMA architecture to ensure that the user request reaches the destination server securely and without any change. We have used triple DES for encryption/ decryption and MD5 algorithm for validity of message.

Evolutionary Distance in the Yeast Genome

Whole genome duplication (WGD) increased the number of yeast Saccharomyces cerevisiae chromosomes from 8 to 16. In spite of retention the number of chromosomes in the genome of this organism after WGD to date, chromosomal rearrangement events have caused an evolutionary distance between current genome and its ancestor. Studies under evolutionary-based approaches on eukaryotic genomes have shown that the rearrangement distance is an approximable problem. In the case of S. cerevisiae, we describe that rearrangement distance is accessible by using dedoubled adjacency graph drawn for 55 large paired chromosomal regions originated from WGD. Then, we provide a program extracted from a C program database to draw a dedoubled genome adjacency graph for S. cerevisiae. From a bioinformatical perspective, using the duplicated blocks of current genome in S. cerevisiae, we infer that genomic organization of eukaryotes has the potential to provide valuable detailed information about their ancestrygenome.

Online Partial Discharge Source Localization and Characterization Using Non-Conventional Method

Power cables are vulnerable to failure due to aging or defects that occur with the passage of time under continuous operation and loading stresses. PD detection and characterization provide information on the location, nature, form and extent of the degradation. As a result, PD monitoring has become an important part of condition based maintenance (CBM) program among power utilities. Online partial discharge (PD) localization of defect sources in power cable system is possible using the time of flight method. The information regarding the time difference between the main and reflected pulses and cable length can help in locating the partial discharge source along the cable length. However, if the length of the cable is not known and the defect source is located at the extreme ends of the cable or in the middle of the cable, then double ended measurement is required to indicate the location of PD source. Use of multiple sensors can also help in discriminating the cable PD or local/ external PD. This paper presents the experience and results from online partial discharge measurements conducted in the laboratory and the challenges in partial discharge source localization.

Low Cost Real-Time Communication Braille Hand-Glove for Visually Impaired Using Slot Sensors and Vibration Motors

Visually impaired people find it extremely difficult to acquire basic and vital information necessary for their living. Therefore, they are at a very high risk of being socially excluded as a result of poor access to information. In recent years, several attempts have been made in improving the communication methods for visually impaired people which involve tactile sensation such as finger Braille, manual alphabets and the print on palm method and several other electronic devices. But, there are some problems which arise in such methods such as lack of privacy and lack of compatibility to computer environment. This paper describes a low cost Braille hand glove for blind people using slot sensors and vibration motors with the help of which they can read and write emails, text messages and read e-books. This glove allows the person to type characters based on different Braille combination using six slot sensors. The vibration in six different positions of the glove which matches to the Braille code allows them to read characters.

Measuring of Urban Sustainability in Town Planners Practice

Physical urban form is recognized to be the media for human transactions. It directly influences the travel demand of people in a specific urban area and the amount of energy used for transportation. Distorted, sprawling form often creates sustainability problems in urban areas. It is declared in EU strategic planning documents that compact urban form and mixed land use pattern must be given the main focus to achieve better sustainability in urban areas, but the methods to measure and compare these characteristics are still not clear. This paper presents the simple methods to measure the spatial characteristics of urban form by analyzing the location and distribution of objects in an urban environment. The extended CA (cellular automata) model is used to simulate urban development scenarios.

Detecting and Locating Wormhole Attacks in Wireless Sensor Networks Using Beacon Nodes

This paper focuses on wormhole attacks detection in wireless sensor networks. The wormhole attack is particularly challenging to deal with since the adversary does not need to compromise any nodes and can use laptops or other wireless devices to send the packets on a low latency channel. This paper introduces an easy and effective method to detect and locate the wormholes: Since beacon nodes are assumed to know their coordinates, the straight line distance between each pair of them can be calculated and then compared with the corresponding hop distance, which in this paper equals hop counts × node-s transmission range R. Dramatic difference may emerge because of an existing wormhole. Our detection mechanism is based on this. The approximate location of the wormhole can also be derived in further steps based on this information. To the best of our knowledge, our method is much easier than other wormhole detecting schemes which also use beacon nodes, and to those have special requirements on each nodes (e.g., GPS receivers or tightly synchronized clocks or directional antennas), ours is more economical. Simulation results show that the algorithm is successful in detecting and locating wormholes when the density of beacon nodes reaches 0.008 per m2.

Mining Association Rules from Unstructured Documents

This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats texts only not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with Information Retrieval scheme (TF-IDF) and Data Mining technique for association rules extraction. EART depends on word feature to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existing of the keywords in one sentence without co-occurrence. Experiments applied on a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.

Classifying Bio-Chip Data using an Ant Colony System Algorithm

Bio-chips are used for experiments on genes and contain various information such as genes, samples and so on. The two-dimensional bio-chips, in which one axis represent genes and the other represent samples, are widely being used these days. Instead of experimenting with real genes which cost lots of money and much time to get the results, bio-chips are being used for biological experiments. And extracting data from the bio-chips with high accuracy and finding out the patterns or useful information from such data is very important. Bio-chip analysis systems extract data from various kinds of bio-chips and mine the data in order to get useful information. One of the commonly used methods to mine the data is classification. The algorithm that is used to classify the data can be various depending on the data types or number characteristics and so on. Considering that bio-chip data is extremely large, an algorithm that imitates the ecosystem such as the ant algorithm is suitable to use as an algorithm for classification. This paper focuses on finding the classification rules from the bio-chip data using the Ant Colony algorithm which imitates the ecosystem. The developed system takes in consideration the accuracy of the discovered rules when it applies it to the bio-chip data in order to predict the classes.

Development of Subjective Measures of Interestingness: From Unexpectedness to Shocking

Knowledge Discovery of Databases (KDD) is the process of extracting previously unknown but useful and significant information from large massive volume of databases. Data Mining is a stage in the entire process of KDD which applies an algorithm to extract interesting patterns. Usually, such algorithms generate huge volume of patterns. These patterns have to be evaluated by using interestingness measures to reflect the user requirements. Interestingness is defined in different ways, (i) Objective measures (ii) Subjective measures. Objective measures such as support and confidence extract meaningful patterns based on the structure of the patterns, while subjective measures such as unexpectedness and novelty reflect the user perspective. In this report, we try to brief the more widely spread and successful subjective measures and propose a new subjective measure of interestingness, i.e. shocking.

Cascaded ANN for Evaluation of Frequency and Air-gap Voltage of Self-Excited Induction Generator

Self-Excited Induction Generator (SEIG) builds up voltage while it enters in its magnetic saturation region. Due to non-linear magnetic characteristics, the performance analysis of SEIG involves cumbersome mathematical computations. The dependence of air-gap voltage on saturated magnetizing reactance can only be established at rated frequency by conducting a laboratory test commonly known as synchronous run test. But, there is no laboratory method to determine saturated magnetizing reactance and air-gap voltage of SEIG at varying speed, terminal capacitance and other loading conditions. For overall analysis of SEIG, prior information of magnetizing reactance, generated frequency and air-gap voltage is essentially required. Thus, analytical methods are the only alternative to determine these variables. Non-existence of direct mathematical relationship of these variables for different terminal conditions has forced the researchers to evolve new computational techniques. Artificial Neural Networks (ANNs) are very useful for solution of such complex problems, as they do not require any a priori information about the system. In this paper, an attempt is made to use cascaded neural networks to first determine the generated frequency and magnetizing reactance with varying terminal conditions and then air-gap voltage of SEIG. The results obtained from the ANN model are used to evaluate the overall performance of SEIG and are found to be in good agreement with experimental results. Hence, it is concluded that analysis of SEIG can be carried out effectively using ANNs.

New Enhanced Hexagon-Based Search Using Point-Oriented Inner Search for Fast Block Motion Estimation

Recently, an enhanced hexagon-based search (EHS) algorithm was proposed to speedup the original hexagon-based search (HS) by exploiting the group-distortion information of some evaluated points. In this paper, a second version of the EHS is proposed with a new point-oriented inner search technique which can further speedup the HS in both large and small motion environments. Experimental results show that the enhanced hexagon-based search version-2 (EHS2) is faster than the HS up to 34% with negligible PSNR degradation.

The Path to Web Intelligence Maturity

Web intelligence, if made personal, can fuel the process of building communications around the interests and preferences of each individual customer or prospect, by providing specific behavioral insights about each individual. To become fully efficient, Web intelligence must reach a stage of a high-level maturity, passing throughout a process that involves five steps: (1) Web site analysis; (2) Web site and advertising optimization; (3) Segment targeting; (4) Interactive marketing (online only); and (5) Interactive marketing (online and offline). Discussing these steps in detail, the paper uncovers the real gold mine that is personal-level Web intelligence.

Inconsistency Discovery in Multiple State Diagrams

In this article, we introduce a new approach for analyzing UML designs to detect the inconsistencies between multiple state diagrams and sequence diagrams. The Super State Analysis (SSA) identifies the inconsistencies in super states, single step transitions, and sequences. Because SSA considers multiple UML state diagrams, it discovers inconsistencies that cannot be discovered when considering only a single UML state diagram. We have introduced a transition set that captures relationship information that is not specifiable in UML diagrams. The SSA model uses the transition set to link transitions of multiple state diagrams together. The analysis generates three different sets automatically. These sets are compared to the provided sets to detect the inconsistencies. SSA identifies five types of inconsistencies: impossible super states, unreachable super states, illegal transitions, missing transitions, and illegal sequences.