Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Most known methods for measuring the structural similarity of documents are based on, e.g., tag measures, path metrics and tree measures applied to their DOM trees. Other methods measure similarity within the framework of the well-known vector space model. In contrast, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM trees (DOM trees represent only directed rooted trees). We design a new similarity measure for graphs representing web-based hypertext structures. The measure is based mainly on a novel representation of a graph as linear strings of integers whose components represent structural properties of the graph. The similarity of two graphs is then defined by the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. We then derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
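The graph-to-string idea can be illustrated with a minimal sketch. It assumes a hypothetical property string (the number of vertices at each breadth-first depth from the root) and a Needleman-Wunsch style global alignment whose substitution cost is |a - b| and whose gap cost is the unmatched integer itself; the paper's actual property strings and scoring scheme may differ. With these costs the worst alignment costs sum(s) + sum(t), so the normalized similarity falls in [0, 1].

```python
from collections import deque

def level_profile(adj, root):
    """Hypothetical property string: number of vertices at each BFS depth from root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in depth:  # cross edges of a generalized tree are ignored here
                depth[v] = depth[u] + 1
                queue.append(v)
    counts = [0] * (max(depth.values()) + 1)
    for d in depth.values():
        counts[d] += 1
    return counts

def alignment_cost(s, t):
    """Global alignment cost: substitution |a - b|, gap cost = the unmatched integer."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + s[i - 1]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + t[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + abs(s[i - 1] - t[j - 1]),  # align
                          D[i - 1][j] + s[i - 1],                       # gap in t
                          D[i][j - 1] + t[j - 1])                       # gap in s
    return D[n][m]

def similarity(s, t):
    """Normalize the optimal alignment cost into a similarity value in [0, 1]."""
    total = sum(s) + sum(t)
    return 1.0 if total == 0 else 1.0 - alignment_cost(s, t) / total

g1 = {"r": ["a", "b"], "a": ["c"], "b": ["c"]}   # generalized tree: c has two parents
g2 = {"r": ["a"], "a": ["b", "c"]}
print(similarity(level_profile(g1, "r"), level_profile(g2, "r")))
```

Here the profiles are [1, 2, 1] and [1, 1, 2], the optimal alignment costs 2, and the similarity is 1 - 2/8 = 0.75.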

Evaluation of Graph-based Analysis for Forest Fire Detections

Spatial outliers in remotely sensed imagery are observed quantities whose values are unusual compared with those of their neighboring pixels. Various methods for detecting spatial outliers based on spatial autocorrelation have been proposed in statistics and data mining. These methods can be applied to detecting forest fire pixels in MODIS imagery from NASA's AQUA satellite, because forest fire detection can be treated as finding spatial outliers in the spatial variation of brightness temperature. This is what distinguishes our approach from traditional fire detection methods. In this paper, we propose a graph-based forest fire detection algorithm built on spatial outlier detection methods, and we test it to evaluate its applicability. For this purpose, the ordinary scatter plot and Moran's scatter plot were used. To evaluate the proposed algorithm, its results were compared with the MODIS fire product provided by the NASA MODIS Science Team; the comparison indicated that the proposed algorithm has potential for detecting fire pixels.
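The Moran scatter plot places each pixel at (z, lag), where z is its standardized value and lag is the average z of its neighbors. A hot pixel surrounded by cooler ones lands in the "high-low" quadrant. The sketch below is an illustration under assumptions, not the paper's algorithm: it uses a row-standardized 8-neighbor (queen) weight graph, and the threshold z_min = 2.0 is a hypothetical parameter. It raises ZeroDivisionError on a perfectly uniform grid (zero standard deviation).

```python
import statistics

def moran_scatter(grid):
    """Return (z, lag): z-scores and the row-standardized queen-neighbor lag of z."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [[(v - mean) / sd for v in row] for row in grid]
    lag = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            nbrs = [z[a][b]
                    for a in range(max(0, i - 1), min(rows, i + 2))
                    for b in range(max(0, j - 1), min(cols, j + 2))
                    if (a, b) != (i, j)]
            lag[i][j] = sum(nbrs) / len(nbrs)  # equal weights that sum to 1
    return z, lag

def high_low_outliers(grid, z_min=2.0):
    """Candidate fire pixels: high-low quadrant (z > z_min and neighbor lag < 0)."""
    z, lag = moran_scatter(grid)
    return [(i, j) for i in range(len(grid)) for j in range(len(grid[0]))
            if z[i][j] > z_min and lag[i][j] < 0]

# Hypothetical brightness temperatures (K): one hot pixel in a uniform background.
grid = [[300.0] * 5 for _ in range(5)]
grid[2][2] = 400.0
print(high_low_outliers(grid))
```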

String Searching in Dispersed Files using MDS Convolutional Codes

In this paper, we propose the use of convolutional codes for file dispersal. The proposed method is comparable in complexity to the Information Dispersal Algorithm proposed by M. Rabin and, for particular choices of (non-binary) convolutional codes, is almost as efficient as that algorithm in terms of controlling the expansion of total storage. Further, the proposed dispersal method allows string search.
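To make "convolutional codes for file dispersal" concrete, here is a minimal sketch of a rate-1/n feedforward convolutional encoder over the prime field GF(p). Each output stream could be stored at a different site. The modulus p = 257 and the generator taps are arbitrary illustrative choices, not the MDS codes of the paper, and decoding and string search are not shown. Note also that a rate-1/n code expands storage n-fold; the abstract's efficiency claim concerns particular non-binary codes that this sketch does not reproduce.

```python
def conv_encode(data, generators, p=257):
    """Rate-1/n feedforward convolutional encoder over GF(p).

    data: list of symbols in range(p) (e.g. file bytes).
    generators: n tap lists g_k, so stream k is v_k[t] = sum_i g_k[i] * u[t - i] mod p.
    Returns n streams, each of length len(data) + memory (register flushed with zeros).
    """
    memory = max(len(g) for g in generators) - 1
    padded = list(data) + [0] * memory  # zero tail flushes the shift register
    streams = []
    for g in generators:
        out = []
        for t in range(len(padded)):
            acc = 0
            for i, coeff in enumerate(g):
                if t - i >= 0:
                    acc += coeff * padded[t - i]
            out.append(acc % p)
        streams.append(out)
    return streams

# Two hypothetical generators, g1(D) = 1 + D and g2(D) = 1 + 2D, giving two shares.
print(conv_encode([1, 2, 3], [[1, 1], [1, 2]]))
```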

Combinatorial Optimisation of Worm Propagation on an Unknown Network

Worm propagation profiles have changed significantly since 2003-2004: sudden worldwide outbreaks like Blaster or Slammer have progressively disappeared, and slower but stealthier worms have appeared since, most of them for botnet dissemination. Decreased worm virulence makes detection more difficult. In this paper, we describe a stealth worm propagation model which has been extensively simulated and analysed on a huge virtual network. The main feature of this model is its ability to infect any Internet-like network in a few seconds, whatever its size, while greatly limiting the overhead of reinfection attempts on already infected hosts. The main simulation results show that the combinatorial topology of routing may have a huge impact on worm propagation, and thus that some servers play a more essential role than others. The capability to identify these servers in real time may be essential to greatly hinder worm propagation.
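One plausible way to flag the servers that "play a more essential role" is shortest-path betweenness centrality, sketched below with Brandes' algorithm for unweighted, undirected graphs. This is an assumption for illustration: the abstract does not say which criterion the authors use, and the small topologies in the usage lines are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj: dict mapping each node to an iterable of neighbors (must be symmetric).
    Returns a dict of scores where each unordered source/target pair counts once.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}  # undirected: each pair seen twice

star = {"hub": ["a", "b", "d"], "a": ["hub"], "b": ["hub"], "d": ["hub"]}
print(betweenness(star))  # the hub lies on every leaf-to-leaf shortest path
```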

Optimal All-to-All Personalized Communication in All-Port Tori

All-to-all personalized communication, also known as complete exchange, is one of the densest communication patterns in parallel computing. In this paper, we propose new indirect algorithms for complete exchange on all-port rings and tori. The new algorithms fully utilize all communication links and transmit messages along shortest paths, thereby achieving the theoretical lower bounds on message transmission, which have not been achieved by other existing indirect algorithms. For a 2D r × c (r ≥ c) all-port torus, the algorithm has optimal transmission cost and O(c) message startup cost. In addition, the proposed algorithms accommodate non-power-of-two tori, where the number of nodes in each dimension need not be a power of two and the torus need not be square. Finally, the algorithms are conceptually simple and symmetric for every message and every node, so they can be easily implemented and achieve the optimum in practice.
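A common way to derive a lower bound on transmission cost is a link-load counting argument: if every node sends one unit-size message to every other node along a shortest path, the total number of hops must be spread over at most 4rc directed links of an all-port r × c torus with bidirectional links. The sketch computes that bound (assuming r, c ≥ 3 so that wraparound links are distinct). It is offered as a standard argument; we have not verified that it is exactly the bound the paper refers to.

```python
def ring_distance_sum(k):
    """Sum of shortest-path hop counts from one node to all others on a k-node ring."""
    return sum(min(d, k - d) for d in range(k))

def torus_transmission_lower_bound(r, c):
    """Link-load lower bound (in unit-message transmission steps) for complete
    exchange on an all-port r x c torus with bidirectional links (r, c >= 3).

    Hop distance on a torus is the sum of the ring distances in each dimension, so
    one node's total over all destinations is c * S_r + r * S_c; the r * c sources'
    combined hops are shared by at most 4 * r * c directed links per step.
    """
    total_hops = r * c * (c * ring_distance_sum(r) + r * ring_distance_sum(c))
    return total_hops / (4 * r * c)

print(torus_transmission_lower_bound(4, 4))
```

For a 4 × 4 torus each ring distance sum is 4, giving (4·4 + 4·4)/4 = 8 transmission steps.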

Dynamic Authenticated Secure Group Communication

Providing authentication, in addition to confidentiality, for the messages exchanged between group members is an important issue in secure group communication. We develop a protocol for Secure Authentic Communication in which we add authentication to the group communication scheme proposed by Blundo et al., which provides only confidentiality. The authentication scheme used is a multiparty authentication scheme that allows all users in the system to send and receive messages simultaneously. Our scheme is secure against coalitions of fewer than k colluding malicious parties.
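For readers unfamiliar with the confidentiality layer, the sketch below shows polynomial key predistribution in the style of Blundo et al., assuming that is the scheme the abstract refers to. A trusted server picks a random symmetric bivariate polynomial f(x, y) of degree t over GF(P) and gives user u the univariate share g_u(y) = f(u, y). Users u and v then both compute f(u, v) = f(v, u). Choosing t = k - 1 tolerates coalitions of fewer than k users. The modulus and helper names are illustrative assumptions, and the authentication layer of the paper is not shown.

```python
import random

P = 2**31 - 1  # prime modulus (illustrative choice)

def make_symmetric_poly(t, rng):
    """Random symmetric f(x, y) = sum a[i][j] x^i y^j with a[i][j] == a[j][i]."""
    a = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            a[i][j] = a[j][i] = rng.randrange(P)
    return a

def share_for(a, user_id):
    """Coefficients of g_u(y) = f(user_id, y); user IDs must be distinct and nonzero."""
    t = len(a) - 1
    return [sum(a[i][j] * pow(user_id, i, P) for i in range(t + 1)) % P
            for j in range(t + 1)]

def pairwise_key(share, other_id):
    """Evaluate g_u(other_id) = f(u, other_id), which equals f(other_id, u)."""
    return sum(c * pow(other_id, j, P) for j, c in enumerate(share)) % P

# Demo with a seeded PRNG for reproducibility; real deployments should pass
# secrets.SystemRandom() instead, since random.Random is not cryptographically secure.
k = 3
a = make_symmetric_poly(k - 1, random.Random(0))
alice, bob = share_for(a, 1), share_for(a, 2)
print(pairwise_key(alice, 2) == pairwise_key(bob, 1))
```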

Optical Road Monitoring of the Future Smart Roads – Preliminary Results

It has been shown that in most accidents the driver is at fault, having been distracted or having misjudged the situation. To address such problems, research has been dedicated to developing driver assistance systems that can monitor the traffic situation around the vehicle. This paper presents methods for recognizing several road conditions. The methods use both in-vehicle warning systems and roadside infrastructure. Preliminary evaluation results for fog and ice-on-road detection are presented. The ice detection results are based on data recorded on a test track dedicated to tyre friction testing. The results suggest that, with the right setup, ice detection could reach a detection rate of 70%, which is a good foundation for implementation. However, the full benefit of the presented cooperative system is achieved by fusing the outputs of multiple data sources, which is the key point of discussion in this publication.
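The abstract does not specify how the outputs of multiple data sources are fused. One simple, commonly used option is naive-Bayes fusion in log-odds form, sketched below under two stated assumptions: the sources (for example an in-vehicle camera and a roadside sensor, both hypothetical here) are conditionally independent given the true road state, and each reports a posterior computed with the same prior.

```python
import math

def fuse_probabilities(probs, prior=0.5):
    """Naive-Bayes fusion of per-source posteriors P(ice | source_i).

    Each source's evidence is its log-odds minus the shared prior log-odds;
    the evidences add under the conditional-independence assumption.
    """
    prior_logit = math.log(prior / (1 - prior))
    logit = prior_logit + sum(math.log(p / (1 - p)) - prior_logit for p in probs)
    return 1 / (1 + math.exp(-logit))

# Two hypothetical sources each report P(ice) = 0.7.
print(fuse_probabilities([0.7, 0.7]))
```

Two agreeing sources at 0.7 combine to odds of (7/3)^2 = 49/9, i.e. a fused probability of 49/58 ≈ 0.845, which illustrates why fusion can outperform any single detector when the independence assumption roughly holds.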

Evolutionary Approach for Automated Discovery of Censored Production Rules

In the recent past, there has been increasing interest in applying evolutionary methods to Knowledge Discovery in Databases (KDD), and a number of successful applications of Genetic Algorithms (GA) and Genetic Programming (GP) to KDD have been demonstrated. The predominant representation of the discovered knowledge is the standard Production Rule (PR) of the form If P Then D. PRs, however, are unable to handle exceptions and do not exhibit variable precision. Censored Production Rules (CPRs), an extension of PRs proposed by Michalski and Winston, exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form If P Then D Unless C, where C (the censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type, we are free to ignore the exception conditions when the resources needed to establish their presence are tight or when there is simply no information available as to whether they hold. Thus, the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. This paper presents a classification algorithm based on an evolutionary approach that discovers comprehensible rules with exceptions in the form of CPRs. The proposed approach uses a flexible chromosome encoding in which each chromosome corresponds to a CPR. Appropriate genetic operators are suggested, and a fitness function is proposed that incorporates the basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed algorithm.
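The semantics of If P Then D Unless C can be captured in a few lines. The sketch below represents a CPR with predicate functions and lets the caller skip the censor check when resources are tight, as the abstract describes. The accuracy-on-covered-records fitness is a hypothetical placeholder, not the fitness function proposed in the paper, and the bird/penguin data are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

Record = Mapping[str, object]

@dataclass
class CensoredRule:
    """If P Then D Unless C."""
    premise: Callable[[Record], bool]  # P
    decision: str                      # D
    censor: Callable[[Record], bool]   # C

    def apply(self, record: Record, check_censor: bool = True) -> Optional[bool]:
        """Return True for D, False for ~D, or None if P does not hold."""
        if not self.premise(record):
            return None
        if check_censor and self.censor(record):
            return False  # the censor acts as a switch that flips D to ~D
        return True       # censor absent or deliberately not evaluated

def rule_accuracy(rule: CensoredRule, records, labels) -> float:
    """Hypothetical fitness: accuracy of the rule on the records it covers."""
    hits = covered = 0
    for r, y in zip(records, labels):
        out = rule.apply(r)
        if out is None:
            continue
        covered += 1
        hits += out == (y == rule.decision)
    return hits / covered if covered else 0.0

flies = CensoredRule(lambda r: r["bird"], "flies", lambda r: r["penguin"])
records = [{"bird": True, "penguin": False},
           {"bird": True, "penguin": True},
           {"bird": False, "penguin": False}]
labels = ["flies", "walks", "walks"]
print(rule_accuracy(flies, records, labels))
```

Skipping the censor (`check_censor=False`) trades precision for speed: the penguin would then be predicted to fly.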

WDM-Based Storage Area Network (SAN) for Disaster Recovery Operations

This paper proposes a Storage Area Network (SAN) based on Wavelength Division Multiplexing (WDM) technology for all types of disaster recovery operations. It considers recovery from the failure of all paths in the network, from the failure of the main SAN site, and from the failure of all backup sites caused by natural disasters such as earthquakes, fires and floods, as well as by power outages and terrorist attacks. This matters because SANs were initially designed to work only within distance-limited environments [2]. The paper also presents a NEW PATH algorithm for use when a path failure occurs. Simulation results and an analysis of the proposed architecture's performance are presented.
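The abstract does not describe how the NEW PATH algorithm works. As a generic illustration of path restoration only, the sketch below recomputes a fewest-hop route between SAN sites while excluding failed fiber links, and returns None when no surviving path exists (the point at which failover to a backup site would be needed). The function name and the site topology are hypothetical.

```python
from collections import deque

def new_path(links, src, dst, failed):
    """Fewest-hop path from src to dst that avoids failed links (BFS).

    links: iterable of (u, v) bidirectional fiber links.
    failed: set of frozenset({u, v}) identifying links that are down.
    """
    adj = {}
    for u, v in links:
        if frozenset((u, v)) in failed:
            continue  # skip links taken out by the disaster
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:  # walk predecessors back to src
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None  # no surviving path: fail over to another backup site

links = [("main", "A"), ("A", "backup"), ("main", "B"), ("B", "C"), ("C", "backup")]
print(new_path(links, "main", "backup", failed={frozenset(("A", "backup"))}))
```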

The SAFRS System: A Case-Based Reasoning Training Tool for Capturing and Re-Using Knowledge

This paper aims to specify and build a learning support system for radiology-senology (breast radiology), intended to help junior radiologists-senologists in their radiology-senology-related activity on the basis of the experience of expert radiologists-senologists. The system is named SAFRS (i.e. a system supporting the training of radiologists-senologists). It is based on the exploitation of radiologic-senologic images (primarily mammograms, but also echographic and MRI images) and their related clinical files. The aim of such a system is to support education in breast cancer screening. To acquire the knowledge of expert radiologists-senologists, we have used the CBR (case-based reasoning) approach. The SAFRS system will promote the evolution of teaching in radiology-senology by offering "junior radiologist" trainees an advanced pedagogical product. It will strengthen their knowledge and provide a very elaborate presentation of results. Finally, know-how will derive from all these factors.
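The retrieve step of the CBR cycle can be sketched as weighted nearest-neighbor search over stored cases. The feature names, weights, value ranges and labels below are hypothetical stand-ins for descriptors that might be extracted from a mammogram and its clinical file; they are purely illustrative, not clinical guidance and not the SAFRS case model.

```python
from dataclasses import dataclass

@dataclass
class Case:
    features: dict  # hypothetical descriptors of an image and its clinical file
    diagnosis: str  # the expert's conclusion stored with the case

def similarity(query, case_features, weights, ranges):
    """Weighted global similarity: numeric features use range-normalized distance,
    symbolic features use exact match."""
    total = 0.0
    for name, w in weights.items():
        q, c = query[name], case_features[name]
        if isinstance(q, (int, float)):
            local = 1 - abs(q - c) / ranges[name]
        else:
            local = 1.0 if q == c else 0.0
        total += w * local
    return total / sum(weights.values())

def retrieve(query, case_base, weights, ranges, k=1):
    """Retrieve step of the CBR cycle: the k most similar expert cases."""
    return sorted(case_base,
                  key=lambda c: similarity(query, c.features, weights, ranges),
                  reverse=True)[:k]

case_base = [
    Case({"size_mm": 8, "margin": "circumscribed", "density": 2}, "benign"),
    Case({"size_mm": 22, "margin": "spiculated", "density": 3}, "suspicious"),
]
weights = {"size_mm": 1.0, "margin": 2.0, "density": 1.0}
ranges = {"size_mm": 50.0, "density": 3.0}
query = {"size_mm": 20, "margin": "spiculated", "density": 3}
print(retrieve(query, case_base, weights, ranges)[0].diagnosis)
```

In a training tool, the retrieved expert case would be shown to the trainee alongside the new image, which is how past expert knowledge gets re-used.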

Obfuscation Studio Executive

A new software protection product called "Obfuscation Studio" is presented in this paper. Several obfuscating modules that are already implemented are described. Some theoretical data are presented that show the potency and effectiveness of the described obfuscation methods. "Obfuscation Studio" is being implemented to protect programs written for the .NET platform, but the described methods may also be of interest for other applications.

Advanced Polymorphic Techniques

Nowadays, viruses use polymorphic techniques to mutate their code on each replication, thus evading detection by antivirus software. However, detection by emulation can defeat simple polymorphism, so metamorphic techniques are used, which thoroughly change the viral code, even after decryption. We briefly detail this evolution of the techniques viruses use to protect themselves against detection and then study the METAPHOR virus, today's most advanced metamorphic virus.

Data Transformation Services (DTS): Creating Data Mart by Consolidating Multi-Source Enterprise Operational Data

Trends in business intelligence, e-commerce and remote access make it necessary and practical to store data in different ways on multiple systems with different operating systems. As businesses evolve and grow, they require efficient computerized solutions to update data and to access data from diverse enterprise business applications. The objective of this paper is to demonstrate the capability of DTS [1] as a database solution for automatic data transfer and update in solving a business problem. The DTS package was developed for a business selling a variety of plants that eventually expanded into commercial supply and landscaping. Dimensional data modeling is used in the DTS package to extract, transform and load data from heterogeneous database systems such as MySQL, Microsoft Access and Oracle, consolidating it into a data mart residing in SQL Server. The data transfer from the various databases is scheduled to run automatically every quarter to support efficient sales analysis. DTS is therefore an attractive solution for automatic data transfer and update that meets today's business needs.
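The extract-transform-load flow into a dimensional data mart can be illustrated without DTS itself. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, and in-memory row lists stand in for the MySQL and Access sources. The table names, columns and the quarter derivation are hypothetical. It loads a tiny star schema: one product dimension plus a sales fact table keyed by a surrogate product key.

```python
import sqlite3

def load_data_mart(sources, conn):
    """Extract rows from several sources, transform them, and load a star schema.

    sources: mapping of source name -> list of (sku, product_name, 'YYYY-MM-DD', amount).
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS dim_product "
                "(product_key INTEGER PRIMARY KEY, sku TEXT UNIQUE, name TEXT)")
    cur.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                "(product_key INTEGER, source TEXT, quarter TEXT, amount REAL)")
    for source, rows in sources.items():
        for sku, name, sale_date, amount in rows:
            sku = sku.strip().upper()  # transform: unify business keys across systems
            year, month = int(sale_date[:4]), int(sale_date[5:7])
            quarter = f"{year}-Q{(month - 1) // 3 + 1}"
            # Load the dimension row once, then look up its surrogate key.
            cur.execute("INSERT OR IGNORE INTO dim_product (sku, name) VALUES (?, ?)",
                        (sku, name.strip()))
            key = cur.execute("SELECT product_key FROM dim_product WHERE sku = ?",
                              (sku,)).fetchone()[0]
            cur.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                        (key, source, quarter, amount))
    conn.commit()

sources = {
    "mysql": [("fern-01", "Boston fern", "2008-02-14", 12.5)],
    "access": [("FERN-01 ", "Boston fern", "2008-03-01", 7.5),
               ("pal-02", "Areca palm", "2008-05-20", 30.0)],
}
conn = sqlite3.connect(":memory:")
load_data_mart(sources, conn)
print(conn.execute("SELECT quarter, SUM(amount) FROM fact_sales "
                   "GROUP BY quarter ORDER BY quarter").fetchall())
```

Aggregating the fact table by quarter gives the kind of periodic sales summary that a quarterly scheduled transfer is meant to refresh.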

Object-Oriented Programming Strategies in C# for Power Conscious System

Low power consumption is a major constraint for battery-powered systems such as notebook computers or PDAs. In the past, specialists usually designed both specifically optimized hardware and code to address this concern. This approach worked for quite a long time, but today there is another significant constraint: time to market. To meet the power constraint while launching products within shorter production periods, developers have turned to object-oriented programming (OOP). Although OOP is widely known to carry considerably more overhead than assembly and procedural languages, the development trend still heads toward it, which conflicts with the goal of low power consumption. Most prior power-related software studies reported that OOP consumes considerable resources; yet although industry has had to accept OOP for business reasons, no paper has so far addressed how to choose the best OOP practices within this power-limited boundary. This article is a first attempt to specify and propose optimized strategies for writing OOP software in an energy-constrained environment, based on quantitative measured results. The language chosen for the study is C# on .NET Framework 2.0, one of the popular OOP development environments. The recommendations derived from this research should provide a good roadmap to help developers write code that balances time to market against battery life.

Computing the Loop Bound in Iterative Data Flow Graphs Using Natural Token Flow

Signal processing applications that are iterative in nature are best represented by data flow graphs (DFGs). In these applications, the maximum sampling frequency depends on the topology of the DFG, in particular on its cyclic dependencies. Determining the iteration bound, which is the reciprocal of the maximum sampling frequency, is critical in the hardware implementation of signal processing applications. In this paper, a novel technique to compute the iteration bound is proposed. It differs from all previously proposed techniques in that it is based on the natural flow of tokens through the DFG rather than on the topology of the graph. The proposed algorithm has lower run-time complexity than all known algorithms. Its performance is illustrated through an analysis of its time complexity and through simulation of several benchmark problems.
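For reference, the quantity being computed is the standard iteration bound T∞ = max over cycles l of (sum of node computation times in l) / (number of delays on l). The sketch below evaluates this definition directly by brute-force enumeration of simple cycles. That is the topology-based view the paper contrasts itself with, not the proposed token-flow algorithm, and it is exponential, so it suits only small DFGs. Computation times are assumed to be integers so that the result is an exact Fraction.

```python
from fractions import Fraction

def iteration_bound(times, edges):
    """T_inf = max over simple cycles of (total node time) / (total delays).

    times: dict node -> integer computation time.
    edges: list of (u, v, delays) directed DFG edges.
    """
    out = {}
    for u, v, d in edges:
        out.setdefault(u, []).append((v, d))
    order = {n: i for i, n in enumerate(times)}
    best = Fraction(0)

    def dfs(start, node, t_sum, d_sum, visited):
        nonlocal best
        for nxt, d in out.get(node, ()):
            if nxt == start:  # closed a cycle back to its lowest-ordered node
                if d_sum + d == 0:
                    raise ValueError("delay-free cycle: the DFG is not computable")
                best = max(best, Fraction(t_sum, d_sum + d))
            elif order[nxt] > order[start] and nxt not in visited:
                visited.add(nxt)  # each cycle is enumerated once, from its first node
                dfs(start, nxt, t_sum + times[nxt], d_sum + d, visited)
                visited.remove(nxt)

    for s in times:
        dfs(s, s, times[s], 0, {s})
    return best

# Hypothetical DFG: loop A->B->A (time 3, 1 delay) and loop A->B->C->A (time 7, 2 delays).
times = {"A": 1, "B": 2, "C": 4}
edges = [("A", "B", 0), ("B", "A", 1), ("B", "C", 0), ("C", "A", 2)]
print(iteration_bound(times, edges))
```

The critical loop here is A→B→C→A with T∞ = 7/2, so the maximum sampling frequency is 2/7 samples per time unit.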

A Two-Step Approach for Tree-structured XPath Query Reduction

XML data have a very flexible tree structure, which makes storing and retrieving XML data difficult. The node numbering scheme is one of the most popular approaches to storing XML in relational databases. Together with the node numbering storage scheme, structural joins can be used to process the hierarchical relationships in XML efficiently. However, to process a tree-structured XPath query containing several hierarchical relationships and conditional predicates on XML data, many structural joins must be carried out, which results in a high query execution cost. This paper introduces mechanisms to reduce XPath queries that include branch nodes to a much more efficient form with fewer structural joins. A two-step approach is proposed. The first step merges duplicate nodes in the tree-structured query, and the second step divides the query into sub-queries, shortens the paths and then merges the sub-queries back together. The proposed approach can contribute significantly to the efficient execution of XML queries. Experimental results show that the proposed scheme can reduce the query execution cost by up to an order of magnitude.
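To show why each hierarchical step costs a structural join, the sketch below assigns a common region encoding (start, end, level) to every element in a depth-first traversal. It then answers ancestor-descendant (//) and parent-child (/) steps by containment tests. The nested-loop join is for clarity only; production systems typically use merge- or stack-based join algorithms over sorted node lists. The exact numbering scheme used in the paper may differ.

```python
import xml.etree.ElementTree as ET

def number_nodes(root):
    """Region encoding: (tag, start, end, level) for each element, depth-first."""
    rows, counter = [], 0

    def visit(elem, level):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        end = counter
        counter += 1
        rows.append((elem.tag, start, end, level))

    visit(root, 0)
    return rows

def structural_join(ancestors, descendants, parent_child=False):
    """Pairs (a, d) with a.start < d.start and d.end < a.end (and level + 1 for '/')."""
    return [(a, d) for a in ancestors for d in descendants
            if a[1] < d[1] and d[2] < a[2]
            and (not parent_child or d[3] == a[3] + 1)]

doc = "<book><chapter><title/><section><title/></section></chapter></book>"
rows = number_nodes(ET.fromstring(doc))
chapters = [r for r in rows if r[0] == "chapter"]
titles = [r for r in rows if r[0] == "title"]
print(len(structural_join(chapters, titles)))                     # //chapter//title
print(len(structural_join(chapters, titles, parent_child=True)))  # //chapter/title
```

A query with several branches needs one such join per edge of its query tree, which is why merging duplicate nodes and shortening paths reduces execution cost.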

Fusion Filters Weighted by Scalars and Matrices for Linear Systems

Optimal mean-square fusion formulas with scalar and matrix weights are presented, and the relationship between them is established. The fusion formulas are compared on a continuous-time filtering problem. The basic differential equation for the cross-covariance of the local errors, which is the key quantity for distributed fusion, is derived. It is shown that the fusion filters are effective for multi-sensor systems containing different types of sensors. An example demonstrating the reasonably good accuracy of the proposed filters is given.
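The difference between scalar and matrix weights is easiest to see in the static two-estimate case. The sketch assumes NumPy and two unbiased estimates x1, x2 with error covariances P11, P22 and cross-covariance P12. Matrix weights follow the well-known Bar-Shalom–Campo form, and the scalar weight c1 minimizes the trace of the fused covariance subject to c1 + c2 = 1. This is a snapshot illustration, not the continuous-time filter equations derived in the paper. The test matrices are hypothetical.

```python
import numpy as np

def fuse_matrix(x1, x2, P11, P22, P12):
    """Matrix-weighted fusion W1 x1 + W2 x2 with W1 + W2 = I (Bar-Shalom-Campo form)."""
    P21 = P12.T
    S = P11 + P22 - P12 - P21
    W2 = (P11 - P12) @ np.linalg.inv(S)
    W1 = np.eye(len(x1)) - W2
    x = W1 @ x1 + W2 @ x2
    P = W1 @ P11 @ W1.T + W1 @ P12 @ W2.T + W2 @ P21 @ W1.T + W2 @ P22 @ W2.T
    return x, P

def fuse_scalar(x1, x2, P11, P22, P12):
    """Scalar-weighted fusion c1 x1 + c2 x2 (c1 + c2 = 1) minimizing trace of P."""
    t11, t22, t12 = np.trace(P11), np.trace(P22), np.trace(P12)
    c1 = (t22 - t12) / (t11 + t22 - 2 * t12)
    c2 = 1 - c1
    x = c1 * x1 + c2 * x2
    P = c1**2 * P11 + c1 * c2 * (P12 + P12.T) + c2**2 * P22
    return x, P

# Hypothetical sensors: each is accurate in a different state component.
P11, P22 = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
P12 = 0.2 * np.eye(2)
x1, x2 = np.array([1.0, 1.0]), np.array([2.0, 2.0])
_, Pm = fuse_matrix(x1, x2, P11, P22, P12)
_, Ps = fuse_scalar(x1, x2, P11, P22, P12)
print(np.trace(Pm), np.trace(Ps))
```

With sensors of complementary accuracy, matrix weights can favor each sensor per component (trace ≈ 1.72). A single scalar weight must compromise (trace 2.7), which is the accuracy/complexity trade-off between the two fusion formulas.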

A Rule-based Approach for Anomaly Detection in Subscriber Usage Pattern

In this report we present a rule-based approach to detecting anomalous telephone calls. The method uses subscriber usage CDR (call detail record) data sampled over two observation periods: a study period and a test period. The study period contains call records of customers' non-anomalous behaviour. Customers are first grouped according to similar usage behaviour (e.g., average number of local calls per week). For the customers in each group, we develop a probabilistic model to describe their usage. Next, we use maximum likelihood estimation (MLE) to estimate the parameters of the calling behaviour, and we determine thresholds by calculating the acceptable change within a group. MLE is then applied to the data in the test period to estimate the parameters of the calling behaviour, and these parameters are compared against the thresholds. Any deviation beyond a threshold raises an alarm. This method has the advantage of identifying local anomalies, in contrast to techniques that identify global anomalies. The method is tested on 90 days of study data and 10 days of test data from telecom customers. For medium to large deviations in the test-window data, the method identifies 90% of anomalous usage with a false alarm rate below 1%.
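A minimal sketch of the study/test workflow for one customer group follows. It assumes a Poisson model for daily call counts, whose MLE is simply the sample mean. The threshold rule (group mean rate plus k standard deviations of the study-period rates, with hypothetical k = 3) is a stand-in for the paper's "acceptable change within a group," which the abstract does not define precisely. Comparing each customer against its own group, rather than against all customers, is what makes the detected anomalies local.

```python
import statistics

def poisson_mle(counts):
    """The maximum likelihood estimate of a Poisson rate is the sample mean."""
    return sum(counts) / len(counts)

def group_threshold(study, k=3.0):
    """Hypothetical rule: upper threshold = mean + k * std of the group's study rates."""
    rates = [poisson_mle(c) for c in study.values()]
    return statistics.fmean(rates) + k * statistics.pstdev(rates)

def alarms(study, test, k=3.0):
    """Customers in one group whose test-period rate exceeds the group threshold."""
    threshold = group_threshold(study, k)
    return sorted(c for c, counts in test.items() if poisson_mle(counts) > threshold)

# Hypothetical daily local-call counts for one behaviour group.
study = {"a": [2, 3, 2], "b": [3, 2, 3], "c": [2, 2, 3]}
test = {"a": [2, 3, 3], "b": [9, 8, 10], "c": [2, 2, 2]}
print(alarms(study, test))
```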

A Robust Wavelet-Based Watermarking Algorithm Using Edge Detection

In this paper, a robust watermarking algorithm using the wavelet transform and edge detection is presented. The efficiency of an image watermarking technique depends on the preservation of visually significant information. This is attained by embedding the watermark transparently with the maximum possible strength. The watermark is embedded in the subband coefficients that lie on edges, where distortions are less noticeable, with a strength that depends on the subband level. The watermark is also embedded, using a different scale factor for watermark strength, in selected coefficients around the edges, which are captured by a morphological dilation operation. The experimental evaluation of the proposed method shows very good results in terms of transparency and of robustness to various attacks such as median filtering, Gaussian noise, JPEG compression and geometrical transformations.
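The embedding idea (strong marking on edge coefficients, weaker marking on their dilated neighborhood) can be sketched with NumPy using a one-level orthonormal 2-D Haar transform. The sketch makes several simplifying assumptions that are not from the paper. Edges are detected by thresholding detail-coefficient magnitude rather than by a dedicated edge detector. Only one decomposition level is used, so there is no level-dependent strength. The scale factors alpha and beta and the threshold are hypothetical. The watermark is a ±1 pattern with the shape of a subband, and the image sides must be even.

```python
import numpy as np

def haar2d(img):
    """One-level orthonormal 2-D Haar transform. Returns LL, LH, HL, HH."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return (a + b + c + d) / 2, (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2

def ihaar2d(LL, LH, HL, HH):
    """Inverse of haar2d."""
    out = np.empty((LL.shape[0] * 2, LL.shape[1] * 2))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2
    out[0::2, 1::2] = (LL - LH + HL - HH) / 2
    out[1::2, 0::2] = (LL + LH - HL - HH) / 2
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return out

def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    p = np.pad(mask, 1)  # pads with False
    h, w = mask.shape
    return np.logical_or.reduce([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def embed(img, watermark_bits, alpha=0.1, beta=0.05, edge_thresh=10.0):
    """Multiplicative embedding in the detail subbands: alpha on edges, beta around them."""
    LL, LH, HL, HH = haar2d(img.astype(float))
    w = np.where(watermark_bits, 1.0, -1.0)  # +/-1 pattern, same shape as each subband
    marked = []
    for band in (LH, HL, HH):
        edges = np.abs(band) > edge_thresh   # coefficients lying on edges
        around = dilate(edges) & ~edges      # neighbors captured by dilation
        scale = np.where(edges, alpha, np.where(around, beta, 0.0))
        marked.append(band + scale * np.abs(band) * w)
    return ihaar2d(LL, *marked)

img = np.zeros((8, 8))
img[:, 3:] = 100.0                           # vertical step edge
bits = np.ones((4, 4), dtype=bool)
marked = embed(img, bits)
print(np.abs(marked - img).max())
```

Smooth regions stay untouched because their detail coefficients fall below the threshold, which is the transparency argument the abstract makes for edge-guided embedding.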