Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Most known methods for measuring the structural similarity of documents are based on, e.g., tag measures, path metrics and tree measures defined on their DOM-Trees. Other methods measure similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees, since the latter represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined via the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
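
As a minimal sketch of the alignment step, the following Python code scores two integer property strings with a standard global (Needleman-Wunsch style) alignment and normalizes the score to a similarity value. The function names `align_score` and `similarity`, the scoring values, and the normalization are illustrative assumptions; the abstract does not specify how the property strings are built or scored.

```python
def align_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Optimal global alignment score of two integer sequences.

    The match/mismatch/gap values are hypothetical, chosen for illustration.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                dp[i - 1][j] + gap,      # gap in b
                dp[i][j - 1] + gap,      # gap in a
            )
    return dp[n][m]


def similarity(a, b):
    """Map the alignment score to [0, 1]; 1.0 means identical strings.

    With the default scores, the alignment score lies in [-L, L],
    where L is the length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return (align_score(a, b) + longest) / (2.0 * longest)


# Hypothetical property strings, e.g. out-degrees listed level by level.
print(similarity([2, 3, 1], [2, 3, 1]))
print(similarity([2, 3, 1], [2, 1]))
```

Here a graph similarity question is answered purely on strings: two graphs with the same property string receive similarity 1.0, and each inserted, deleted or changed component lowers the score.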

Multigrid Bilateral Filter

It has been shown that nonlinear diffusion and bilateral filtering (BF) are closely connected. Earlier work sought a generalized representation that links them by means of adaptive filtering. In this paper a further relationship between nonlinear diffusion and bilateral filtering is explored, one that pays more attention to the numerical computation. We propose that bilateral filtering, like nonlinear diffusion, can be accelerated by a multigrid (MG) scheme, and show that a bilateral filtering process with a large kernel size can be approximated by a nonlinear diffusion process based on a full multigrid (FMG) scheme.
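
For orientation, a brute-force bilateral filter on a 1D signal can be sketched as follows. This is only the plain filter, not the multigrid acceleration discussed in the abstract; the function name and the parameters `radius`, `sigma_s` and `sigma_r` are assumptions made for the example. The inner loop over the kernel window is the cost that grows with kernel size.

```python
import math


def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter: each output sample is a weighted mean
    of its neighbours, weighted by spatial distance and by value difference."""
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial (domain) weight: nearby samples count more.
            w_s = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            # Range weight: samples with similar values count more.
            diff = signal[i] - signal[j]
            w_r = math.exp(-(diff ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        # den >= 1 because the centre sample always has weight 1.
        out.append(num / den)
    return out


step = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
print(bilateral_filter_1d(step, radius=2, sigma_s=1.0, sigma_r=0.5))
```

With a small `sigma_r` the step edge is preserved, because samples on the other side of the edge receive a negligible range weight.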

Evolutionary Feature Selection for Text Documents using the SVM

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory requirements can be prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, we present three feature selection methods: Information Gain, Support Vector Machine feature selection (called SVM_FS) and Genetic Algorithm with SVM (called GA_SVM). We show that the best results were obtained with the GA_SVM method for a relatively small dimension of the feature vector.
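
Of the three methods, Information Gain is the simplest to illustrate. The sketch below computes the information gain of a term from its presence or absence in each document; the data layout (documents as sets of terms) and the function names are assumptions for the example, and the SVM-based methods are not shown.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """Reduction in class entropy obtained by splitting on term presence.

    docs   -- list of sets of terms (hypothetical representation)
    labels -- list of class labels, one per document
    """
    classes = sorted(set(labels))

    def class_counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    all_idx = list(range(len(docs)))
    with_term = [i for i in all_idx if term in docs[i]]
    without_term = [i for i in all_idx if term not in docs[i]]

    gain = entropy(class_counts(all_idx))
    for part in (with_term, without_term):
        if part:
            gain -= len(part) / len(docs) * entropy(class_counts(part))
    return gain


docs = [{"ball", "goal"}, {"ball"}, {"vote"}, {"vote", "law"}]
labels = ["sport", "sport", "politics", "politics"]
ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(docs, labels, t),
                reverse=True)
print(ranked)
```

Keeping only the top-ranked terms is one way to shrink the document-representation vector before training a classifier.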

An UML Statechart Diagram-Based MM-Path Generation Approach for Object-Oriented Integration Testing

MM-Path, an acronym for Method/Message Path, describes the dynamic interactions between methods in object-oriented systems. This paper discusses the classification of MM-Paths based on the characteristics of object-oriented software. We categorize them according to the reason for generation, the scope of effect and the composition of the MM-Path. A formalized representation of MM-Path is also proposed, which takes into account the influence of state on the response method sequences of messages. Moreover, an automatic MM-Path generation approach based on UML Statechart diagrams is presented, which addresses the difficulties in identifying and generating MM-Paths. As a result, it provides a solid foundation for further research on test case generation based on MM-Path.

Effective Collaboration in Product Development via a Common Sharable Ontology

To achieve competitive advantage, most industrial companies now consider that sustained success depends on excellent product development, that is, on managing the product throughout its entire lifetime, from design and manufacture to operation and destruction. Achieving this goal requires tight collaboration between partners from a wide variety of domains, resulting in various product data types and formats, as well as different software tools. So far, the lack of a meaningful unified representation for product data semantics has slowed down efficient product development. This paper proposes an ontology-based approach to enable such semantic interoperability. A generic and extendible product ontology is described, gathering the main concepts pertaining to the mechanical field and the relations that hold among them. The ontology is not exhaustive; nevertheless, it shows that such a unified representation is possible and easily exploitable. This is illustrated through a case study with an example product and some semantic requests to which the ontology responds quite easily. The study proves the efficiency of ontologies as a support to product data exchange and information sharing, especially in product development environments where collaboration is not just a choice but a mandatory prerequisite.

Leadership Branding for Sustainable Customer Engagement

The purpose of this paper is to examine the interrelationships among various leadership branding constructs of entrepreneurs in small and medium-sized enterprises (SMEs). We employ quantitative structural equation modeling through a new leadership branding engagement model comprising the constructs of the leader's or entrepreneur's personality, branding practice and customer engagement. The results confirm that there are significant relationships between the three constructs, and the major fit indices indicate that the data fit the proposed model. The findings provide insights and fill gaps in the literature on statistically validated representations of leadership branding for SMEs across new economic regions of Malaysia, which may have implications for other economic zones in similar situations. This study extends the establishment of a leadership branding engagement model with a new mechanism that uses the leader's personality as a predictor of branding practice and customer engagement performance.

Integrating Low and High Level Object Recognition Steps by Probabilistic Networks

In pattern recognition applications, low-level segmentation and high-level object recognition are generally considered as two separate steps. The paper presents a method that bridges the gap between low-level and high-level object recognition. It is based on a Bayesian network representation and a network propagation algorithm. At the low level it uses a hierarchical structure of quadratic spline wavelet image bases. The method is demonstrated on a simple circuit diagram component identification problem.

Relational Representation in XCSF

Generalization is one of the most challenging issues of Learning Classifier Systems. This feature depends on the representation method that the system uses. Considering the representation schemes proposed for Learning Classifier Systems, it can be concluded that many of them are designed to describe the shape of the region to which the environmental states belong, while other relations between the environmental state and that region are ignored. In this paper, we propose a new representation scheme designed to express various relationships between the environmental state and the region specified by a particular classifier.

A Comparison of Exact and Heuristic Approaches to Capital Budgeting

This paper summarizes and compares approaches to solving the knapsack problem and its known application in capital budgeting. The first approach uses deterministic methods and can be applied to small-size tasks with a single constraint. We can also apply commercial software systems such as the GAMS modelling system. However, because of NP-completeness of the problem, more complex problem instances must be solved by means of heuristic techniques to achieve an approximation of the exact solution in a reasonable amount of time. We show the problem representation and parameter settings for a genetic algorithm framework.
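
As a small example of the exact, deterministic approach for a single constraint, the 0/1 knapsack formulation of capital budgeting can be solved by dynamic programming. The sketch assumes integer project costs and an integer budget; the function name and the sample figures are invented for illustration.

```python
def select_projects(costs, values, budget):
    """0/1 knapsack by dynamic programming.

    Returns (best total value, indices of the chosen projects).
    Assumes non-negative integer costs and an integer budget.
    """
    n = len(costs)
    # best[i][b] = best value using the first i projects with budget b.
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, value = costs[i - 1], values[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # skip project i-1
            if cost <= b:
                candidate = best[i - 1][b - cost] + value  # fund project i-1
                if candidate > best[i][b]:
                    best[i][b] = candidate

    # Trace back through the table to recover the chosen projects.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


print(select_projects(costs=[4, 3, 2, 5], values=[10, 7, 4, 12], budget=9))
```

The table has (number of projects + 1) x (budget + 1) entries, which is why this approach suits small instances; larger or multi-constraint instances motivate the heuristic techniques mentioned above.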

New Recursive Representations for the Favard Constants with Application to the Summation of Series

In this study, an integral form and new recursive formulas are obtained for the Favard constants and for some numeric and Fourier series connected with them. The method is based on preliminary integration of Fourier series, which allows finite recursive representations to be established for the summation. It is shown that the derived recursive representations are numerically more effective than known representations of the considered objects.
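
For reference, the Favard constants have the classical series definition K_r = (4/pi) * sum over k >= 0 of ((-1)^k / (2k+1))^(r+1). The sketch below evaluates this series by direct truncated summation; it is a baseline of the kind a recursive representation would be compared against, not the recursive formulas of the study. The function name and the number of terms are assumptions.

```python
import math


def favard_constant(r, terms=200000):
    """Approximate K_r by truncating its defining series after `terms` terms."""
    total = 0.0
    for k in range(terms):
        sign = -1.0 if k % 2 else 1.0
        total += (sign / (2 * k + 1)) ** (r + 1)
    return 4.0 / math.pi * total


# Known closed forms: K_1 = pi/2 and K_2 = pi^2/8.
print(favard_constant(1), math.pi / 2)
print(favard_constant(2), math.pi ** 2 / 8)
```

For small r the series converges slowly (the truncation error for K_1 decays only like 1/terms), which illustrates why numerically more effective representations are of interest.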

A Fuzzy MCDM Approach for Health-Care Waste Management

The management of health-care wastes is one of the most important problems in Istanbul, a city with more than 12 million inhabitants, as it is in most developing countries. Negligence in the appropriate treatment and final disposal of health-care wastes can lead to adverse impacts on public health and on the environment. This paper employs a fuzzy multi-criteria group decision making approach, which is based on the principles of fusion of fuzzy information, the 2-tuple linguistic representation model, and the technique for order preference by similarity to ideal solution (TOPSIS), to evaluate health-care waste (HCW) treatment alternatives for Istanbul. The evaluation criteria are determined employing the nominal group technique (NGT), which is a method of systematically developing a consensus of group opinion. The employed method is apt to manage information assessed using multigranularity linguistic information in a decision making problem with multiple information sources. The decision making framework employs the ordered weighted averaging (OWA) operator, which encompasses several operators, as the aggregation operator, since it can implement different aggregation rules by changing the order weights. The aggregation process is based on the unification of information by means of fuzzy sets on a basic linguistic term set (BLTS). Then, the unified information is transformed into linguistic 2-tuples in order to rectify the information loss problem of other fuzzy linguistic approaches.
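
The remark that the OWA operator implements different aggregation rules by changing the order weights can be made concrete with a short sketch on crisp numbers. The function name and sample values are illustrative; the fuzzy and linguistic 2-tuple parts of the approach are not modeled here.

```python
def owa(values, weights):
    """Ordered weighted averaging: weights apply to the values after sorting
    them in descending order, not to particular sources."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


scores = [0.2, 0.9, 0.5]
print(owa(scores, [1.0, 0.0, 0.0]))        # behaves like max
print(owa(scores, [0.0, 0.0, 1.0]))        # behaves like min
print(owa(scores, [1 / 3, 1 / 3, 1 / 3]))  # behaves like the arithmetic mean
```

Moving weight toward the first positions makes the aggregation more optimistic, and moving it toward the last positions makes it more pessimistic.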

Dynamics in Tangible Chemical Reactions

Spatial understanding and an understanding of dynamic change in the spatial structure of molecules during a reaction are essential for designing new molecules. Knowing the physical processes in the reactions helps to speed up the design process. The goal of our application is to support the designer with a correct representation of the designed molecule and to show the dynamic behavior of the whole reacting system. Our system shows the spatial deformation of the molecules at every time interval by minimizing the energy level of the molecules. The position and orientation of the molecules can be intuitively controlled by manipulating real-world objects using Augmented Reality techniques. Our approach has the potential to speed up the design of new molecules and to help students understand the chemical processes better.

An Efficient Algorithm for Computing all Program Forward Static Slices

Program slicing is the task of finding all statements in a program that directly or indirectly influence the value of a variable occurrence. The set of statements that can affect the value of a variable at some point in a program is called a program backward slice. In several software engineering applications, such as program debugging and measuring program cohesion and parallelism, several slices are computed at different program points. Existing algorithms for computing program slices are designed to compute a slice at a single program point. In these algorithms, the program, or the model that represents the program, is traversed completely or partially once. To compute more than one slice, the same algorithm is applied for every point of interest in the program. Thus, the same program, or program representation, is traversed several times. In this paper, an algorithm is introduced to compute all forward static slices of a computer program by traversing the program representation graph once. Therefore, the introduced algorithm is useful for software engineering applications that require computing program slices at different points of a program. The program representation graph used in this paper is the Program Dependence Graph (PDG).
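
The per-point baseline described above can be sketched as a reachability search on a dependence graph: the forward slice of a statement is everything reachable from it along dependence edges. The graph encoding (a dict from a statement to the statements that depend on it) and the tiny example program are assumptions; this is the repeated-traversal approach, not the single-traversal algorithm introduced in the paper.

```python
from collections import deque


def forward_slice(pdg, start):
    """Statements affected by `start`: breadth-first reachability in the PDG.

    pdg maps a statement id to the ids of statements that depend on it
    (hypothetical encoding of data and control dependence edges).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in pdg.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


# Hypothetical program:
#   1: x = 1
#   2: y = 2
#   3: z = x + 1
#   4: w = z + y
#   5: print(y)
pdg = {1: [3], 2: [4, 5], 3: [4]}

# One full traversal per point of interest.
for stmt in (1, 2):
    print(stmt, sorted(forward_slice(pdg, stmt)))
```

Calling `forward_slice` once per statement revisits the same edges many times, which is the redundancy a single-traversal algorithm aims to avoid.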

Application of a Similarity Measure for Graphs to Web-based Document Structures

Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined via the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to solve a novel and challenging problem: measuring the structural similarity of generalized trees. In other words, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem in order to develop an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.

Analysis of Genotype Size for an Evolvable Hardware System

The evolution of logic circuits, which falls under the heading of evolvable hardware, is carried out by evolutionary algorithms (EAs). These algorithms are able to automatically configure reconfigurable devices. One of the main difficulties in developing evolvable hardware with the ability to design functional electrical circuits is choosing the most favourable EA features, such as the fitness function, chromosome representation, population size, genetic operators and individual selection. Until now, several researchers from the evolvable hardware community have used and tuned these parameters, and various rules on how to select the value of a particular parameter have been proposed. However, to date, no one has presented a study regarding the size of the chromosome representation (circuit layout) to be used as a platform for the evolution in order to increase evolvability, reduce the number of generations and optimize the digital logic circuits by reducing the number of logic gates. In this paper this topic is thoroughly investigated and the optimal parameters for these EA features are proposed. The evolution of logic circuits has been carried out by an extrinsic evolvable hardware system which uses a (1+λ) evolution strategy as the core of the evolution.
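
A (1+λ) evolution strategy keeps one parent, creates λ mutated offspring per generation, and lets the best offspring replace the parent if it is at least as fit. The sketch below applies this loop to a bit-string genome with a toy fitness function; the genome encoding, mutation rate, acceptance of equal fitness, and all names are assumptions for illustration and do not reproduce the circuit representation used in the paper.

```python
import random


def one_plus_lambda(fitness, genome_length, lam=4, mutation_rate=0.1,
                    generations=200, seed=0):
    """(1+lambda) evolution strategy on a bit string (maximization)."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(genome_length)]
    parent_fit = fitness(parent)
    for _gen in range(generations):
        best_child, best_fit = None, None
        for _k in range(lam):
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in parent]
            f = fitness(child)
            if best_fit is None or f > best_fit:
                best_child, best_fit = child, f
        # Assumption: offspring of equal fitness also replace the parent.
        if best_fit >= parent_fit:
            parent, parent_fit = best_child, best_fit
    return parent, parent_fit


# Toy fitness: number of ones in the genome (stand-in for a circuit score).
best, score = one_plus_lambda(sum, genome_length=20)
print(score, best)
```

In this setting the genome length plays the role of the chromosome size whose influence on evolvability and on the number of generations is the subject of the study.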

A Method under Uncertain Information for the Selection of Students in Interdisciplinary Studies

We present a method for the selection of students in interdisciplinary studies based on the hybrid averaging operator. We assume that the available information given in the problem is uncertain, so it is necessary to use interval numbers. Therefore, we suggest a new type of hybrid aggregation called the uncertain induced generalized hybrid averaging (UIGHA) operator. It is an aggregation operator that considers the weighted average (WA) and the ordered weighted averaging (OWA) operator in the same formulation. Therefore, we are able to consider the degree of optimism of the decision maker and grades of importance in the same approach. By using interval numbers, we are able to represent the information considering the best and worst possible results, so the decision maker gets a more complete view of the decision problem. We develop an illustrative example of the proposed scheme in the selection of students in interdisciplinary studies. We see that with the use of the UIGHA operator we get a more complete representation of the selection problem. Then, the decision maker is able to consider a wide range of alternatives depending on his or her interests. We also show other potential applications of the UIGHA operator in educational problems involving the selection of different types of resources, such as students, professors, etc.

A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Text document categorization involves a large amount of data or features. The high dimensionality of the features is troublesome and can affect classification performance. Therefore, feature selection is considered one of the crucial parts of text document categorization. Selecting the best features to represent documents can reduce the dimensionality of the feature space and hence increase performance. Many approaches have been implemented by various researchers to overcome this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also present state-of-the-art algorithms by several other researchers.

Virtual Reality for Mutual Understanding in Landscape Planning

This paper argues that fostering mutual understanding in landscape planning is as much about the planners educating stakeholder groups as about the stakeholders educating the planners. In other words, it is an epistemological agreement as to the meaning and nature of place, especially where an effort is made to go beyond the quantitative aspects, which can be achieved by the phenomenological experience of the Virtual Reality (VR) environment. This education needs to be a bi-directional process in which distance can be a temporal as well as a spatial separation of participants. There needs to be a common framework of understanding in which neither 'side' is disadvantaged during the process of information exchange. It follows that a medium such as VR offers an effective way of overcoming some of the shortcomings of traditional media by taking advantage of continuing technological advances in Information, Technology and Communications (ITC). In this paper we make particular reference to this as an extension to Geographical Information Systems (GIS). VR as a two-way communication tool offers considerable potential, particularly in the area of Public Participation GIS (PPGIS). Information-rich virtual environments that can operate over broadband networks are now possible and thus allow for the representation of large amounts of qualitative and quantitative information 'side-by-side'. Therefore, with broadband access becoming standard for households and enterprises alike, distributed virtual reality environments have great potential to contribute to enabling stakeholder participation and mutual learning within the planning context.

Development of a Catchment Water Quality Model for Continuous Simulations of Pollutants Build-up and Wash-off

Estimation of runoff water quality parameters is required to determine appropriate water quality management options. Various models are used to estimate runoff water quality parameters. However, most models provide event-based estimates of water quality parameters for specific sites. The work presented in this paper describes the development of a model that continuously simulates the accumulation and wash-off of water quality pollutants in a catchment. The model allows estimation of pollutant build-up during dry periods and pollutant wash-off during storm events. The model was developed by integrating two individual models: a rainfall-runoff model and a catchment water quality model. The rainfall-runoff model is based on the time-area runoff estimation method. The model allows users to estimate the time of concentration using a range of established methods. The model also allows estimation of the continuing runoff losses using any of the available estimation methods (i.e., constant, linearly varying or exponentially varying). Pollutant build-up in a catchment is represented by one of three pre-defined functions: power, exponential, or saturation. Similarly, pollutant wash-off is represented by one of three different functions: power, rating-curve, or exponential. The developed runoff water quality model was set up to simulate the build-up and wash-off of total suspended solids (TSS), total phosphorus (TP) and total nitrogen (TN). The application of the model was demonstrated using available runoff and TSS field data from road and roof surfaces in the Gold Coast, Australia. The model provided an excellent representation of the field data, demonstrating the simplicity yet effectiveness of the proposed model.
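
To illustrate what continuous simulation of build-up and wash-off means, the following sketch steps through a daily rainfall series using an exponential build-up function on dry days and an exponential wash-off function on wet days. The specific functional forms, the parameter names and values, and the use of rainfall depth in place of routed runoff are all assumptions for the example; the abstract names the function types but does not give their equations, and the time-area rainfall-runoff component is not modeled.

```python
import math


def simulate(rain_mm, b_max=10.0, k_build=0.4, k_wash=0.2, dt_days=1.0):
    """Track pollutant mass on a surface over a daily rainfall series.

    Dry step:  mass relaxes toward b_max (assumed exponential build-up).
    Wet step:  a fraction 1 - exp(-k_wash * rain * dt) of the stored mass
               is washed off (assumed exponential wash-off).
    Returns a list of (mass after step, mass washed off in step).
    """
    mass = 0.0
    history = []
    for rain in rain_mm:
        washed = 0.0
        if rain > 0.0:
            washed = mass * (1.0 - math.exp(-k_wash * rain * dt_days))
            mass -= washed
        else:
            mass = b_max - (b_max - mass) * math.exp(-k_build * dt_days)
        history.append((mass, washed))
    return history


for day, (mass, washed) in enumerate(simulate([0, 0, 0, 5, 0, 0, 12, 0])):
    print(day, round(mass, 3), round(washed, 3))
```

Because the stored mass carries over from one step to the next, the amount washed off in a storm depends on the length of the preceding dry period, which an event-based model treats as a fixed input.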

Infrared Face Recognition Using Distance Transforms

In this work we present an efficient approach for face recognition in the infrared spectrum. In the proposed approach physiological features are extracted from thermal images in order to build a unique thermal faceprint. Then, a distance transform is used to get an invariant representation for face recognition. The obtained physiological features are related to the distribution of blood vessels under the face skin. This blood network is unique to each individual and can be used in infrared face recognition. The obtained results are promising and show the effectiveness of the proposed scheme.
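
A distance transform replaces each pixel of a binary feature map with its distance to the nearest feature pixel. The sketch below is a classical two-pass city-block distance transform on a small grid; the choice of metric, the grid encoding (1 marks a feature pixel, such as a detected vessel pixel), and the function name are assumptions, since the abstract does not state which transform is used.

```python
def distance_transform(grid):
    """City-block distance from every cell to the nearest cell equal to 1."""
    rows, cols = len(grid), len(grid[0])
    far = rows + cols  # larger than any city-block distance inside the grid
    dist = [[0 if grid[r][c] else far for c in range(cols)]
            for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


feature_map = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in distance_transform(feature_map):
    print(row)
```

The resulting map varies smoothly with small shifts of the feature pixels, which is one reason a distance transform can serve as a more tolerant representation than the raw binary features.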