Compression of Semistructured Documents

EGOTHOR is a search engine that indexes the Web and makes the indexed documents searchable. Its hit list contains the URL and title of each hit, together with a snippet that briefly shows the match. The snippet can almost always be assembled by an algorithm that has full knowledge of the original document (mostly an HTML page), which implies that the search engine must store the full text of the documents as part of the index. This requirement leads us to choose an appropriate compression algorithm to reduce the space demand. One solution would be to use common compression methods such as gzip or bzip2, but it may be preferable to develop a new method that takes advantage of the document structure, or rather, the textual character of the documents. Special text compression algorithms and methods for compressing XML documents already exist. The aim of this paper is to integrate the two approaches to achieve an optimal compression ratio.
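
To make the structure-aware idea concrete, here is a minimal Python sketch (an editor's illustration, not the paper's method): markup is separated from text and the two homogeneous streams are compressed independently with the general-purpose zlib codec; the sample document is hypothetical.

```python
import re
import zlib

def compress_structured(html: str) -> tuple[bytes, bytes]:
    """Split a document into a markup stream and a text stream, then
    compress each with zlib.  On large documents the homogeneous
    streams typically compress better than the interleaved original."""
    tags = re.findall(r"<[^>]*>", html)        # markup stream
    text = re.sub(r"<[^>]*>", "\x00", html)    # text stream with tag placeholders
    return zlib.compress("".join(tags).encode()), zlib.compress(text.encode())

doc = "<html><body><p>hello world</p><p>hello again</p></body></html>"
markup, text = compress_structured(doc)
print(len(zlib.compress(doc.encode())), len(markup) + len(text))
```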

A Materialized Approach to the Integration of XML Documents: the OSIX System

The data exchanged on the Web differ in nature from those handled by classical database management systems. They are called semi-structured data because they lack the regular, static structure of data in a relational database: their schema is dynamic and may contain missing data or types. This raises the need for new techniques and algorithms to exploit and integrate such data and to extract information relevant to the user. In this paper we present the OSIX system (Osiris-based System for Integration of XML Sources). OSIX follows a data warehouse model designed for the integration of semi-structured data, and more precisely for the integration of XML documents. Its architecture relies on the Osiris system, a DL-based model designed for the representation and management of databases and knowledge bases. Osiris is a view-based data model whose indexing system supports semantic query optimization. We show that query processing over an XML source is optimized by the indexing approach proposed by Osiris.

Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm

With the rapid development of the life sciences and the flood of genomic information, the need for faster and more scalable searching methods has become urgent. Indexing is one of the approaches that has been investigated; indexing methods fall into three categories: length-based index algorithms, transformation-based algorithms, and mixed-technique algorithms. In this research we focus on the transformation-based methods. We embed the N-gram method into the transformation-based method to build an inverted index table, and then apply parallel methods to speed up index building and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the N-gram transformation algorithm is an economical solution that saves both time and space: the index is smaller than the dataset itself when the N-gram size is 5 or 6. The results of the parallel N-gram transformation algorithm indicate that parallel programming on large datasets is promising and can be improved further.
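
The serial core of such an index can be sketched in a few lines of Python; a parallel version would partition the sequences across workers, e.g. with multiprocessing. The identifiers and sequences below are made up for illustration.

```python
from collections import defaultdict

def build_ngram_index(sequences, n=5):
    """Map every length-n substring (N-gram) of each protein sequence
    to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - n + 1):
            index[seq[i:i + n]].append((seq_id, i))
    return index

db = {"P1": "MKTAYIAKQR", "P2": "MKTAYLLQRA"}   # toy protein records
index = build_ngram_index(db, n=5)
print(index["MKTAY"])    # -> [('P1', 0), ('P2', 0)]
```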

Comparison of Three Meta Heuristics to Optimize Hybrid Flow Shop Scheduling Problem with Parallel Machines

This study compares three metaheuristics that minimize the makespan (Cmax) of the Hybrid Flow Shop (HFS) scheduling problem with parallel machines, which is known to be NP-hard. The three improvement heuristic searches considered are the Genetic Algorithm (GA), Simulated Annealing (SA), and Tabu Search (TS); TS is a deterministic improvement heuristic search, whereas GA and SA are stochastic. A comprehensive comparison of the three searches is presented. The experimental results show that TS is effective and efficient for solving HFS scheduling problems.
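
As an illustration of one of the compared searches, here is a compact simulated annealing sketch for the HFS problem under simplifying assumptions (identical parallel machines per stage, job permutations decoded by greedy list scheduling); it is not the paper's exact implementation, and all data are toy values.

```python
import math
import random

def hfs_makespan(perm, proc, machines):
    """Greedy makespan (Cmax) for a hybrid flow shop: jobs enter every
    stage in 'perm' order and take the earliest-free machine.
    proc[j][s] = processing time of job j at stage s;
    machines[s] = number of identical parallel machines at stage s."""
    ready = [0.0] * len(proc)          # completion time of each job so far
    for s, m in enumerate(machines):
        free = [0.0] * m               # next-free time of each machine
        for j in perm:
            k = min(range(m), key=lambda i: free[i])
            start = max(free[k], ready[j])
            free[k] = ready[j] = start + proc[j][s]
    return max(ready)

def simulated_annealing(proc, machines, iters=20000, t0=10.0, alpha=0.9995):
    perm = list(range(len(proc)))
    cur = best = hfs_makespan(perm, proc, machines)
    best_perm, t = perm[:], t0
    for _ in range(iters):
        a, b = random.sample(range(len(perm)), 2)
        perm[a], perm[b] = perm[b], perm[a]        # swap two jobs
        cand = hfs_makespan(perm, proc, machines)
        if cand <= cur or random.random() < math.exp((cur - cand) / t):
            cur = cand
            if cur < best:
                best, best_perm = cur, perm[:]
        else:
            perm[a], perm[b] = perm[b], perm[a]    # undo the swap
        t *= alpha
    return best, best_perm

proc = [[3, 2], [1, 4], [2, 2], [4, 1]]            # 4 jobs, 2 stages
print(simulated_annealing(proc, machines=[2, 2]))
```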

Solving Part Type Selection and Loading Problem in Flexible Manufacturing System Using Real Coded Genetic Algorithms – Part II: Optimization

This paper presents the modeling and optimization of two NP-hard problems in flexible manufacturing systems (FMS): the part type selection problem and the loading problem. Due to the complexity and extent of the problems, the paper was split into two parts. The first part discussed the modeling of the problems and showed how real-coded genetic algorithms (RCGA) can be applied to solve them. This second part discusses the effectiveness of the RCGA, which uses an array of real numbers as its chromosome representation. The proposed chromosome representation produces only feasible solutions, minimizing the computational time the GA would otherwise need to push its population toward the feasible search space or to repair infeasible chromosomes. The proposed RCGA improves FMS performance by considering two objectives: maximizing system throughput and maintaining the balance of the system (minimizing system unbalance). The resulting objective values are compared to the optimum values produced by the branch-and-bound method. The experiments show that the proposed RCGA reaches near-optimum solutions in a reasonable amount of time.
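
The key property claimed for the representation is that decoding never leaves the feasible region. A toy sketch of one way such a decoder can work (the tool-slot capacity constraint and all figures are illustrative assumptions, not the paper's model):

```python
import random

def decode(chromosome, slots_needed, capacity):
    """Decode a real-valued chromosome into a feasible part-type
    selection: parts are ranked by their gene value and greedily
    accepted while the total tool slots still fit, so every chromosome
    maps to a feasible solution and no repair step is needed."""
    order = sorted(range(len(chromosome)), key=lambda i: -chromosome[i])
    selected, used = [], 0
    for i in order:
        if used + slots_needed[i] <= capacity:
            selected.append(i)
            used += slots_needed[i]
    return selected

chrom = [random.random() for _ in range(6)]        # one random individual
print(decode(chrom, slots_needed=[3, 2, 4, 1, 2, 3], capacity=8))
```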

Comparison Analysis of Wald's and Bayes-Type Sequential Methods for Testing Hypotheses

A comparative analysis of Wald's and Bayes-type sequential methods for testing hypotheses is offered. The merits of the new sequential test are: universality, which consists in optimality (with given criteria) and uniformity of the decision-making regions for any number of hypotheses; simplicity, convenience, and uniformity of the algorithms realizing it; reliability of the obtained results; and the ability to keep the error probabilities at desired values. Computation results for concrete examples are given; they confirm the above-stated characteristics of the new method and characterize the considered methods with respect to each other.
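
For reference, Wald's classical sequential probability ratio test for a Bernoulli parameter can be sketched in a few lines (the paper's Bayes-type test is not reproduced here; the hypotheses and error levels below are toy values):

```python
import math
import random

def sprt(samples, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a Bernoulli
    parameter, H0: p = p0 vs H1: p = p1.  Returns the decision and
    the number of observations consumed."""
    a = math.log(beta / (1 - alpha))       # lower stopping threshold
    b = math.log((1 - beta) / alpha)       # upper stopping threshold
    llr, n = 0.0, 0
    for x in samples:
        n += 1
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr <= a:
            return "accept H0", n
        if llr >= b:
            return "accept H1", n
    return "no decision", n

data = (random.random() < 0.7 for _ in range(10000))   # true p = 0.7
print(sprt(data, p0=0.5, p1=0.7))
```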

Adaptive Distributed Genetic Algorithms and Its VLSI Design

This paper presents a dynamic adaptation scheme for the frequency of inter-deme migration in distributed genetic algorithms (GAs), together with its VLSI hardware design. A distributed, or multi-deme, GA uses multiple populations that evolve concurrently. The purpose of the dynamic adaptation is to improve convergence performance and thereby obtain better solutions. Simulation experiments show that our scheme achieves better performance than fixed-frequency migration schemes.
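
A minimal island-model sketch with one plausible adaptation rule, shortening the migration interval while demes stagnate and lengthening it while they improve; the paper's concrete scheme and its VLSI mapping are not reproduced, and all parameter values are illustrative.

```python
import random

def evolve(pop, fitness, sigma=0.1):
    """One generation of a deme: tournament selection of size 3 plus
    Gaussian mutation on real-valued genomes (minimization)."""
    def tournament():
        return min(random.sample(pop, 3), key=fitness)
    return [[g + random.gauss(0, sigma) for g in tournament()] for _ in pop]

def island_ga(fitness, demes=4, size=20, dim=5, gens=200):
    pops = [[[random.uniform(-5, 5) for _ in range(dim)] for _ in range(size)]
            for _ in range(demes)]
    interval, stall = 10, [0] * demes
    best = [min(map(fitness, p)) for p in pops]
    for g in range(1, gens + 1):
        pops = [evolve(p, fitness) for p in pops]
        for d, p in enumerate(pops):
            b = min(map(fitness, p))
            stall[d] = 0 if b < best[d] else stall[d] + 1
            best[d] = min(b, best[d])
        # adaptation rule: migrate more often while demes stagnate
        interval = max(2, interval - 1) if max(stall) > 5 else min(50, interval + 1)
        if g % interval == 0:
            for d in range(demes):         # ring topology: best individual moves on
                donor = min(pops[d], key=fitness)[:]
                nxt = pops[(d + 1) % demes]
                worst = max(range(len(nxt)), key=lambda i: fitness(nxt[i]))
                nxt[worst] = donor
    return min((min(p, key=fitness) for p in pops), key=fitness)

sphere = lambda x: sum(g * g for g in x)
print(sphere(island_ga(sphere)))
```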

Tracking Objects in Color Image Sequences: Application to Football Images

In this paper, we present a comparative study of two computer vision systems for object recognition and tracking. The two approaches are both based on regions, i.e., sets of pixels that parameterize the objects in shot sequences. For image segmentation and object detection, the FCM technique is used, and the overlap between cluster distributions is minimized by using a suitable color space (other than RGB). The first technique takes into account the a priori probabilities governing the computation of the various clusters used to track objects. A Parzen kernel method is described that allows the players to be identified in each frame, and we show the importance of choosing the standard deviation of the Gaussian probability density function. Region matching is carried out by an algorithm that operates on the Mahalanobis distance between region descriptors in two subsequent frames and uses singular value decomposition to compute a set of correspondences satisfying both the principle of proximity and the principle of exclusion.
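
The SVD-based correspondence step can be sketched as follows, in the spirit of the Scott and Longuet-Higgins pairing method: the proximity matrix is recomposed with unit singular values, and a pair is accepted only if it dominates both its row and its column. The descriptors and covariance below are toy values, not the paper's features.

```python
import numpy as np

def match_regions(desc_a, desc_b, cov):
    """Pair region descriptors of two consecutive frames.  A proximity
    matrix built from Mahalanobis distances is factored by SVD and
    recomposed with unit singular values; a pair (i, j) is kept only if
    it is the maximum of both its row and its column, which enforces
    proximity and exclusion at once."""
    icov = np.linalg.inv(cov)
    d = np.array([[np.sqrt((a - b) @ icov @ (a - b)) for b in desc_b]
                  for a in desc_a])
    g = np.exp(-d ** 2 / (2 * d.mean() ** 2))      # proximity matrix
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    p = u @ vt                                     # singular values set to 1
    return [(i, j) for i in range(p.shape[0]) for j in range(p.shape[1])
            if p[i, j] == p[i, :].max() and p[i, j] == p[:, j].max()]

frame1 = np.array([[0.0, 0.0], [5.0, 5.0]])        # toy region descriptors
frame2 = np.array([[0.2, 0.1], [5.1, 4.9]])
print(match_regions(frame1, frame2, cov=np.eye(2)))
```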

Partial 3D Reconstruction using Evolutionary Algorithms

When reconstructing a scenario, it is necessary to know the structure of the elements present in the scene in order to interpret it. In this work we link 3D scene reconstruction to evolutionary algorithms through stereo vision theory. Stereo vision is a method that reconstructs a scene from just a pair of images of it plus some computation; given several images of a scene captured from different positions, it can recover the three-dimensional characteristics of the world. Stereo vision usually requires two cameras, in analogy to the mammalian vision system. In this work we employ only one camera, which is translated along a path and captures an image every fixed distance. Since we cannot perform all the computations required for an exhaustive reconstruction, we employ an evolutionary algorithm to partially reconstruct the scene in real time. The algorithm employed is the fly algorithm, which uses "flies" to reconstruct the principal characteristics of the world following certain evolutionary rules.

Quantum Computation using Two Component Bose-Einstein Condensates

Quantum computation using qubits made of two-component Bose-Einstein condensates (BECs) is analyzed. We construct a general framework for quantum algorithms to be executed using the collective states of the BECs. The use of BECs allows an increase of the energy scales via bosonic enhancement, so that two-qubit gate operations can be performed in a time reduced by a factor of N, where N is the number of bosons per qubit. We illustrate the scheme with an application to Deutsch's and Grover's algorithms, and discuss possible experimental implementations. Decoherence effects are analyzed both under general conditions and for the proposed experimental implementation.

Accurate Visualization of Graphs of Functions of Two Real Variables

The study of a real function of two real variables can be supported by visualization using a Computer Algebra System (CAS). One constraint of such systems stems from the implemented algorithms, which yield continuous approximations of the given function by interpolation. This often masks discontinuities of the function and can produce misleading plots that are not compatible with the mathematics. In recent years, point-based geometry has gained increasing attention as an alternative surface representation, both for efficient rendering and for flexible geometry processing of complex surfaces. In this paper we present different artifacts created by mesh surfaces near discontinuities and propose a point-based method that controls and reduces these artifacts. A least-squares penalty method for automatic generation of a mesh that respects the behavior of the chosen function is presented. The special feature of this method is its ability to improve the accuracy of the surface visualization near a set of interior points where the function may be discontinuous. The method is formulated as a minimax problem, and the non-uniform mesh is generated using an iterative algorithm. Results show that for large, poorly conditioned matrices the new algorithm gives more accurate results than the classical preconditioned conjugate gradient algorithm.

A Context-Aware Supplier Selection Model

Selecting the best possible set of suppliers has a significant impact on the overall profitability and success of any business, so it is usually necessary to optimize all business processes and exploit cost-effective alternatives for additional savings. This paper proposes a new, efficient context-aware supplier selection model that takes possible changes of the environment into account while significantly reducing selection costs. The proposed model is based on data clustering techniques and draws on principles of online algorithms for an optimal selection of suppliers. Unlike common selection models, which re-run the selection algorithm from scratch over the whole environment for every decision-making sub-period, our model considers only the changes and superimposes them on the previously determined best set of suppliers to obtain a new best set. Recomputation over unchanged parts of the environment is thus avoided, and selection costs are reduced significantly. A numerical evaluation confirms the applicability of the model and shows that it outperforms the common static selection models in this field.
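
A toy sketch of the superimposition idea: only the changed suppliers are re-scored and merged with the previous best set, instead of re-running selection over the whole environment. The supplier names, costs, and the scalar cost model are illustrative assumptions, not the paper's clustering-based model.

```python
def update_best_set(best, candidates, changes, cost, k):
    """Incrementally revise the best supplier set: suppliers whose
    attributes changed are re-scored and merged with the still-valid
    previous picks; unchanged non-selected suppliers are not revisited."""
    affected = set(changes)
    pool = [s for s in best if s not in affected]       # still-valid previous picks
    pool += [s for s in candidates if s in affected]    # re-score changed suppliers
    return sorted(pool, key=cost)[:k]

cost = {"A": 3, "B": 1, "C": 2, "D": 9}.get             # current costs after change
best = ["C", "A"]                                       # previous best set (k = 2)
print(update_best_set(best, ["A", "B", "C", "D"],
                      changes={"B": "price drop"}, cost=cost, k=2))
```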

Improving Performance of World Wide Web by Adaptive Web Traffic Reduction

The ever-increasing use of the World Wide Web on existing networks results in poor performance. Several techniques have been developed for reducing web traffic: compressing files, caching web pages at the client side, smoothing the bursty nature of the traffic into a constant rate, and so on. No single method is adequate for accessing documents instantly over the Internet. In this paper, adaptive hybrid algorithms are developed for reducing web traffic. Intelligent agents monitor the traffic and, depending on the bandwidth usage, the user's preferences, and the server and browser capabilities, apply the best techniques to achieve maximum traffic reduction. Web caching, compression, filtering, optimization of HTML tags, and traffic dispersion are incorporated into this adaptive selection. Using this new hybrid technique, latency is reduced to 20-60% and the cache hit ratio is increased to 40-82%.

Modeling and Simulation of Robotic Arm Movement using Soft Computing

In this paper we present a control architecture for robotic arm movement and trajectory planning using Fuzzy Logic (FL) and Genetic Algorithms (GAs). The architecture compensates for uncertainties such as movement, friction, and settling time in the robotic arm's motion. Genetic algorithms and fuzzy logic are used together to achieve optimal control of the arm's movement. The proposed technique constitutes a general model for redundant structures and may be extended to other structures. Results show optimal angular movement of the joints as the outcome of the evolutionary process. The technique has an edge over other techniques in that it requires minimal mathematical complexity.

IIR Filter Design with Craziness-Based Particle Swarm Optimization Technique

This paper demonstrates the application of the craziness-based particle swarm optimization (CRPSO) technique to the design of an 8th-order low-pass Infinite Impulse Response (IIR) filter. CRPSO, a much-improved version of PSO, is a population-based global heuristic search algorithm that finds a near-optimal solution in terms of a set of filter coefficients. The effectiveness of the algorithm is demonstrated by a comparative study against well-established algorithms, namely the real-coded genetic algorithm (RGA) and particle swarm optimization (PSO). Simulation results affirm that the proposed CRPSO outperforms its counterparts not only in output quality, i.e., sharpness at cut-off, pass-band ripple, stop-band ripple, and stop-band attenuation, but also in convergence speed, with assured stability.
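
A compact sketch of a craziness operator inside an otherwise standard PSO loop, applied to a toy low-pass magnitude template; the paper's exact operator, parameter values, error measure, and stability handling are not reproduced, and the sketch does not enforce pole stability.

```python
import numpy as np

def response(coeffs, w, order=8):
    """Magnitude response of an IIR filter; a particle encodes the
    numerator and denominator coefficients (a0 is fixed to 1)."""
    b = coeffs[:order + 1]
    a = np.concatenate(([1.0], coeffs[order + 1:]))
    e = np.exp(-1j * np.outer(w, np.arange(order + 1)))
    return np.abs(e @ b) / np.abs(e @ a)

def fitness(coeffs, w, ideal):
    return np.sum((response(coeffs, w) - ideal) ** 2)

def crpso(dim=17, swarm=40, iters=500, p_cr=0.3):
    w = np.linspace(0, np.pi, 128)
    ideal = (w <= 0.4 * np.pi).astype(float)       # ideal low-pass template
    x = np.random.uniform(-1, 1, (swarm, dim))
    v = np.zeros((swarm, dim))
    pb = x.copy()
    pbf = np.array([fitness(p, w, ideal) for p in x])
    gb = pb[pbf.argmin()].copy()
    for _ in range(iters):
        r1, r2 = np.random.rand(swarm, 1), np.random.rand(swarm, 1)
        v = 0.7 * v + 1.5 * r1 * (pb - x) + 1.5 * r2 * (gb - x)
        # "craziness": a few particles get a randomized velocity, which
        # keeps the swarm from collapsing prematurely
        crazy = np.random.rand(swarm) < p_cr
        v[crazy] = np.random.uniform(-0.5, 0.5, (crazy.sum(), dim))
        x = x + v
        f = np.array([fitness(p, w, ideal) for p in x])
        better = f < pbf
        pb[better], pbf[better] = x[better], f[better]
        gb = pb[pbf.argmin()].copy()
    return gb, pbf.min()

coeffs, err = crpso()
print("squared error:", err)
```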

Modeling and Simulating Reaction-Diffusion Systems with State-Dependent Diffusion Coefficients

Current models and simulation algorithms of intracellular stochastic kinetics are usually based on the premise that diffusion is so fast that the concentrations of all involved species are spatially homogeneous. However, recent experimental measurements of intracellular diffusion constants indicate that the assumption of a homogeneous, well-stirred cytosol is not necessarily valid even for small prokaryotic cells. In this work, a mathematical treatment of diffusion that can be incorporated into a stochastic algorithm simulating the dynamics of a reaction-diffusion system is presented. The movement of a molecule A from a region i to a region j of space is represented as a first-order reaction A_i -> A_j whose rate constant k depends on the diffusion coefficient. The diffusion coefficients are modeled as functions of the local concentration of the solutes, their intrinsic viscosities, their frictional coefficients, and the temperature of the system. The stochastic time evolution of the system is given by the occurrence of diffusion events and chemical reaction events: at each time step an event (reaction or diffusion) is selected from a probability distribution of waiting times determined by the intrinsic reaction kinetics and the diffusion dynamics. To demonstrate the method, simulation results for the reaction-diffusion system of chaperone-assisted protein folding in the cytoplasm are shown.
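
The event-selection mechanism can be sketched as a Gillespie-style loop in which diffusion hops are first-order events alongside chemical reactions. In this sketch the rate constants are fixed toy values; in the paper the diffusion rates would vary with the local concentration.

```python
import math
import random

def step(state, events):
    """One Gillespie step over combined reaction and diffusion events:
    draw an exponential waiting time from the total propensity, then
    pick one event with probability proportional to its propensity."""
    props = [p(state) for p, _ in events]
    total = sum(props)
    if total == 0:
        return None
    tau = -math.log(random.random()) / total
    r, acc = random.random() * total, 0.0
    for prop, (_, apply_event) in zip(props, events):
        acc += prop
        if r < acc:
            apply_event(state)
            return tau

def move(s, i, j):     # diffusion hop of one A molecule: region i -> j
    s["A"][i] -= 1; s["A"][j] += 1

def dimerize(s, i):    # chemical reaction A + A -> B inside region i
    s["A"][i] -= 2; s["B"][i] += 1

state = {"A": [100, 0], "B": [0, 0]}   # molecule counts in two spatial regions
k_diff, k_dim = 0.5, 0.001             # toy rate constants
events = [(lambda s, i=i: k_diff * s["A"][i], lambda s, i=i, j=j: move(s, i, j))
          for i, j in [(0, 1), (1, 0)]]
events += [(lambda s, i=i: k_dim * s["A"][i] * (s["A"][i] - 1) / 2,
            lambda s, i=i: dimerize(s, i))
           for i in (0, 1)]

t = 0.0
while t < 1.0:
    tau = step(state, events)
    if tau is None:
        break
    t += tau
print(state)
```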

Meta Model Based EA for Complex Optimization

Evolutionary algorithms are population-based, stochastic search techniques widely used as efficient global optimizers. However, many real-life optimization problems require finding the optimum of complex, high-dimensional, multimodal functions whose evaluations are computationally very expensive, which makes direct use of evolutionary algorithms practically prohibitive. An attractive alternative is to build meta-models, i.e., approximations of the actual fitness functions, which are orders of magnitude cheaper to evaluate than the actual function; many regression and interpolation tools are available for building them. This paper briefly discusses the architectures and use of such meta-modeling tools in an evolutionary optimization context, and presents two evolutionary algorithm frameworks that use meta-models for fitness function evaluation. The first framework, the Dynamic Approximate Fitness based Hybrid EA (DAFHEA) model [14], reduces computation time by the controlled use of meta-models (here, approximate models generated by Support Vector Machine regression) to partially replace actual function evaluations with approximate ones. The underlying assumption in DAFHEA, however, is that the training samples for the meta-model are generated from a single uniform model, which does not account for uncertain scenarios involving noisy fitness functions. The second model, DAFHEA-II, an enhanced version of the original framework, incorporates a multiple-model-based learning approach for the support vector machine approximator to handle noisy functions [15]. Empirical results obtained by evaluating the frameworks on several benchmark functions demonstrate their efficiency.
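
A minimal sketch of the controlled use of a Support Vector Machine regression meta-model inside an evolutionary loop, assuming scikit-learn's SVR as the regressor; this illustrates the general DAFHEA-style pattern, not the published implementation, and all parameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def surrogate_ea(f, dim=5, pop=30, gens=60, true_every=5):
    """Meta-model-assisted evolution: the expensive fitness f is only
    evaluated every 'true_every' generations; in between, an SVR
    meta-model trained on the archive of true evaluations ranks the
    offspring (controlled use of approximation)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (pop, dim))
    archive_x = x.copy()
    archive_y = np.apply_along_axis(f, 1, x)
    model = SVR().fit(archive_x, archive_y)
    for g in range(gens):
        children = np.vstack([x, x]) + rng.normal(0, 0.3, (2 * pop, dim))
        if g % true_every == 0:                       # real (expensive) evaluation
            y = np.apply_along_axis(f, 1, children)
            archive_x = np.vstack([archive_x, children])
            archive_y = np.concatenate([archive_y, y])
            model = SVR().fit(archive_x, archive_y)   # refresh the meta-model
        else:                                         # cheap approximate evaluation
            y = model.predict(children)
        x = children[np.argsort(y)[:pop]]             # keep the best half
    return archive_x[archive_y.argmin()], archive_y.min()

sphere = lambda v: float(np.sum(v ** 2))
print(surrogate_ea(sphere)[1])
```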

Soft Real-Time Fuzzy Task Scheduling for Multiprocessor Systems

All practical real-time scheduling algorithms for multiprocessor systems present a trade-off between computational complexity and performance. In real-time systems, tasks must be performed both correctly and on time, and finding a minimal schedule in a multiprocessor system with real-time constraints is known to be NP-hard. Although optimal algorithms exist for uniprocessor systems, they fail when applied to multiprocessor systems, and the practical scheduling algorithms for real-time systems do not have deterministic response times, even though deterministic timing behavior is an important parameter for system robustness analysis. The intrinsic uncertainty of dynamic real-time systems further increases the difficulty of the scheduling problem. To alleviate these difficulties, we propose a fuzzy scheduling approach that arranges real-time periodic and non-periodic tasks on multiprocessor systems. Static and dynamic optimal scheduling algorithms fail under non-critical overload; in contrast, our approach balances the task loads of the processors successfully while providing starvation prevention and fairness, so that higher-priority tasks have a higher probability of running. A simulation was conducted to evaluate the performance of the proposed approach. The experimental results show that the proposed fuzzy scheduler creates feasible schedules for homogeneous and heterogeneous tasks and takes task priorities into account, which yields higher system utilization and shorter deadline miss times. According to the results, it performs very close to the optimal schedule of uniprocessor systems.
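
To illustrate the mechanism (the paper's rule base and input variables are not given in the abstract), a toy fuzzy priority rule with triangular membership functions might look like this:

```python
def fuzzy_priority(laxity, load):
    """Toy fuzzy scheduler rule: map a task's normalized laxity
    (slack / period, 0..1) and the processor load (0..1) to a crisp
    priority via a small rule base and a weighted-average defuzzifier."""
    low  = lambda z: max(0.0, 1.0 - 2.0 * z)              # membership "low"
    high = lambda z: max(0.0, 2.0 * z - 1.0)              # membership "high"
    med  = lambda z: max(0.0, 1.0 - abs(2.0 * z - 1.0))   # membership "medium"
    # rules: (firing strength, output priority)
    rules = [
        (min(low(laxity),  high(load)), 1.0),   # urgent task, busy system
        (min(low(laxity),  low(load)),  0.8),
        (min(med(laxity),  med(load)),  0.5),
        (min(high(laxity), low(load)),  0.1),   # relaxed task, idle system
    ]
    num = sum(s * p for s, p in rules)
    den = sum(s for s, _ in rules)
    return num / den if den else 0.0

print(fuzzy_priority(laxity=0.2, load=0.9))   # tight deadline on a loaded CPU
print(fuzzy_priority(laxity=0.9, load=0.1))   # loose deadline on an idle CPU
```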

Fuzzy Relatives of the CLARANS Algorithm With Application to Text Clustering

This paper introduces new algorithms for fuzzy clustering of relational data: a fuzzy relative of the CLARANS algorithm (FCLARANS) and fuzzy c-medoids based on randomized search (FCMRANS). Unlike the existing fuzzy c-medoids algorithm (FCMdd), in which the within-cluster dissimilarity of each cluster is minimized in each iteration by recomputing new medoids given the current memberships, FCLARANS minimizes the same objective function by changing the current medoids in such a way that the sum of the within-cluster dissimilarities is minimized. Computing new medoids may be affected by noise, because outliers may join the computation of the medoids, whereas the choice of medoids in FCLARANS is dictated by the location of a predominant fraction of points inside a cluster; it is therefore less sensitive to the presence of outliers. In FCMRANS, the step of computing new medoids in FCMdd is modified to be based on randomized search. Furthermore, a new initialization procedure is developed that adds randomness to the initialization used with FCMdd. Both FCLARANS and FCMRANS are compared with the robust, linearized version of fuzzy c-medoids (RFCMdd). Experimental results with different samples of Reuters-21578, 20 Newsgroups (20NG), and generated noisy datasets show that FCLARANS is more robust than both RFCMdd and FCMRANS. Moreover, both FCMRANS and FCLARANS are more efficient, and their outputs are almost the same as RFCMdd's in terms of classification rate.
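
A minimal sketch of the randomized medoid-search step shared by the CLARANS-style algorithms: swap a medoid for a random non-medoid and keep the swap if the fuzzy objective improves. The membership and objective follow the standard fuzzy c-medoids formulas; the data and parameters are toy values, not the paper's experimental setup.

```python
import random

def randomized_medoid_step(data, medoids, dist, m=2.0, tries=20):
    """One CLARANS-style step: propose swapping a current medoid for a
    random non-medoid, accept the first swap that lowers the fuzzy
    objective, give up after 'tries' unsuccessful proposals."""
    def memberships(meds):
        u = []
        for x in data:
            d = [max(dist(x, c), 1e-12) for c in meds]
            u.append([1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in d)
                      for di in d])
        return u

    def objective(meds):          # sum of u_ik^m * d(x_i, medoid_k)
        u = memberships(meds)
        return sum(u[i][k] ** m * dist(x, c)
                   for i, x in enumerate(data)
                   for k, c in enumerate(meds))

    best = objective(medoids)
    for _ in range(tries):
        k = random.randrange(len(medoids))
        cand = random.choice([x for x in data if x not in medoids])
        trial = medoids[:k] + [cand] + medoids[k + 1:]
        cost = objective(trial)
        if cost < best:
            return trial, cost    # accept the first improving swap
    return medoids, best

data = [0.0, 0.2, 0.1, 5.0, 5.3, 4.9]
meds, cost = randomized_medoid_step(data, medoids=[0.2, 5.3],
                                    dist=lambda a, b: abs(a - b))
print(meds, round(cost, 3))
```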

Solving Machine Loading Problem in Flexible Manufacturing Systems Using Particle Swarm Optimization

In this paper, a particle swarm optimization (PSO) algorithm is proposed for solving the machine loading problem in a flexible manufacturing system (FMS), with the bi-criterion objectives of minimizing system unbalance and maximizing system throughput in the presence of technological constraints such as available machining time and tool slots. A mathematical model is used to select machines and assign operations and the required tools. The performance of the PSO is tested on 10 sample datasets, and the results are compared with the heuristics reported in the literature. The results show that the proposed PSO is comparable with the algorithms reported in the literature.