Computing Entropy for Ortholog Detection

Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of the Kolmogorov complexity or entropy of biological sequences are well known to be useful in extracting similarity information between such sequences, for example in the interest of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described here is to show that the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data, better than good, previously known alternatives (which do not employ means to handle short sequences well). Also empirically compared are the new entropy-based attribute set and a number of other, more standard similarity attribute sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross-validation through boosted decision tree induction (C5.0) and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion that the new, entropy-based attribute set by itself does not give the best prediction; however, it is the best attribute set for improving the other, standard attribute sets when conjoined with them.
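
The abstract does not name a particular compressor; as a concrete illustration of the compression-based approach it builds on, the sketch below computes the normalized compression distance (NCD) between two sequences using zlib. The function name and choice of compressor are illustrative assumptions, not the authors' implementation, and the degradation of such estimates on short inputs is exactly the problem the paper targets.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: a computable stand-in for the
    (uncomputable) Kolmogorov-complexity-based similarity distance."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar sequences compress well together, yielding a lower NCD.
a = b"ATGGCGTACGTTAGC" * 20
b_seq = b"ATGGCGTACGTAAGC" * 20   # near-copy of a
c = b"TTTACCGGAAATCGG" * 20       # unrelated sequence
print(ncd(a, b_seq))  # relatively small
print(ncd(a, c))      # larger
```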

Crystalline Graphene Nanoribbons with Atomically Smooth Edges via a Novel Physico-Chemical Route

A novel physico-chemical route to produce few-layer graphene nanoribbons with atomically smooth edges is reported, via acid treatment (H2SO4:HNO3) followed by characteristic thermal shock processes involving extremely cold substances. Samples were studied by scanning electron microscopy (SEM), transmission electron microscopy (TEM), X-ray diffraction (XRD), Raman spectroscopy, and X-ray photoelectron spectroscopy. This method demonstrates the importance of having the nanotubes open-ended for efficient, uniform unzipping along the nanotube axis. The nanoribbons are approximately 210 nm wide on average and consist of few layers, as observed by transmission electron microscopy. The produced nanoribbons exhibit different chiralities, as observed by high-resolution transmission electron microscopy. This method is able to provide graphene nanoribbons with atomically smooth edges, which could be used in various applications including sensors, gas adsorption materials, and composite fillers.

Optical Fish Tracking in Fishways using Neural Networks

One of the main issues in Computer Vision is to extract the movement of one or several points or objects of interest in an image or video sequence in order to conduct any kind of study or control process. Different techniques to solve this problem have been applied in numerous areas such as surveillance systems, traffic analysis, motion capture, image compression, navigation systems, and others, where the specific characteristics of each scenario determine the approach to the problem. This paper puts forward a Computer Vision based algorithm to analyze fish trajectories under high-turbulence conditions in artificial structures called vertical slot fishways, designed to allow the upstream migration of fish past obstructions in rivers. The suggested algorithm calculates the position of the fish at every instant, starting from images recorded with a camera and using neural networks to perform fish detection on the images. Different laboratory tests have been carried out in a full-scale fishway model and with live fish, allowing the reconstruction of fish trajectories and the measurement of fish velocities and accelerations. These data can provide useful information to design more effective vertical slot fishways.
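
The abstract leaves the post-processing unspecified; the sketch below shows one minimal way the trajectory step could work, assuming per-frame fish positions have already been produced by the neural-network detector. Velocities and accelerations are estimated by central finite differences; the frame rate and the synthetic track are illustrative assumptions.

```python
import numpy as np

def kinematics(positions: np.ndarray, fps: float):
    """positions: (N, 2) array of per-frame fish centroids in metres.
    Returns velocity and acceleration estimated by central differences."""
    dt = 1.0 / fps
    vel = np.gradient(positions, dt, axis=0)   # m/s
    acc = np.gradient(vel, dt, axis=0)         # m/s^2
    return vel, acc

# Illustrative usage with a synthetic 25 fps trajectory.
t = np.linspace(0, 4, 100)
track = np.column_stack([0.5 * t, 0.1 * np.sin(2 * t)])
v, a = kinematics(track, fps=25.0)
print(np.linalg.norm(v, axis=1).max())  # peak speed along the track
```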

A Computer Aided Detection (CAD) System for Microcalcifications in Mammograms - MammoScan μCaD

Clusters of microcalcifications in mammograms are an important sign of breast cancer. This paper presents a complete Computer Aided Detection (CAD) scheme for automatic detection of clustered microcalcifications in digital mammograms. The proposed system, MammoScan μCaD, consists of three main steps. First, all potential microcalcifications are detected using a feature extraction method, VarMet, and adaptive thresholding. This step also yields a number of false detections. The goal of the second step, Classifier level 1, is to remove everything but microcalcifications. The last step, Classifier level 2, uses learned dictionaries and sparse representations as a texture classification technique to distinguish single, benign microcalcifications from clustered microcalcifications, in addition to removing some remaining false detections. The system is trained and tested on true digital data from Stavanger University Hospital, and the results are evaluated by radiologists. The overall results are promising, with a sensitivity > 90% and a low false detection rate (approximately 1 unwanted detection per image, or 0.3 false detections per image).
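
VarMet itself is not specified in the abstract; the sketch below illustrates only the generic adaptive-thresholding idea of the first step, flagging pixels that stand out from their local neighbourhood by more than a multiple of the local standard deviation. The window size and the factor k are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(img: np.ndarray, win: int = 15, k: float = 3.0):
    """Mark candidate microcalcification pixels: bright spots exceeding
    the local mean by more than k local standard deviations."""
    mean = uniform_filter(img.astype(float), win)
    sq_mean = uniform_filter(img.astype(float) ** 2, win)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return img > mean + k * std   # boolean candidate mask
```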

Numerical Analysis of the SIR-SI Differential Equations with Application to Dengue Disease Mapping in Kuala Lumpur, Malaysia

The main aim of this study is to describe and introduce a method of numerical analysis for obtaining approximate solutions of the SIR-SI differential equations (susceptible-infective-recovered for human populations; susceptible-infective for vector populations) that represent a model for dengue disease transmission. First, we describe the ordinary differential equations for the SIR-SI disease transmission model. Then, we introduce the numerical analysis of solutions of this continuous time, discrete space SIR-SI model by simplifying the continuous time scale to a densely populated, discrete time scale. This is followed by the application of this numerical analysis to the estimation of relative risk using continuous time, discrete space dengue data for Kuala Lumpur, Malaysia. Finally, we present the results of the analysis, comparing and displaying them in graphs, tables, and maps. The numerical analysis of solutions that we implemented offers a useful and potentially superior model for estimating relative risks based on continuous time, discrete space data for vector-borne infectious diseases, specifically dengue.
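
The abstract does not list the model equations or parameter values; the sketch below integrates a standard SIR-SI formulation with a forward-Euler discretization, matching the "densely populated, discrete time scale" idea. All parameter values and the specific equation form are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def sir_si_euler(y0, beta_h, beta_v, gamma, mu_h, mu_v, N_h, N_v, dt, steps):
    """Forward-Euler integration of a standard SIR-SI dengue model.
    y = (S, I, R, Sv, Iv); transmission is human<->vector only."""
    y = np.array(y0, dtype=float)
    out = [y.copy()]
    for _ in range(steps):
        S, I, R, Sv, Iv = y
        dS  = mu_h * N_h - beta_h * S * Iv / N_h - mu_h * S
        dI  = beta_h * S * Iv / N_h - (gamma + mu_h) * I
        dR  = gamma * I - mu_h * R
        dSv = mu_v * N_v - beta_v * Sv * I / N_h - mu_v * Sv
        dIv = beta_v * Sv * I / N_h - mu_v * Iv
        y = y + dt * np.array([dS, dI, dR, dSv, dIv])
        out.append(y.copy())
    return np.array(out)

# Illustrative run: one infective human seeded in a small population.
traj = sir_si_euler((9999, 1, 0, 50000, 0), beta_h=0.3, beta_v=0.2,
                    gamma=1/7, mu_h=1/(70*365), mu_v=1/14,
                    N_h=10000, N_v=50000, dt=0.1, steps=2000)
print(traj[-1])  # final (S, I, R, Sv, Iv) state
```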

An Effective Algorithm for Minimum Weighted Vertex Cover Problem

The Minimum Weighted Vertex Cover (MWVC) problem is a classic NP-complete graph optimization problem. Given an undirected graph G = (V, E) and a weighting function defined on the vertex set, the minimum weighted vertex cover problem is to find a vertex set S ⊆ V of minimum total weight such that every edge of G has at least one endpoint in S. In this paper an effective algorithm, called the Support Ratio Algorithm (SRA), is designed to find a minimum weighted vertex cover of a graph. Computational experiments are designed and conducted to study the performance of the proposed algorithm. Extensive simulation results show that the SRA yields better solutions than other existing algorithms in the literature for solving the minimum weighted vertex cover problem.
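
The paper's support-ratio rule cannot be reproduced from the abstract alone; as a hedged illustration of the greedy family SRA belongs to, the sketch below repeatedly picks the vertex maximizing a degree-to-weight ratio (a plausible "support ratio", but an assumption here) and removes the edges it covers.

```python
def greedy_weighted_vertex_cover(edges, weights):
    """Greedy MWVC heuristic: repeatedly take the vertex with the largest
    (uncovered degree / weight) ratio until every edge is covered.
    SRA's exact support-ratio rule may differ; this is only a sketch."""
    uncovered = set(map(frozenset, edges))
    cover = set()
    while uncovered:
        deg = {}
        for e in uncovered:
            for v in e:
                deg[v] = deg.get(v, 0) + 1
        best = max(deg, key=lambda v: deg[v] / weights[v])
        cover.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover

# Example: a 4-cycle where the heavy vertex 1 is avoided.
print(greedy_weighted_vertex_cover(
    [(0, 1), (1, 2), (2, 3), (3, 0)],
    {0: 1.0, 1: 5.0, 2: 1.0, 3: 1.0}))  # a valid cover, e.g. {0, 2}
```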

Optimal Data Compression and Filtering: The Case of Infinite Signal Sets

We present a theory for optimal filtering of infinite sets of random signals. There are several new distinctive features of the proposed approach. First, we provide a single optimal filter for processing any signal from a given infinite signal set. Second, the filter is presented in the special form of a sum with p terms, where each term is represented as a combination of three operations. Each operation is a special stage of the filtering aimed at facilitating the associated numerical work. Third, an iterative scheme is incorporated into the filter structure to provide an improvement in the filter performance at each step of the scheme. The final step of the scheme concerns signal compression and decompression. This step is based on the solution of a new rank-constrained matrix approximation problem. The solution to the matrix problem is described in this paper. A rigorous error analysis is given for the new filter.
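
The paper's rank-constrained matrix problem is new and its solution is not reproduced here; the sketch below shows only the classical building block it generalizes, the best rank-k approximation of a matrix via truncated SVD (Eckart-Young), which is the standard basis for SVD-style signal compression. Dimensions and data are illustrative.

```python
import numpy as np

def best_rank_k(A: np.ndarray, k: int) -> np.ndarray:
    """Best rank-k approximation of A in the Frobenius/spectral norm
    (Eckart-Young), via the truncated singular value decomposition."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Compress a noisy 100x100 signal matrix of true rank 5 down to rank 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 100))
A_noisy = A + 0.01 * rng.standard_normal((100, 100))
err = np.linalg.norm(A_noisy - best_rank_k(A_noisy, 5)) / np.linalg.norm(A_noisy)
print(f"relative error: {err:.4f}")
```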

Boundary-Element-Based Finite Element Methods for Helmholtz and Maxwell Equations on General Polyhedral Meshes

We present new finite element methods for Helmholtz and Maxwell equations on general three-dimensional polyhedral meshes, based on domain decomposition with boundary elements on the surfaces of the polyhedral volume elements. The methods use the lowest-order polynomial spaces and produce sparse, symmetric linear systems despite the use of boundary elements. Moreover, piecewise constant coefficients are admissible. The resulting approximation on the element surfaces can be extended throughout the domain via representation formulas. Numerical experiments confirm that the convergence behavior on tetrahedral meshes is comparable to that of standard finite element methods, and equally good performance is attained on more general meshes.
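
The representation formula mentioned above is standard potential theory; for the Helmholtz equation Δu + k²u = 0 it takes the textbook form below, with Γ the surface of a polyhedral element and G_k the Helmholtz fundamental solution (stated here for orientation, not as the paper's exact formulation).

```latex
% Helmholtz fundamental solution in 3D and the direct representation
% formula used to extend surface data into the element interior.
G_k(x,y) = \frac{e^{ik|x-y|}}{4\pi|x-y|},
\qquad
u(x) = \int_{\Gamma} G_k(x,y)\,\partial_n u(y)\,\mathrm{d}s_y
     - \int_{\Gamma} \partial_{n_y} G_k(x,y)\,u(y)\,\mathrm{d}s_y,
\quad x \in \Omega .
```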

Constraint Based Frequent Pattern Mining Technique for Solving GCS Problem

The Generalized Center String (GCS) problem generalizes the Common Approximate Substring and Common Substring problems. GCS is known to be NP-hard; the difficulty lies in the explosion of potential candidates, since the longest center string must be found without knowing in advance which sequences may not contain any motif in a particular biological gene process. GCS can be solved by frequent pattern mining techniques and is known to be fixed-parameter tractable with respect to the input sequence length and symbol set size. Efficient methods known as the Bpriori algorithms can solve GCS with reasonable time/space complexities; the Bpriori 2 and Bpriori 3-2 algorithms have been proposed to find center strings of any length, together with the positions of all their instances in the input sequences. In this paper, we reduce the time/space complexity of the Bpriori algorithm by a Constraint Based Frequent Pattern mining (CBFP) technique, which integrates the ideas of constraint based mining and FP-tree mining. The CBFP mining technique solves the GCS problem not only for all center strings of any length, but also for the positions of all their mutated copies in the input sequences. It constructs a trie-like FP-tree to represent the mutated copies of center strings of any length, along with constraints that restrain the growth of the consensus tree. The complexity analysis for the CBFP mining technique and the Bpriori algorithm is done for both the worst case and the average case. The correctness of the algorithm is demonstrated by comparison with the Bpriori algorithm on artificial data.
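
Neither Bpriori nor CBFP is specified in enough detail in the abstract to reproduce; the sketch below only illustrates the underlying GCS-style question: find candidate center strings of length L whose mutated copies (within Hamming distance d) occur in every input sequence. It is a brute-force enumeration over observed substrings, purely for illustration, not the paper's mining technique.

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def center_strings(seqs, L: int, d: int):
    """Candidate centers of length L whose mutated copies (Hamming
    distance <= d) occur in every sequence. Candidates are restricted
    to substrings actually observed, so unseen centers are missed;
    a full solver would search the whole Sigma^L space."""
    candidates = {s[i:i+L] for s in seqs for i in range(len(s) - L + 1)}
    result = []
    for c in candidates:
        if all(any(hamming(c, s[i:i+L]) <= d
                   for i in range(len(s) - L + 1)) for s in seqs):
            result.append(c)
    return result

print(center_strings(["ACGTAC", "AGGTAC", "ACGTTC"], L=4, d=1))
```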

Modeling of Crude Oil Blending via Discrete-Time Neural Networks

Crude oil blending is an important unit operation in the petroleum refining industry. A good model of the blending system is beneficial for supervising the operation, predicting the quality of the exported petroleum, and realizing model-based optimal control. Since blending cannot follow the ideal mixing rule in practice, we propose a static neural network to approximate the blending properties. Using the dead-zone approach, we propose a new robust learning algorithm and give a theoretical analysis. Real data from crude oil blending are used to illustrate the neuro-modeling approach.
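
The paper's algorithm and its stability analysis are not in the abstract; the sketch below shows only the generic dead-zone idea it builds on: a gradient-style weight update that is switched off whenever the output error falls below a prescribed bound, so bounded noise cannot drive parameter drift. The linear-in-parameters model, learning rate, and threshold are illustrative assumptions.

```python
import numpy as np

def dead_zone_update(w, x, y, lr=0.05, dead_zone=0.1):
    """One robust update step for a linear-in-parameters model y ~ w.x:
    skip adaptation when |error| <= dead_zone (error attributed to
    noise/unmodeled dynamics), otherwise take a gradient step."""
    e = y - w @ x
    if abs(e) <= dead_zone:
        return w               # inside the dead zone: freeze weights
    return w + lr * e * x      # outside: standard gradient correction

# Illustrative identification loop on noisy synthetic blending data.
rng = np.random.default_rng(1)
w_true = np.array([0.6, 0.4])
w = np.zeros(2)
for _ in range(500):
    x = rng.uniform(0, 1, 2)               # component feed fractions
    y = w_true @ x + rng.normal(0, 0.03)   # measured blend property
    w = dead_zone_update(w, x, y)
print(w)  # approaches w_true without chasing the noise
```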

A Comparison Study of the Removal of Selected Pharmaceuticals in Waters by Chemical Oxidation Treatments

The degradation of selected pharmaceuticals in several water matrices was studied using various chemical treatments. The pharmaceuticals selected were the beta-blocker metoprolol, the nonsteroidal anti-inflammatory naproxen, the antibiotic amoxicillin, and the analgesic phenacetin; their degradation was conducted using UV radiation alone, ozone, Fenton's reagent, a Fenton-like system, the photo-Fenton system, and combinations of UV radiation and ozone with H2O2, TiO2, Fe(II), and Fe(III). The water matrices, in addition to ultra-pure water, were a reservoir water, a groundwater, and two secondary effluents from two municipal WWTPs. The results reveal that the presence of any second oxidant enhanced the oxidation rates, with the UV/TiO2 and O3/TiO2 systems providing the highest degradation rates. In most of the investigated oxidation systems, the degradation rate followed the sequence amoxicillin > naproxen > metoprolol > phenacetin. Lower rates were obtained with the pharmaceuticals dissolved in natural waters and secondary effluents, due to the organic matter present, which consumes some amount of the oxidant agents.
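
The abstract reports rate orderings without the underlying kinetics; a common way such degradation rates are quantified (an assumption here, not stated in the abstract) is a pseudo-first-order fit C(t) = C0·exp(-kt), with k estimated from the slope of ln(C/C0) versus time.

```python
import numpy as np

def pseudo_first_order_k(t, c):
    """Estimate the pseudo-first-order rate constant k (1/min) by a
    least-squares fit of ln(C/C0) = -k t."""
    y = np.log(np.asarray(c, float) / c[0])
    k, _ = np.polyfit(t, y, 1)   # slope of the semilog decay
    return -k

# Illustrative data: concentration decaying over a 30 min treatment.
t = np.array([0, 5, 10, 15, 20, 30], float)
c = np.array([10.0, 7.2, 5.1, 3.7, 2.6, 1.4])
print(f"k = {pseudo_first_order_k(t, c):.3f} 1/min")
```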

Alternating Implicit Block FDTD Method For Scalar Wave Equation

In this paper, an alternating implicit block method for solving the two-dimensional scalar wave equation is presented. The new method consists of two stages for each time step, implemented in alternating directions, which are very simple to compute. To increase the speed of computation, a group of adjacent points is computed simultaneously. It is shown that the presented method increases the maximum time step size and is more accurate than the conventional finite difference time domain (FDTD) method and another existing method with natural ordering.
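
The alternating implicit block scheme itself is not given in the abstract; for reference, the sketch below implements the conventional explicit FDTD update for the 2D scalar wave equation that the method is compared against, with the usual CFL-limited time step. The grid, boundary conditions, and initial disturbance are illustrative assumptions.

```python
import numpy as np

def fdtd_wave_2d(n=101, steps=200, c=1.0, h=0.01):
    """Conventional explicit FDTD for u_tt = c^2 (u_xx + u_yy) with
    homogeneous Dirichlet boundaries; dt obeys the 2D CFL limit,
    which is the step-size restriction implicit schemes relax."""
    dt = 0.9 * h / (c * np.sqrt(2.0))          # explicit stability bound
    r2 = (c * dt / h) ** 2
    u_prev = np.zeros((n, n))
    u = np.zeros((n, n))
    u[n // 2, n // 2] = 1.0                    # point initial disturbance
    for _ in range(steps):
        lap = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2]
               + u[1:-1, 2:] - 4 * u[1:-1, 1:-1])
        u_next = u.copy()
        u_next[1:-1, 1:-1] = (2 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1]
                              + r2 * lap)
        u_prev, u = u, u_next
    return u

print(np.abs(fdtd_wave_2d()).max())
```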

Join and Meet Block Based Default Definite Decision Rule Mining from IDT and an Incremental Algorithm

Using maximal consistent blocks of the tolerance relation on the universe of an incomplete decision table, the concepts of join block and meet block are introduced and studied. Besides the tolerance class, other blocks such as the tolerant kernel and the compatible kernel of an object are also discussed. Upper and lower approximations based on these blocks are defined as well. Default definite decision rules acquired from an incomplete decision table are proposed in the paper. An incremental algorithm to update default definite decision rules is suggested for effective mining from an incomplete decision table to which data are appended. Through an example, we demonstrate how default definite decision rules based on maximal consistent blocks, join blocks and meet blocks are acquired, and how optimization is done with the support of the discernibility matrix and discernibility function in the incomplete decision table.
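
The block definitions are left to the paper; the sketch below shows only the standard tolerance relation for incomplete tables that they build on: two objects are tolerant when every attribute value either agrees or is missing ('*'), and the tolerance class of an object collects all objects tolerant with it. The table encoding is an illustrative assumption.

```python
def tolerant(x, y):
    """Tolerance relation for incomplete tables: attribute values agree
    or at least one of them is the missing-value marker '*'."""
    return all(a == b or a == "*" or b == "*" for a, b in zip(x, y))

def tolerance_class(table, i):
    """All objects tolerant with object i (always contains i itself)."""
    return [j for j, row in enumerate(table) if tolerant(table[i], row)]

# Incomplete decision table: rows are objects, '*' marks missing values.
table = [
    ("high", "*",    "yes"),
    ("high", "low",  "yes"),
    ("low",  "low",  "*"),
    ("low",  "high", "no"),
]
print(tolerance_class(table, 0))  # [0, 1]
```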

New Graph Similarity Measurements based on Isomorphic and Nonisomorphic Data Fusion and their Use in the Prediction of the Pharmacological Behavior of Drugs

New graph similarity methods are proposed in this work with the aim of refining the chemical information extracted from molecule matching. For this purpose, data fusion of the isomorphic and nonisomorphic subgraphs into a new similarity measure, the Approximate Similarity, was carried out by several approaches. The application of the proposed method to the development of quantitative structure-activity relationships (QSAR) has provided reliable tools for predicting several pharmacological parameters: the binding of steroids to the globulin-corticosteroid receptor, the activity of benzodiazepine receptor compounds, and blood-brain barrier permeability. Acceptable results were obtained for the models presented here.

Gauteng's Waste Outlook: A Reflection

Gauteng, the province with the greatest industrial and population density and the economic hub of South Africa, also generates the greatest amount of waste, both general and hazardous. The province therefore has a significant need to develop and apply appropriate integrated waste management policies that ensure that waste is recognised as a serious problem and is managed in an effective, integrated manner to protect human health and the environment, both now and in the future. This paper reflects on Gauteng's waste outlook, in particular the province's General Waste Minimisation Plan and its Integrated Waste Management Policy. The paper also looks at general waste generation, recyclable waste streams, and recycling and separation-at-source initiatives in the province. Both the quantity and the nature of solid waste differ considerably across the socio-economic spectrum. People in informal settlements generate an average of 0.16 kg per person per day, whereas 2 kg per day is not unusual in affluent areas. For example, the amount of waste generated in Johannesburg is approximately 1.2 kg per person per day.

Heating of High-Density Hydrogen by High-Current Arc Radiation

The results of an investigation of high-density hydrogen heating by a high-current electric arc are presented, at initial pressures from 5 MPa to 160 MPa, with current amplitudes up to 1.6 MA and current rates of rise of 10^9-10^11 A/s. When the initial pressure and current rate of rise are changed, the channel temperature varies from several electronvolts to hundreds of electronvolts. The arc channel radius is several millimeters, but the radius of the discharge chamber is greater than the radius of the arc channel by approximately an order of magnitude. The high efficiency of gas heating is caused by radiation absorption in the hydrogen surrounding the arc. The current channel consists of vapor from the initiating wire. At a current rate of rise of 10^9 A/s and relatively small current amplitude, gas heating occurs through absorption of radiation emitted by the wire vapors in the transparency band of hydrogen, with photon energies less than 13.6 eV. At a current rate of rise of 10^11 A/s, gas heating is due to hydrogen absorption of soft X-rays from the discharge channel.

An Implicit Region-Based Deformable Model with Local Segmentation Applied to Weld Defects Extraction

This paper presents and discusses a model that allows local segmentation using statistical information of a given image. It is based on the Chan-Vese model, curve evolution, partial differential equations, and the binary level set method. The proposed model uses the piecewise constant approximation of the Chan-Vese model to compute a Signed Pressure Force (SPF) function, which attracts the curve to the true object boundaries. The implemented model is used to extract weld defects from weld radiographic images, with the aim of calculating the perimeters and surfaces of those weld defects; encouraging results are obtained on synthetic and real radiographic images.
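
The abstract only names the SPF construction; in the region-based active contour literature, the SPF built from the Chan-Vese piecewise-constant means c1 and c2 is commonly written as below. This is a hedged illustration of the idea, not necessarily the exact variant used in the paper.

```python
import numpy as np

def spf(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Signed Pressure Force from the Chan-Vese piecewise-constant fit:
    c1, c2 are the mean intensities inside/outside the current contour
    (boolean mask); the sign of SPF pushes the curve toward boundaries."""
    c1 = image[mask].mean()        # mean inside the contour
    c2 = image[~mask].mean()       # mean outside the contour
    f = image - (c1 + c2) / 2.0
    return f / (np.abs(f).max() + 1e-12)   # normalized to [-1, 1]
```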

Frequency-Energy Characteristics of Local Earthquakes using Discrete Wavelet Transform (DWT)

The wavelet transform is one of the most important methods used in signal processing. In this study, we introduce frequency-energy characteristics of local earthquakes using the discrete wavelet transform. The frequency-energy characteristic was analyzed depending on the difference between the P and S wave arrival times and the noise within the records. We found that local earthquakes have similar characteristics. If the frequency-energy characteristics can be determined accurately, they give a hint for calculating the P and S wave arrival times, and the wavelet transform is seen to provide a successful approximation for this. In this study, approximately 100 earthquakes with 500 records were analyzed.
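
The abstract does not fix the wavelet family or decomposition depth; the sketch below computes per-band energies of a seismic record with a discrete wavelet decomposition using PyWavelets. The 'db4' wavelet, the level count, and the synthetic record are illustrative assumptions.

```python
import numpy as np
import pywt

def band_energies(signal, wavelet="db4", level=5):
    """Energy of each DWT band (approximation + details), normalized to
    fractions of the total, giving a frequency-energy profile."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    return energies / energies.sum()

# Synthetic 'record': a low-frequency arrival plus high-frequency noise.
t = np.linspace(0, 10, 2000)
rec = np.sin(2 * np.pi * 2 * t) \
    + 0.3 * np.random.default_rng(0).standard_normal(t.size)
print(band_energies(rec))  # one energy fraction per band, coarsest first
```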

Sensorless Control of a Six-Phase Induction Motors Drive Using FOC in Stator Flux Reference Frame

In this paper, a direct torque control - space vector modulation (DTC-SVM) scheme is presented for a six-phase speed- and voltage-sensorless induction motor (IM) drive. Decoupled torque and stator flux control is achieved based on IM stator flux field orientation. The rotor speed is estimated on-line from the rotor angular slip speed and the stator flux vector speed. In addition, a simple method is introduced to estimate the stator resistance. Moreover, in this control scheme the voltage sensors are eliminated, and the actual motor phase voltages are approximated using the PWM inverter switching times and the DC link voltage. Finally, simulation and experimental results are presented to verify the effectiveness and capability of the proposed control scheme.
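
The estimator details are in the paper, not the abstract; a common building block for such sensorless, stator-flux-oriented schemes (an assumption here) is the voltage-model stator flux estimate, integrating v_s - R_s*i_s in the stationary frame, as sketched below with a simple discrete cumulative sum.

```python
import numpy as np

def stator_flux(v_alpha, v_beta, i_alpha, i_beta, Rs, dt):
    """Voltage-model stator flux estimate: psi = integral(v - Rs*i) dt,
    computed per axis in the stationary alpha-beta frame. Returns the
    flux magnitude and the flux angle used for field orientation."""
    psi_a = np.cumsum(v_alpha - Rs * i_alpha) * dt
    psi_b = np.cumsum(v_beta - Rs * i_beta) * dt
    magnitude = np.hypot(psi_a, psi_b)
    angle = np.arctan2(psi_b, psi_a)
    return magnitude, angle
```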

A Hybrid Neural Network and Traditional Approach for Forecasting Lumpy Demand

Accurate demand forecasting is one of the key issues in the inventory management of spare parts. The problem of modeling future consumption becomes especially difficult for lumpy patterns, which are characterized by intervals with no demand and periods with actual demand occurrences that vary greatly in level; many forecasting methods perform poorly when demand for an item is lumpy. In this study, based on the characteristics of the lumpy demand patterns of spare parts, a hybrid forecasting approach has been developed that uses a multi-layer perceptron neural network and a traditional recursive method to forecast future demand. In the described approach, the multi-layer perceptron is adapted to forecast the occurrences of non-zero demands, and a conventional recursive method is then used to estimate the quantities of the non-zero demands. To evaluate the performance of the proposed approach, its forecasts were compared with those obtained using the Syntetos & Boylan approximation and the multi-layer perceptron, generalized regression, and Elman recurrent neural networks recently employed in this area. The models were applied to forecast the future demand for spare parts of the Arak Petrochemical Company in Iran, using 30 real data sets. The results indicate that the forecasts obtained using the proposed model are superior to those obtained using the other methods.
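
The hybrid model itself is not specified in the abstract; the benchmark it is compared against, the Syntetos & Boylan approximation (SBA), is standard and is sketched below: Croston-style exponential smoothing of non-zero demand sizes and inter-demand intervals, with the bias-correcting factor (1 - alpha/2). The smoothing constant and the toy series are illustrative assumptions.

```python
def sba_forecast(demand, alpha=0.1):
    """Syntetos & Boylan approximation for intermittent/lumpy demand:
    smooth non-zero demand sizes z and inter-demand intervals p
    (Croston), then correct the bias with (1 - alpha/2) * z / p."""
    z = p = None          # smoothed demand size and interval
    q = 1                 # periods since the last non-zero demand
    for d in demand:
        if d > 0:
            if z is None:           # initialize on the first demand
                z, p = float(d), float(q)
            else:
                z += alpha * (d - z)
                p += alpha * (q - p)
            q = 1
        else:
            q += 1
    return (1 - alpha / 2) * z / p if z is not None else 0.0

series = [0, 0, 5, 0, 0, 0, 7, 0, 2, 0, 0, 6]
print(sba_forecast(series))  # estimated demand rate per period
```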