3D Oil Reservoir Visualisation Using Octree Compression Techniques Utilising Logical Grid Co-Ordinates

Octree compression techniques have been used for several years for compressing large three dimensional data sets into homogeneous regions. This compression technique is ideally suited to datasets which have similar values in clusters. Oil engineers represent reservoirs as a three dimensional grid where hydrocarbons occur naturally in clusters. This research looks at the efficiency of storing these grids using octree compression techniques where grid cells are broken into active and inactive regions. Initial experiments yielded high compression ratios as only active leaf nodes and their ancestor, header nodes are stored as a bitstream to file on disk. Savings in computational time and memory were possible at decompression, as only active leaf nodes are sent to the graphics card eliminating the need of reconstructing the original matrix. This results in a more compact vertex table, which can be loaded into the graphics card quicker and generating shorter refresh delay times.

Software Maintenance Severity Prediction with Soft Computing Approach

As the majority of faults are found in a few of its modules so there is a need to investigate the modules that are affected severely as compared to other modules and proper maintenance need to be done on time especially for the critical applications. In this paper, we have explored the different predictor models to NASA-s public domain defect dataset coded in Perl programming language. Different machine learning algorithms belonging to the different learner categories of the WEKA project including Mamdani Based Fuzzy Inference System and Neuro-fuzzy based system have been evaluated for the modeling of maintenance severity or impact of fault severity. The results are recorded in terms of Accuracy, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The results show that Neuro-fuzzy based model provides relatively better prediction accuracy as compared to other models and hence, can be used for the maintenance severity prediction of the software.

Evolving a Fuzzy Rule-Base for Image Segmentation

A new method for color image segmentation using fuzzy logic is proposed in this paper. Our aim here is to automatically produce a fuzzy system for color classification and image segmentation with least number of rules and minimum error rate. Particle swarm optimization is a sub class of evolutionary algorithms that has been inspired from social behavior of fishes, bees, birds, etc, that live together in colonies. We use comprehensive learning particle swarm optimization (CLPSO) technique to find optimal fuzzy rules and membership functions because it discourages premature convergence. Here each particle of the swarm codes a set of fuzzy rules. During evolution, a population member tries to maximize a fitness criterion which is here high classification rate and small number of rules. Finally, particle with the highest fitness value is selected as the best set of fuzzy rules for image segmentation. Our results, using this method for soccer field image segmentation in Robocop contests shows 89% performance. Less computational load is needed when using this method compared with other methods like ANFIS, because it generates a smaller number of fuzzy rules. Large train dataset and its variety, makes the proposed method invariant to illumination noise

Spatial Correlation Analysis between Climate Factors and Plant Production in Asia

Using 1km grid datasets representing monthly mean precipitation, monthly mean temperature, and dry matter production (DMP), we considered the regional plant production ability in Southeast and South Asia, and also employed pixel-by-pixel correlation analysis to assess the intensity of relation between climate factors and plant production. While annual DMP in South Asia was approximately less than 2,000kg, the one in most part of Southeast Asia exceeded 2,500 - 3,000kg. It suggested that plant production in Southeast Asia was superior to South Asia, however, Rain-Use Efficiency (RUE) representing dry matter production per 1mm precipitation showed that inland of Indochina Peninsula and India were higher than islands in Southeast Asia. By the results of correlation analysis between climate factors and DMP, while the area in most parts of Indochina Peninsula indicated negative correlation coefficients between DMP and precipitation or temperature, the area in Malay Peninsula and islands showed negative correlation to precipitation and positive one to temperature, and most part of India dominating South Asia showed positive to precipitation and negative to temperature. In addition, the areas where the correlation coefficients exceeded |0.8| were regarded as “susceptible" to climate factors, and the areas smaller than |0.2| were “insusceptible". By following the discrimination, the map implying expected impacts by climate change was provided.

Studding of Number of Dataset on Precision of Estimated Saturated Hydraulic Conductivity

Saturated hydraulic conductivity of Soil is an important property in processes involving water and solute flow in soils. Saturated hydraulic conductivity of soil is difficult to measure and can be highly variable, requiring a large number of replicate samples. In this study, 60 sets of soil samples were collected at Saqhez region of Kurdistan province-IRAN. The statistics such as Correlation Coefficient (R), Root Mean Square Error (RMSE), Mean Bias Error (MBE) and Mean Absolute Error (MAE) were used to evaluation the multiple linear regression models varied with number of dataset. In this study the multiple linear regression models were evaluated when only percentage of sand, silt, and clay content (SSC) were used as inputs, and when SSC and bulk density, Bd, (SSC+Bd) were used as inputs. The R, RMSE, MBE and MAE values of the 50 dataset for method (SSC), were calculated 0.925, 15.29, -1.03 and 12.51 and for method (SSC+Bd), were calculated 0.927, 15.28,-1.11 and 12.92, respectively, for relationship obtained from multiple linear regressions on data. Also the R, RMSE, MBE and MAE values of the 10 dataset for method (SSC), were calculated 0.725, 19.62, - 9.87 and 18.91 and for method (SSC+Bd), were calculated 0.618, 24.69, -17.37 and 22.16, respectively, which shows when number of dataset increase, precision of estimated saturated hydraulic conductivity, increases.

Evaluation of Haar Cascade Classifiers Designed for Face Detection

In the past years a lot of effort has been made in the field of face detection. The human face contains important features that can be used by vision-based automated systems in order to identify and recognize individuals. Face location, the primary step of the vision-based automated systems, finds the face area in the input image. An accurate location of the face is still a challenging task. Viola-Jones framework has been widely used by researchers in order to detect the location of faces and objects in a given image. Face detection classifiers are shared by public communities, such as OpenCV. An evaluation of these classifiers will help researchers to choose the best classifier for their particular need. This work focuses of the evaluation of face detection classifiers minding facial landmarks.

Real Time Multi-Sensory Force Sensing Mat for Sports Biomechanics and Human Gait Analysis

This paper presents a real time force sensing instrument that is designed for human gait analysis purposes. It is capable of recording and monitoring ground reaction forces exerted by human foot during various activities such as walking, running and jumping in real time. In overall, force sensing mat mainly consists of three elements: the force sensing mat, signal conditioning circuit and data acquisition device. Force sensing mat is the mat that contains an array of force sensing elements. To control and process the incoming signal from the force sensing mat, Force-Logger and Force-Reloader are developed using National Instrument Labview. This paper describes the architecture of the force sensing mat, signal conditioning circuit and the real time streaming of the incoming data from the force sensing mat. Additionally, a preliminary experiment dataset is presented in this paper.

Feature Selection for Breast Cancer Diagnosis: A Case-Based Wrapper Approach

This article addresses feature selection for breast cancer diagnosis. The present process contains a wrapper approach based on Genetic Algorithm (GA) and case-based reasoning (CBR). GA is used for searching the problem space to find all of the possible subsets of features and CBR is employed to estimate the evaluation result of each subset. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer (WDBC) dataset.

SAF: A Substitution and Alignment Free Similarity Measure for Protein Sequences

The literature reports a large number of approaches for measuring the similarity between protein sequences. Most of these approaches estimate this similarity using alignment-based techniques that do not necessarily yield biologically plausible results, for two reasons. First, for the case of non-alignable (i.e., not yet definitively aligned and biologically approved) sequences such as multi-domain, circular permutation and tandem repeat protein sequences, alignment-based approaches do not succeed in producing biologically plausible results. This is due to the nature of the alignment, which is based on the matching of subsequences in equivalent positions, while non-alignable proteins often have similar and conserved domains in non-equivalent positions. Second, the alignment-based approaches lead to similarity measures that depend heavily on the parameters set by the user for the alignment (e.g., gap penalties and substitution matrices). For easily alignable protein sequences, it's possible to supply a suitable combination of input parameters that allows such an approach to yield biologically plausible results. However, for difficult-to-align protein sequences, supplying different combinations of input parameters yields different results. Such variable results create ambiguities and complicate the similarity measurement task. To overcome these drawbacks, this paper describes a novel and effective approach for measuring the similarity between protein sequences, called SAF for Substitution and Alignment Free. Without resorting either to the alignment of protein sequences or to substitution relations between amino acids, SAF is able to efficiently detect the significant subsequences that best represent the intrinsic properties of protein sequences, those underlying the chronological dependencies of structural features and biochemical activities of protein sequences. Moreover, by using a new efficient subsequence matching scheme, SAF more efficiently handles protein sequences that contain similar structural features with significant meaning in chronologically non-equivalent positions. To show the effectiveness of SAF, extensive experiments were performed on protein datasets from different databases, and the results were compared with those obtained by several mainstream algorithms.

Multiple-Level Sequential Pattern Discovery from Customer Transaction Databases

Mining sequential patterns from large customer transaction databases has been recognized as a key research topic in database systems. However, the previous works more focused on mining sequential patterns at a single concept level. In this study, we introduced concept hierarchies into this problem and present several algorithms for discovering multiple-level sequential patterns based on the hierarchies. An experiment was conducted to assess the performance of the proposed algorithms. The performances of the algorithms were measured by the relative time spent on completing the mining tasks on two different datasets. The experimental results showed that the performance depends on the characteristics of the datasets and the pre-defined threshold of minimal support for each level of the concept hierarchy. Based on the experimental results, some suggestions were also given for how to select appropriate algorithm for a certain datasets.

Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection

It is important problems to increase the detection rates and reduce false positive rates in Intrusion Detection System (IDS). Although preventative techniques such as access control and authentication attempt to prevent intruders, these can fail, and as a second line of defence, intrusion detection has been introduced. Rare events are events that occur very infrequently, detection of rare events is a common problem in many domains. In this paper we propose an intrusion detection method that combines Rough set and Fuzzy Clustering. Rough set has to decrease the amount of data and get rid of redundancy. Fuzzy c-means clustering allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect suspicious activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining-(KDDCup 1999) Dataset show that the method is efficient and practical for intrusion detection systems.

A 3D Approach for Extraction of the Coronaryartery and Quantification of the Stenosis

Segmentation and quantification of stenosis is an important task in assessing coronary artery disease. One of the main challenges is measuring the real diameter of curved vessels. Moreover, uncertainty in segmentation of different tissues in the narrow vessel is an important issue that affects accuracy. This paper proposes an algorithm to extract coronary arteries and measure the degree of stenosis. Markovian fuzzy clustering method is applied to model uncertainty arises from partial volume effect problem. The algorithm employs: segmentation, centreline extraction, estimation of orthogonal plane to centreline, measurement of the degree of stenosis. To evaluate the accuracy and reproducibility, the approach has been applied to a vascular phantom and the results are compared with real diameter. The results of 10 patient datasets have been visually judged by a qualified radiologist. The results reveal the superiority of the proposed method compared to the Conventional thresholding Method (CTM) on both datasets.

Applications of Genetic Programming in Data Mining

This paper details the application of a genetic programming framework for induction of useful classification rules from a database of income statements, balance sheets, and cash flow statements for North American public companies. Potentially interesting classification rules are discovered. Anomalies in the discovery process merit further investigation of the application of genetic programming to the dataset for the problem domain.

Automatic Visualization Pipeline Formation for Medical Datasets on Grid Computing Environment

Distance visualization of large datasets often takes the direction of remote viewing and zooming techniques of stored static images. However, the continuous increase in the size of datasets and visualization operation causes insufficient performance with traditional desktop computers. Additionally, the visualization techniques such as Isosurface depend on the available resources of the running machine and the size of datasets. Moreover, the continuous demand for powerful computing powers and continuous increase in the size of datasets results an urgent need for a grid computing infrastructure. However, some issues arise in current grid such as resources availability at the client machines which are not sufficient enough to process large datasets. On top of that, different output devices and different network bandwidth between the visualization pipeline components often result output suitable for one machine and not suitable for another. In this paper we investigate how the grid services could be used to support remote visualization of large datasets and to break the constraint of physical co-location of the resources by applying the grid computing technologies. We show our grid enabled architecture to visualize large medical datasets (circa 5 million polygons) for remote interactive visualization on modest resources clients.

Application and Limitation of Parallel Modelingin Multidimensional Sequential Pattern

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is discovery of frequently occurring patterns in sequential data. In a multidimensional sequence each event depends on more than one dimension. The search space is quite large and the serial algorithms are not scalable for very large datasets. To address this, it is necessary to study scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequence and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and scalable and acceptable speedup over different processors and problem sizes and demonstrate that our approach can works efficiently in a real parallel computing environment.

An Empirical Analysis of the Board Composition Concerning Logistics Competencies

Empirical insights into the implementation of logistics competencies at the top management level are scarce. This paper addresses this issue with an explorative approach which is based on a dataset of 872 observations in the years 2000, 2004 and 2008 using quantitative content analysis from annual reports of the 500 publicly listed firms with the highest global research and development expenditures according to the British Department for Business Innovation and Skills. We find that logistics competencies are more pronounced in Asian companies than in their European or American counterparts. On an industrial level the results are quite mixed. Using partial point-biserial correlations we show that logistics competencies are positively related to financial performance.

Performance Evaluation of Wavelet Based Coders on Brain MRI Volumetric Medical Datasets for Storage and Wireless Transmission

In this paper, we evaluate the performance of some wavelet based coding algorithms such as 3D QT-L, 3D SPIHT and JPEG2K. In the first step we achieve an objective comparison between three coders, namely 3D SPIHT, 3D QT-L and JPEG2K. For this purpose, eight MRI head scan test sets of 256 x 256x124 voxels have been used. Results show superior performance of 3D SPIHT algorithm, whereas 3D QT-L outperforms JPEG2K. The second step consists of evaluating the robustness of 3D SPIHT and JPEG2K coding algorithm over wireless transmission. Compressed dataset images are then transmitted over AWGN wireless channel or over Rayleigh wireless channel. Results show the superiority of JPEG2K over these two models. In fact, it has been deduced that JPEG2K is more robust regarding coding errors. Thus we may conclude the necessity of using corrector codes in order to protect the transmitted medical information.

Recognition Machine (RM) for On-line and Isolated Flight Deck Officer (FDO) Gestures

The paper presents an on-line recognition machine (RM) for continuous/isolated, dynamic and static gestures that arise in Flight Deck Officer (FDO) training. RM is based on generic pattern recognition framework. Gestures are represented as templates using summary statistics. The proposed recognition algorithm exploits temporal and spatial characteristics of gestures via dynamic programming and Markovian process. The algorithm predicts corresponding index of incremental input data in the templates in an on-line mode. Accumulated consistency in the sequence of prediction provides a similarity measurement (Score) between input data and the templates. The algorithm provides an intuitive mechanism for automatic detection of start/end frames of continuous gestures. In the present paper, we consider isolated gestures. The performance of RM is evaluated using four datasets - artificial (W TTest), hand motion (Yang) and FDO (tracker, vision-based ). RM achieves comparable results which are in agreement with other on-line and off-line algorithms such as hidden Markov model (HMM) and dynamic time warping (DTW). The proposed algorithm has the additional advantage of providing timely feedback for training purposes.

Segmentation of Lungs from CT Scan Images for Early Diagnosis of Lung Cancer

Segmentation is an important step in medical image analysis and classification for radiological evaluation or computer aided diagnosis. The CAD (Computer Aided Diagnosis ) of lung CT generally first segment the area of interest (lung) and then analyze the separately obtained area for nodule detection in order to diagnosis the disease. For normal lung, segmentation can be performed by making use of excellent contrast between air and surrounding tissues. However this approach fails when lung is affected by high density pathology. Dense pathologies are present in approximately a fifth of clinical scans, and for computer analysis such as detection and quantification of abnormal areas it is vital that the entire and perfectly lung part of the image is provided and no part, as present in the original image be eradicated. In this paper we have proposed a lung segmentation technique which accurately segment the lung parenchyma from lung CT Scan images. The algorithm was tested against the 25 datasets of different patients received from Ackron Univeristy, USA and AGA Khan Medical University, Karachi, Pakistan.

Detecting Remote Protein Evolutionary Relationships via String Scoring Method

The amount of the information being churned out by the field of biology has jumped manifold and now requires the extensive use of computer techniques for the management of this information. The predominance of biological information such as protein sequence similarity in the biological information sea is key information for detecting protein evolutionary relationship. Protein sequence similarity typically implies homology, which in turn may imply structural and functional similarities. In this work, we propose, a learning method for detecting remote protein homology. The proposed method uses a transformation that converts protein sequence into fixed-dimensional representative feature vectors. Each feature vector records the sensitivity of a protein sequence to a set of amino acids substrings generated from the protein sequences of interest. These features are then used in conjunction with support vector machines for the detection of the protein remote homology. The proposed method is tested and evaluated on two different benchmark protein datasets and it-s able to deliver improvements over most of the existing homology detection methods.