Full-genomic Network Inference for Non-model organisms: A Case Study for the Fungal Pathogen Candida albicans

Reverse engineering of full-genomic interaction networks based on compendia of expression data has been successfully applied for a number of model organisms. This study adapts these approaches for an important non-model organism: The major human fungal pathogen Candida albicans. During the infection process, the pathogen can adapt to a wide range of environmental niches and reversibly changes its growth form. Given the importance of these processes, it is important to know how they are regulated. This study presents a reverse engineering strategy able to infer fullgenomic interaction networks for C. albicans based on a linear regression, utilizing the sparseness criterion (LASSO). To overcome the limited amount of expression data and small number of known interactions, we utilize different prior-knowledge sources guiding the network inference to a knowledge driven solution. Since, no database of known interactions for C. albicans exists, we use a textmining system which utilizes full-text research papers to identify known regulatory interactions. By comparing with these known regulatory interactions, we find an optimal value for global modelling parameters weighting the influence of the sparseness criterion and the prior-knowledge. Furthermore, we show that soft integration of prior-knowledge additionally improves the performance. Finally, we compare the performance of our approach to state of the art network inference approaches.

A Hybrid Recommender System based on Collaborative Filtering and Cloud Model

User-based Collaborative filtering (CF), one of the most prevailing and efficient recommendation techniques, provides personalized recommendations to users based on the opinions of other users. Although the CF technique has been successfully applied in various applications, it suffers from serious sparsity problems. The cloud-model approach addresses the sparsity problems by constructing the user-s global preference represented by a cloud eigenvector. The user-based CF approach works well with dense datasets while the cloud-model CF approach has a greater performance when the dataset is sparse. In this paper, we present a hybrid approach that integrates the predictions from both the user-based CF and the cloud-model CF approaches. The experimental results show that the proposed hybrid approach can ameliorate the sparsity problem and provide an improved prediction quality.

An Algorithm for Computing the Analytic Singular Value Decomposition

A proof of convergence of a new continuation algorithm for computing the Analytic SVD for a large sparse parameter– dependent matrix is given. The algorithm itself was developed and numerically tested in [5].

Approaches to Determining Optimal Asset Structure for a Commercial Bank

Every commercial bank optimises its asset portfolio depending on the profitability of assets and chosen or imposed constraints. This paper proposes and applies a stylized model for optimising banks' asset and liability structure, reflecting profitability of different asset categories and their risks as well as costs associated with different liability categories and reserve requirements. The level of detail for asset and liability categories is chosen to create a suitably parsimonious model and to include the most important categories in the model. It is shown that the most appropriate optimisation criterion for the model is the maximisation of the ratio of net interest income to assets. The maximisation of this ratio is subject to several constraints. Some are accounting identities or dictated by legislative requirements; others vary depending on the market objectives for a particular bank. The model predicts variable amount of assets allocated to loan provision.

A Robust LS-SVM Regression

In comparison to the original SVM, which involves a quadratic programming task; LS–SVM simplifies the required computation, but unfortunately the sparseness of standard SVM is lost. Another problem is that LS-SVM is only optimal if the training samples are corrupted by Gaussian noise. In Least Squares SVM (LS–SVM), the nonlinear solution is obtained, by first mapping the input vector to a high dimensional kernel space in a nonlinear fashion, where the solution is calculated from a linear equation set. In this paper a geometric view of the kernel space is introduced, which enables us to develop a new formulation to achieve a sparse and robust estimate.

Gene Expression Data Classification Using Discriminatively Regularized Sparse Subspace Learning

Sparse representation which can represent high dimensional data effectively has been successfully used in computer vision and pattern recognition problems. However, it doesn-t consider the label information of data samples. To overcome this limitation, we develop a novel dimensionality reduction algorithm namely dscriminatively regularized sparse subspace learning(DR-SSL) in this paper. The proposed DR-SSL algorithm can not only make use of the sparse representation to model the data, but also can effective employ the label information to guide the procedure of dimensionality reduction. In addition,the presented algorithm can effectively deal with the out-of-sample problem.The experiments on gene-expression data sets show that the proposed algorithm is an effective tool for dimensionality reduction and gene-expression data classification.

Implementing an Intuitive Reasoner with a Large Weather Database

In this paper, the implementation of a rule-based intuitive reasoner is presented. The implementation included two parts: the rule induction module and the intuitive reasoner. A large weather database was acquired as the data source. Twelve weather variables from those data were chosen as the “target variables" whose values were predicted by the intuitive reasoner. A “complex" situation was simulated by making only subsets of the data available to the rule induction module. As a result, the rules induced were based on incomplete information with variable levels of certainty. The certainty level was modeled by a metric called "Strength of Belief", which was assigned to each rule or datum as ancillary information about the confidence in its accuracy. Two techniques were employed to induce rules from the data subsets: decision tree and multi-polynomial regression, respectively for the discrete and the continuous type of target variables. The intuitive reasoner was tested for its ability to use the induced rules to predict the classes of the discrete target variables and the values of the continuous target variables. The intuitive reasoner implemented two types of reasoning: fast and broad where, by analogy to human thought, the former corresponds to fast decision making and the latter to deeper contemplation. . For reference, a weather data analysis approach which had been applied on similar tasks was adopted to analyze the complete database and create predictive models for the same 12 target variables. The values predicted by the intuitive reasoner and the reference approach were compared with actual data. The intuitive reasoner reached near-100% accuracy for two continuous target variables. For the discrete target variables, the intuitive reasoner predicted at least 70% as accurately as the reference reasoner. Since the intuitive reasoner operated on rules derived from only about 10% of the total data, it demonstrated the potential advantages in dealing with sparse data sets as compared with conventional methods.

Feature Based Dense Stereo Matching using Dynamic Programming and Color

This paper presents a new feature based dense stereo matching algorithm to obtain the dense disparity map via dynamic programming. After extraction of some proper features, we use some matching constraints such as epipolar line, disparity limit, ordering and limit of directional derivative of disparity as well. Also, a coarseto- fine multiresolution strategy is used to decrease the search space and therefore increase the accuracy and processing speed. The proposed method links the detected feature points into the chains and compares some of the feature points from different chains, to increase the matching speed. We also employ color stereo matching to increase the accuracy of the algorithm. Then after feature matching, we use the dynamic programming to obtain the dense disparity map. It differs from the classical DP methods in the stereo vision, since it employs sparse disparity map obtained from the feature based matching stage. The DP is also performed further on a scan line, between any matched two feature points on that scan line. Thus our algorithm is truly an optimization method. Our algorithm offers a good trade off in terms of accuracy and computational efficiency. Regarding the results of our experiments, the proposed algorithm increases the accuracy from 20 to 70%, and reduces the running time of the algorithm almost 70%.

Auto Regressive Tree Modeling for Parametric Optimization in Fuzzy Logic Control System

The advantage of solving the complex nonlinear problems by utilizing fuzzy logic methodologies is that the experience or expert-s knowledge described as a fuzzy rule base can be directly embedded into the systems for dealing with the problems. The current limitation of appropriate and automated designing of fuzzy controllers are focused in this paper. The structure discovery and parameter adjustment of the Branched T-S fuzzy model is addressed by a hybrid technique of type constrained sparse tree algorithms. The simulation result for different system model is evaluated and the identification error is observed to be minimum.

Using Ferry Access Points to Improve the Performance of Message Ferrying in Delay-Tolerant Networks

Delay-Tolerant Networks (DTNs) are sparse, wireless networks where disconnections are common due to host mobility and low node density. The Message Ferrying (MF) scheme is a mobilityassisted paradigm to improve connectivity in DTN-like networks. A ferry or message ferry is a special node in the network which has a per-determined route in the deployed area and relays messages between mobile hosts (MHs) which are intermittently connected. Increased contact opportunities among mobile hosts and the ferry improve the performance of the network, both in terms of message delivery ratio and average end-end delay. However, due to the inherent mobility of mobile hosts and pre-determined periodicity of the message ferry, mobile hosts may often -miss- contact opportunities with a ferry. In this paper, we propose the combination of stationary ferry access points (FAPs) with MF routing to increase contact opportunities between mobile hosts and the MF and consequently improve the performance of the DTN. We also propose several placement models for deploying FAPs on MF routes. We evaluate the performance of the FAP placement models through comprehensive simulation. Our findings show that FAPs do improve the performance of MF-assisted DTNs and symmetric placement of FAPs outperforms other placement strategies.

An Incomplete Factorization Preconditioner for LMS Adaptive Filter

In this paper an efficient incomplete factorization preconditioner is proposed for the Least Mean Squares (LMS) adaptive filter. The proposed preconditioner is approximated from a priori knowledge of the factors of input correlation matrix with an incomplete strategy, motivated by the sparsity patter of the upper triangular factor in the QRD-RLS algorithm. The convergence properties of IPLMS algorithm are comparable with those of transform domain LMS(TDLMS) algorithm. Simulation results show efficiency and robustness of the proposed algorithm with reduced computational complexity.

An Examination of the Factors Influencing Software Development Effort

Effective evaluation of software development effort is an important aspect of successful project management. Based on a large database with 4106 projects ever developed, this study statistically examines the factors that influence development effort. The factors found to be significant for effort are project size, average number of developers that worked on the project, type of development, development language, development platform, and the use of rapid application development. Among these factors, project size is the most critical cost driver. Unsurprisingly, this study found that the use of CASE tools does not necessarily reduce development effort, which adds support to the claim that the use of tools is subtle. As many of the current estimation models are rarely or unsuccessfully used, this study proposes a parsimonious parametric model for the prediction of effort which is both simple and more accurate than previous models.

Approximate Solutions to Large Stein Matrix Equations

In the present paper, we propose numerical methods for solving the Stein equation AXC - X - D = 0 where the matrix A is large and sparse. Such problems appear in discrete-time control problems, filtering and image restoration. We consider the case where the matrix D is of full rank and the case where D is factored as a product of two matrices. The proposed methods are Krylov subspace methods based on the block Arnoldi algorithm. We give theoretical results and we report some numerical experiments.

Restarted GMRES Method Augmented with the Combination of Harmonic Ritz Vectors and Error Approximations

Restarted GMRES methods augmented with approximate eigenvectors are widely used for solving large sparse linear systems. Recently a new scheme of augmenting with error approximations is proposed. The main aim of this paper is to develop a restarted GMRES method augmented with the combination of harmonic Ritz vectors and error approximations. We demonstrate that the resulted combination method can gain the advantages of two approaches: (i) effectively deflate the small eigenvalues in magnitude that may hamper the convergence of the method and (ii) partially recover the global optimality lost due to restarting. The effectiveness and efficiency of the new method are demonstrated through various numerical examples.

Feature Selection Methods for an Improved SVM Classifier

Text categorization is the problem of classifying text documents into a set of predefined classes. After a preprocessing step, the documents are typically represented as large sparse vectors. When training classifiers on large collections of documents, both the time and memory restrictions can be quite prohibitive. This justifies the application of feature selection methods to reduce the dimensionality of the document-representation vector. In this paper, three feature selection methods are evaluated: Random Selection, Information Gain (IG) and Support Vector Machine feature selection (called SVM_FS). We show that the best results were obtained with SVM_FS method for a relatively small dimension of the feature vector. Also we present a novel method to better correlate SVM kernel-s parameters (Polynomial or Gaussian kernel).

A Hybrid Feature Selection by Resampling, Chi squared and Consistency Evaluation Techniques

In this paper a combined feature selection method is proposed which takes advantages of sample domain filtering, resampling and feature subset evaluation methods to reduce dimensions of huge datasets and select reliable features. This method utilizes both feature space and sample domain to improve the process of feature selection and uses a combination of Chi squared with Consistency attribute evaluation methods to seek reliable features. This method consists of two phases. The first phase filters and resamples the sample domain and the second phase adopts a hybrid procedure to find the optimal feature space by applying Chi squared, Consistency subset evaluation methods and genetic search. Experiments on various sized datasets from UCI Repository of Machine Learning databases show that the performance of five classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First Decision Tree and JRIP) improves simultaneously and the classification error for these classifiers decreases considerably. The experiments also show that this method outperforms other feature selection methods.

Enhanced Character Based Algorithm for Small Parsimony

Phylogenetic tree is a graphical representation of the evolutionary relationship among three or more genes or organisms. These trees show relatedness of data sets, species or genes divergence time and nature of their common ancestors. Quality of a phylogenetic tree requires parsimony criterion. Various approaches have been proposed for constructing most parsimonious trees. This paper is concerned about calculating and optimizing the changes of state that are needed called Small Parsimony Algorithms. This paper has proposed enhanced small parsimony algorithm to give better score based on number of evolutionary changes needed to produce the observed sequence changes tree and also give the ancestor of the given input.

High Performance Computing Using Out-of- Core Sparse Direct Solvers

In-core memory requirement is a bottleneck in solving large three dimensional Navier-Stokes finite element problem formulations using sparse direct solvers. Out-of-core solution strategy is a viable alternative to reduce the in-core memory requirements while solving large scale problems. This study evaluates the performance of various out-of-core sequential solvers based on multifrontal or supernodal techniques in the context of finite element formulations for three dimensional problems on a Windows platform. Here three different solvers, HSL_MA78, MUMPS and PARDISO are compared. The performance of these solvers is evaluated on a 64-bit machine with 16GB RAM for finite element formulation of flow through a rectangular channel. It is observed that using out-of-core PARDISO solver, relatively large problems can be solved. The implementation of Newton and modified Newton's iteration is also discussed.

PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts

Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.

Performance Analysis of HSDPA Systems using Low-Density Parity-Check (LDPC)Coding as Compared to Turbo Coding

HSDPA is a new feature which is introduced in Release-5 specifications of the 3GPP WCDMA/UTRA standard to realize higher speed data rate together with lower round-trip times. Moreover, the HSDPA concept offers outstanding improvement of packet throughput and also significantly reduces the packet call transfer delay as compared to Release -99 DSCH. Till now the HSDPA system uses turbo coding which is the best coding technique to achieve the Shannon limit. However, the main drawbacks of turbo coding are high decoding complexity and high latency which makes it unsuitable for some applications like satellite communications, since the transmission distance itself introduces latency due to limited speed of light. Hence in this paper it is proposed to use LDPC coding in place of Turbo coding for HSDPA system which decreases the latency and decoding complexity. But LDPC coding increases the Encoding complexity. Though the complexity of transmitter increases at NodeB, the End user is at an advantage in terms of receiver complexity and Bit- error rate. In this paper LDPC Encoder is implemented using “sparse parity check matrix" H to generate a codeword at Encoder and “Belief Propagation algorithm "for LDPC decoding .Simulation results shows that in LDPC coding the BER suddenly drops as the number of iterations increase with a small increase in Eb/No. Which is not possible in Turbo coding. Also same BER was achieved using less number of iterations and hence the latency and receiver complexity has decreased for LDPC coding. HSDPA increases the downlink data rate within a cell to a theoretical maximum of 14Mbps, with 2Mbps on the uplink. The changes that HSDPA enables includes better quality, more reliable and more robust data services. In other words, while realistic data rates are only a few Mbps, the actual quality and number of users achieved will improve significantly.