Protein Secondary Structure Prediction

Protein structure determination and prediction has been a focal research subject in the field of bioinformatics due to the importance of protein structure in understanding the biological and chemical activities of organisms. The experimental methods used by biotechnologists to determine the structures of proteins demand sophisticated equipment and time. A host of computational methods are developed to predict the location of secondary structure elements in proteins for complementing or creating insights into experimental results. However, prediction accuracies of these methods rarely exceed 70%.

Dimensionality Reduction of PSSM Matrix and its Influence on Secondary Structure and Relative Solvent Accessibility Predictions

State-of-the-art methods for secondary structure (Porter, Psi-PRED, SAM-T99sec, Sable) and solvent accessibility (Sable, ACCpro) predictions use evolutionary profiles represented by the position specific scoring matrix (PSSM). It has been demonstrated that evolutionary profiles are the most important features in the feature space for these predictions. Unfortunately applying PSSM matrix leads to high dimensional feature spaces that may create problems with parameter optimization and generalization. Several recently published suggested that applying feature extraction for the PSSM matrix may result in improvements in secondary structure predictions. However, none of the top performing methods considered here utilizes dimensionality reduction to improve generalization. In the present study, we used simple and fast methods for features selection (t-statistics, information gain) that allow us to decrease the dimensionality of PSSM matrix by 75% and improve generalization in the case of secondary structure prediction compared to the Sable server.

Predicting the Minimum Free Energy RNA Secondary Structures using Harmony Search Algorithm

The physical methods for RNA secondary structure prediction are time consuming and expensive, thus methods for computational prediction will be a proper alternative. Various algorithms have been used for RNA structure prediction including dynamic programming and metaheuristic algorithms. Musician's behaviorinspired harmony search is a recently developed metaheuristic algorithm which has been successful in a wide variety of complex optimization problems. This paper proposes a harmony search algorithm (HSRNAFold) to find RNA secondary structure with minimum free energy and similar to the native structure. HSRNAFold is compared with dynamic programming benchmark mfold and metaheuristic algorithms (RnaPredict, SetPSO and HelixPSO). The results showed that HSRNAFold is comparable to mfold and better than metaheuristics in finding the minimum free energies and the number of correct base pairs.

Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm-s computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RTRICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].