Selecting Negative Examples for Protein-Protein Interaction

Proteomics is one of the largest areas of research in bioinformatics and medical science. An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. Predicting Protein-Protein Interaction (PPI) is one of the crucial problems in current research. Genomic data offer a great opportunity, and at the same time many challenges, for the identification of these interactions, and many methods have already been proposed. For in-silico identification, most methods require both positive and negative examples of protein interaction, and the quality of these examples is crucial for the final prediction accuracy. Positive examples are relatively easy to obtain from well-known databases, but generating negative examples is not a trivial task. Current PPI identification methods generate negative examples based on assumptions that are likely to affect their prediction accuracy. Hence, if more reliable negative examples are used, PPI prediction methods may achieve higher accuracy. To address this issue, we propose a graph-based negative example generation method that is simple and more accurate than existing approaches. An interaction graph of the protein sequences is created, and the basic assumption is that the longer the shortest path between two protein sequences in the interaction graph, the less likely they are to interact. When a well-established PPI detection algorithm is used with our negative examples, its accuracy increases in most cases by more than 10% compared with the negative pair selection method used in that algorithm's original paper.
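
The shortest-path assumption above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of breadth-first search on an unweighted, undirected graph, the rule of ranking pairs from different connected components first (infinite distance), and the top-k cutoff are all assumptions.

```python
import math
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def select_negative_pairs(proteins, interactions, k):
    """Rank non-interacting protein pairs by shortest-path distance in the
    interaction graph (longest first) and return the top k as negatives.
    Pairs in different connected components get distance math.inf."""
    adj = {p: set() for p in proteins}
    for a, b in interactions:
        adj[a].add(b)
        adj[b].add(a)
    dist = {p: bfs_distances(adj, p) for p in proteins}
    scored = []
    for a, b in combinations(sorted(proteins), 2):
        if b in adj[a]:
            continue  # a known interaction is a positive example, never a negative
        scored.append((dist[a].get(b, math.inf), a, b))
    # Longest distance first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(a, b) for _, a, b in scored[:k]]
```

Running one BFS per protein costs O(V(V + E)) time, which is acceptable for moderate networks; a minimum-distance threshold could replace the top-k cutoff.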

Probability and Instruction Effects in Syllogistic Conditional Reasoning

The main aim of this study was to examine whether people understand indicative conditionals on the basis of syntactic factors or on the basis of subjective conditional probability. The second aim was to investigate whether the conditional probability of q given p depends on the antecedent and consequent sizes, or derives from inductive processes that establish a link of plausible co-occurrence between semantically or experientially associated events. These competing hypotheses were tested through a 3 x 2 x 2 x 2 mixed design involving the manipulation of four variables: type of instructions ("Consider the following statement to be true", "Read the following statement", and a condition with no conditional statement); antecedent size (high/low); consequent size (high/low); and statement probability (high/low). The first variable was between-subjects; the others were within-subjects. The inferences investigated were Modus Ponens and Modus Tollens. Ninety undergraduates of the Second University of Naples, without any prior knowledge of logic or conditional reasoning, participated in this study. The results suggest that people understand conditionals in a syntactic rather than a probabilistic way, even though the perceived conditional probability of q given p is at least partially involved in the comprehension of conditionals. They also show that, in the presence of a conditional syllogism, inferences are not affected by antecedent or consequent size. From a theoretical point of view, these findings suggest that it would be inappropriate to abandon the idea that conditionals are naturally understood in a syntactic way for the idea that they are understood in a probabilistic way.

Acquiring Contour Following Behaviour in Robotics through Q-Learning and Image-based States

In this work, a visual and reactive contour-following behaviour is learned by reinforcement. Artificial vision perceives the environment in 3D, making it possible to avoid obstacles that are invisible to the sensors more commonly used in mobile robotics. Reinforcement learning reduces the need for intervention in behaviour design and simplifies its adjustment to the environment, the robot and the task. To facilitate generalisation to other behaviours and to reduce the role of the designer, we propose a regular image-based codification of states. Even though learning with such states is much more difficult, our implementation converges and is robust. Results are presented with a Pioneer 2 AT robot in the Gazebo 3D simulator.
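
A minimal tabular Q-learning sketch illustrates how an image-based state codification plugs into the standard update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The grid-thresholding encoder, the three-action set and all parameter values are hypothetical choices for illustration; the paper's actual codification, actions and reward are not given in this abstract.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "forward", "right")  # hypothetical discrete steering actions

def encode_state(image, cells=3, threshold=0.5):
    """Hypothetical regular codification: split a grey image (list of rows,
    values in [0, 1], at least `cells` rows and columns) into cells x cells
    blocks and emit 1 for each block whose mean exceeds threshold."""
    h, w = len(image), len(image[0])
    bits = []
    for i in range(cells):
        rows = image[i * h // cells:(i + 1) * h // cells]
        for j in range(cells):
            block = [v for row in rows for v in row[j * w // cells:(j + 1) * w // cells]]
            bits.append(int(sum(block) / len(block) > threshold))
    return tuple(bits)

class QLearner:
    """Tabular Q-learning with epsilon-greedy action selection."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> value, default 0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Because the state is a fixed-length bit tuple derived only from the image grid, the same encoder could in principle be reused for other behaviours, which is the motivation the abstract gives for a regular codification.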

People Counting in Transport Vehicles

Counting people from a video stream in a noisy environment is a challenging task. This project aims to develop a counting system for transport vehicles, integrated in a video surveillance product. This article presents a method for detecting and tracking multiple faces in a video using a model of first- and second-order local moments. An iterative process is used to estimate the position and shape of multiple faces in images and to track them. The trajectories are then processed to count people entering and leaving the vehicle.
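
The first- and second-order moments mentioned above can be illustrated with a small NumPy sketch: the first-order moment of a weight map (for example, a per-pixel face likelihood) gives a centre, and the second-order central moments give a covariance whose eigenvectors describe an elliptical face shape. The line-crossing counter is a hypothetical stand-in for the trajectory processing step, assuming image y grows downward and that moving down past a virtual line means entering; neither function is the authors' method.

```python
import numpy as np

def weighted_moments(weights):
    """Centroid (first-order) and covariance (second-order central moments)
    of a non-negative 2-D weight map; coordinates are (x, y)."""
    ys, xs = np.indices(weights.shape)
    total = weights.sum()
    mean = np.array([(xs * weights).sum(), (ys * weights).sum()]) / total
    dx, dy = xs - mean[0], ys - mean[1]
    cov = np.array([[(dx * dx * weights).sum(), (dx * dy * weights).sum()],
                    [(dx * dy * weights).sum(), (dy * dy * weights).sum()]]) / total
    return mean, cov

def count_crossings(trajectories, line_y):
    """Count tracks (lists of (x, y) points) whose first and last positions
    lie on opposite sides of a horizontal line at line_y."""
    entering = leaving = 0
    for track in trajectories:
        y0, y1 = track[0][1], track[-1][1]
        if y0 < line_y <= y1:
            entering += 1  # moved downward across the line (assumed: entering)
        elif y1 < line_y <= y0:
            leaving += 1   # moved upward across the line (assumed: leaving)
    return entering, leaving
```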

A Selective Markovianity Approach for Image Segmentation

A new Markovianity approach is introduced in this paper. This approach reduces the response time of the classic Markov Random Field (MRF) approach. First, one region is determined by a clustering technique. Then, this region is excluded from the study. The remaining pixels form the study zone and are selected for a Markovian segmentation task. With the Selective Markovianity approach, segmentation is faster than with the classic one.
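
The three steps (cluster, exclude one region, run the Markovian segmentation only on the remaining pixels) can be sketched as below. This is an illustration under stated assumptions, not the paper's algorithm: it uses 1-D k-means on grey levels as the clustering technique, excludes the brightest cluster, and uses Iterated Conditional Modes (ICM) with a Potts prior as the MRF optimizer. Excluded pixels are labelled -1 and are never visited, which is where the time saving comes from in this sketch.

```python
import numpy as np

def kmeans_1d(values, k, iters=20):
    """Plain 1-D k-means on grey levels, initialised at evenly spaced quantiles."""
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    return labels, centers

def icm_potts(image, mask, means, beta=1.0, iters=5):
    """ICM with a Potts prior, visiting only pixels where mask is True.
    Excluded pixels keep label -1 and contribute no neighbour terms."""
    labels = np.full(image.shape, -1)
    labels[mask] = np.argmin(np.abs(image[mask][:, None] - means[None, :]), axis=1)
    h, w = image.shape
    for _ in range(iters):
        for y, x in np.argwhere(mask):
            best, best_energy = labels[y, x], np.inf
            for c in range(len(means)):
                energy = (image[y, x] - means[c]) ** 2  # data term
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] >= 0:
                        energy += beta * (labels[ny, nx] != c)  # Potts penalty
                if energy < best_energy:
                    best, best_energy = c, energy
            labels[y, x] = best
    return labels

def selective_segmentation(image, k=3, beta=1.0):
    """Cluster, exclude the brightest cluster, then run ICM on the rest."""
    image = np.asarray(image, dtype=float)
    labels, centers = kmeans_1d(image.ravel(), k)
    excluded = int(np.argmax(centers))
    mask = labels.reshape(image.shape) != excluded
    seg = icm_potts(image, mask, np.delete(centers, excluded), beta)
    return mask, seg
```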

Artificial Intelligence Techniques applied to Biomedical Patterns

Pattern recognition is the research area of Artificial Intelligence that studies the operation and design of systems that recognize patterns in data. Important application areas are image analysis, character recognition, fingerprint classification, speech analysis, DNA sequence identification, man and machine diagnostics, person identification and industrial inspection. Interest in improving classification systems for data analysis is independent of the application context. In fact, many studies need to recognize and distinguish groups of various objects, which requires valid instruments capable of performing this task. The objective of this article is to present several Artificial Intelligence methodologies for data classification applied to biomedical patterns. In particular, this work deals with the realization of a Computer-Aided Detection (CADe) system that is able to assist the radiologist in identifying types of mammary tumor lesions. As an additional biomedical application of classification systems, we present a study conducted on blood samples which shows how these methods may help to distinguish between carriers of Thalassemia (or Mediterranean Anaemia) and healthy subjects.

An Improved Resource Discovery Approach Using P2P Model for Condor: A Grid Middleware

Resource discovery in Grids is critical for efficient resource allocation and management. The heterogeneous nature and dynamic availability of resources make resource discovery a challenging task. As the number of nodes grows from tens to thousands, scalability becomes essential. Peer-to-Peer (P2P) techniques, on the other hand, provide effective implementations of scalable services and applications. In this paper we propose a model for resource discovery in the Condor middleware that uses the four-axis framework defined in the P2P approach. The proposed model enhances Condor to incorporate the functionality of a P2P system, with the aim of making Condor more scalable, flexible, reliable and robust.

Dynamic Visualization on Student's Performance, Retention and Transfer of Procedural Learning

This study examined the effects of two dynamic visualizations on 60 Malaysian primary school students' performance (time on task), retention and transfer. The independent variables were the two dynamic visualizations: video and animated instructions. The dependent variables were the gain scores for performance, retention and transfer. The results showed that students in the animation group significantly outperformed those in the video group in retention. There were no significant differences in the performance and transfer gain scores between the animation and video groups, although the scores were slightly higher in the animation group. The study concludes that animation is superior to video for retention in a procedural task.

Learning an Overcomplete Dictionary using a Cauchy Mixture Model for Sparse Decay

An algorithm is introduced for learning an overcomplete dictionary using a Cauchy mixture model for sparse decomposition of an underdetermined mixing system. The mixture density function is derived from a ratio sample of the observed mixture signals, under the conditions that 1) at least two mixture signals are observed (more are not required), 2) the source signals are statistically independent, and 3) the sources are sparse. The basis vectors of the dictionary are learned by optimizing the location parameters of the Cauchy mixture components, which is shown to be more accurate and robust than the conventional data mining methods usually employed for this task. Using a well-known sparse decomposition algorithm, we extract three speech signals from two mixtures based on the estimated dictionary. Further tests with additive Gaussian noise demonstrate the proposed algorithm's robustness to outliers.
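
The link between ratio samples and dictionary columns can be sketched as follows. If at some instant only source k is active, then x2/x1 = a_2k / a_1k, so ratio samples concentrate around one location per source, and a Cauchy mixture with density p(r) = Σ_k w_k γ_k / (π((r - μ_k)² + γ_k²)) can model them. The helper names are hypothetical, and the optimisation of the locations μ_k, which is the core of the paper, is not shown; the sketch only forms the ratio sample, evaluates the density, and maps fitted locations to unit-norm basis vectors (defined up to sign).

```python
import numpy as np

def ratio_sample(x1, x2, eps=1e-8):
    """Ratio x2/x1 of two observed mixtures, skipping near-silent samples of x1."""
    keep = np.abs(x1) > eps
    return x2[keep] / x1[keep]

def cauchy_mixture_pdf(r, weights, locations, scales):
    """p(r) = sum_k w_k * gamma_k / (pi * ((r - mu_k)^2 + gamma_k^2))."""
    r = np.asarray(r, dtype=float)[..., None]  # broadcast against K components
    comps = scales / (np.pi * ((r - locations) ** 2 + scales ** 2))
    return comps @ weights

def basis_from_locations(locations):
    """Column k is [1, mu_k] / ||[1, mu_k]||, i.e. mixing column a_k up to
    scale, because a 1-sparse sample gives x2/x1 = a_2k / a_1k."""
    cols = np.vstack([np.ones_like(locations), locations])
    return cols / np.linalg.norm(cols, axis=0)
```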

Expert System for Sintering Process Control based on the Information about solid-fuel Flow Composition

Usually, the solid-fuel flow of an iron ore sinter plant consists of several different types of solid fuel. Information about the composition of the solid-fuel flow usually arrives only every 8-24 hours, so it cannot be used to control the sintering process in real time. We therefore propose an expert system that uses indirect measurements from the process to obtain the composition of the solid-fuel flow by solving an optimization task. This information can then be used to control the sintering process. The proposed technique can be successfully used to improve sinter quality and reduce the amount of solid fuel used by the process.

Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems

Document retrieval in Information Retrieval Systems (IRS) is generally about understanding the information in the documents concerned. The better the system understands the contents of documents, the more effective the retrieval outcomes will be. However, understanding the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through a keyword approach using the vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in the retrieval process, we take a step forward in applying semantic technology to IRS. Word stemming is a process of morphological analysis in natural language processing, performed before syntactic and semantic analysis. We have developed stemming algorithms for Malay and Arabic and incorporated them in our experimental systems in order to measure retrieval effectiveness. The results show that retrieval effectiveness increases when stemming is used in the systems.
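
To illustrate what affix stripping with dictionary lookup looks like, here is a toy Malay stemmer. The prefix and suffix lists are a small hypothetical subset, the sketch ignores Malay morphophonemic changes (for example, the nasal replacement in "memukul" from "pukul") and does not cover Arabic; it is not the algorithm developed in the paper.

```python
# Illustrative subsets only; real Malay stemmers use far richer rule sets.
PREFIXES = ("meng", "mem", "men", "ber", "ter", "me", "di", "pe")
SUFFIXES = ("kan", "nya", "an", "i")

def stem(word, roots):
    """Strip at most one prefix and one suffix, accepting the first result
    found in the root-word dictionary; otherwise return the word unchanged."""
    word = word.lower()
    if word in roots:
        return word
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if not rest.endswith(suffix):
                continue
            base = rest[:len(rest) - len(suffix)]
            if base in roots:
                return base
    return word
```

The dictionary check keeps the stemmer from over-stripping: a candidate is accepted only if it is a known root, so unknown words pass through intact.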

Input Textural Feature Selection By Mutual Information For Multispectral Image Classification

Texture information plays an increasingly important role in the classification of remotely sensed imagery and in many pattern recognition applications. However, selecting relevant textural features to improve classification accuracy is not a straightforward task. This work investigates the effectiveness of two Mutual Information Feature Selector (MIFS) algorithms in selecting salient textural features that contain highly discriminatory information for multispectral imagery classification. The input candidate features are extracted from a SPOT High Resolution Visible (HRV) image using the Wavelet Transform (WT) at levels l = 1, 2. The experimental results show that the textural features selected by the MIFS algorithms improve classification accuracy more than classical approaches such as Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA).
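
A minimal sketch of Battiti's original MIFS criterion, which greedily adds the feature f maximising I(C; f) - β Σ_{s∈S} I(f; s) over the already selected set S, is given below. It assumes features have already been quantised into discrete values (continuous wavelet texture features would need binning first); the function names and test data are hypothetical, and the second MIFS variant studied in the paper is not reproduced.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mifs(features, labels, k, beta=0.5):
    """Greedy MIFS: features maps name -> discrete column. Each step picks the
    feature maximising relevance minus beta times summed redundancy."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(f):
            redundancy = sum(mutual_information(features[f], features[s]) for s in selected)
            return relevance[f] - beta * redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

The parameter beta controls the trade-off: with beta = 0 the method reduces to ranking by relevance alone, while larger values increasingly penalise features that duplicate information already selected.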

A Graphical Environment for Petri Nets INA Tool Based on Meta-Modelling and Graph Grammars

The Petri net tool INA is well known in the Petri net community. However, it lacks a graphical environment to create and analyse INA models. Building a modelling tool for design and analysis from scratch (for the INA tool, for example) is generally a prohibitive task. The meta-modelling approach is useful for dealing with such problems, since it allows the formalisms themselves to be modelled. In this paper, we propose an approach based on the combined use of meta-modelling and graph grammars to automatically generate a visual modelling tool for INA for analysis purposes. In our approach, the UML Class diagram formalism is used to define a meta-model of INA models. The meta-modelling tool ATOM3 is used to generate a visual modelling tool according to the proposed INA meta-model. We also propose a graph grammar to automatically generate the INA description of graphically specified Petri net models. This spares the user the errors that arise when this description is written manually. The INA tool is then used to perform the simulation and analysis of the resulting INA description. Our environment is illustrated through an example.

Lateral Crushing of Square and Rectangular Metallic Tubes under Different Quasi-Static Conditions

Impact is one of the very important subjects that have always been considered in mechanical science. The nature of impact makes it hard to control. It is therefore necessary to prevent the transfer of impact to other vulnerable parts of a structure when required. One of the best methods of absorbing impact energy is to use thin-walled tubes: these tubes collapse under impact and, by absorbing energy, prevent damage to other parts. The purpose of the present study is to survey the deformation and energy absorption of tubes with different types of cross section (rectangular or square) but similar volume, height, mean cross-section thickness and material, under loading at different speeds. The lateral loading of the tubes is quasi-static, and in addition to numerical analysis, experiments were performed to evaluate the accuracy of the results. The results indicate that, under the same conditions mentioned above, samples with a square cross section absorb more energy than those with a rectangular cross section, and that energy absorption increases with loading speed.
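
Energy absorption in crushing tests is commonly computed as the area under the force-displacement curve, E = ∫ F dδ. The sketch below applies the trapezoidal rule to sampled data; the units (kN and mm, giving joules) and the specific energy absorption helper are assumptions, since the abstract does not state which measure was used.

```python
def absorbed_energy(displacement_mm, force_kN):
    """Energy absorbed (J) as the area under a sampled force-displacement
    curve, integrated with the trapezoidal rule; 1 kN x 1 mm = 1 J."""
    if len(displacement_mm) != len(force_kN):
        raise ValueError("displacement and force must have the same length")
    energy = 0.0
    for i in range(1, len(displacement_mm)):
        step = displacement_mm[i] - displacement_mm[i - 1]
        energy += 0.5 * (force_kN[i] + force_kN[i - 1]) * step
    return energy

def specific_energy_absorption(energy_J, mass_kg):
    """Energy absorbed per unit mass (J/kg), a common crashworthiness measure."""
    return energy_J / mass_kg
```

If the compared tubes also have equal mass, comparing E directly is equivalent to comparing specific energy absorption.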

A Performance Appraisal of Neural Networks Developed for Response Prediction across Heterogeneous Domains

Choosing the numerous parameters involved in designing a competent artificial neural network is a complicated task. The existence of several options for selecting an appropriate architecture adds to this complexity, especially when applications of heterogeneous natures are concerned. Two completely different applications, in engineering and in medical science, were selected in the present study: prediction of workpiece surface roughness in ultrasonic-vibration-assisted turning, and prediction of papillomavirus oncogenicity. Several neural network architectures with different parameters were developed for each application and the results were compared. This paper shows that some applications, such as the first one, can be modeled by a single network with sufficient accuracy, whereas others, such as the second, are best modeled by different expert networks for different ranges of output. Developing knowledge about the essentials of neural networks for different applications is regarded as the cornerstone of multidisciplinary network design programs, which are to be developed as a means of reducing inconsistencies and the burden of user intervention.

Discovering Complex Regularities: from Tree to Semi-Lattice Classifications

Data mining uses a variety of techniques, each of which is useful for a particular task. It is important to have a deep understanding of each technique and to be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process, up to the visualization of results. A graphical representation helps the user find a strategy to optimize classification by adding, moving or deleting a neuron in order to change the number of classes. The tool can automatically suggest a strategy to optimize the number of classes, and it also supports both tree classifications and semi-lattice organizations of the classes, giving users the possibility of passing from one class to those with which it has some aspects in common. Examples of tree and semi-lattice classifications are given to illustrate their advantages and problems. The tool is applied to classify macroeconomic data reporting the imports and exports of the most developed countries. It is possible to classify the countries based on their economic behaviour and to use the tool to characterize the commercial behaviour of a country in a selected class by analysing the positive and negative features that contribute to class formation. Possible interrelationships between the classes and their meaning are also discussed.
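
The core of a Kohonen network can be sketched as a one-dimensional self-organising map in NumPy: each sample pulls its best-matching neuron, and that neuron's neighbours along the map, toward itself, with a learning rate and neighbourhood width that shrink over time. This is the classic SOM, not the paper's variation; the schedules and default values are hypothetical. Each neuron corresponds to a class, so in this sketch changing the number of classes means changing n_neurons.

```python
import numpy as np

def train_som(data, n_neurons, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D Kohonen map; returns an (n_neurons, n_dims) weight array."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = data[rng.choice(len(data), n_neurons, replace=False)].copy()
    sigma0 = sigma0 if sigma0 is not None else max(n_neurons / 2.0, 1.0)
    grid = np.arange(n_neurons)  # neuron positions along the map
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood width
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            t += 1
    return weights

def assign(data, weights):
    """Class of each sample = index of its best-matching neuron."""
    data = np.asarray(data, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```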

Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms

Annotation of a protein sequence is pivotal for understanding its function. The accuracy of manual annotation provided by curators is still questionable because of its lower evidence strength, and manual annotation is a hard and time-consuming task. A number of computational methods and tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult for bioscientists to set up, or depend on time-intensive, blind sequence similarity searches such as the Basic Local Alignment Search Tool (BLAST). This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems were identified in realizing this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that are easy to access and process. These files can then be enriched with protein sequences and Inferred from Electronic Annotation (IEA) evidence associations. The second problem involves searching for a set of Gene Ontology terms semantically similar to a given query. The macro and micro problems involved, their solutions and the objective of this study are described in detail. This paper also describes protein sequence annotation and the Gene Ontology. The methodology of this study and the Gene Ontology-based protein sequence annotation tool, named extended UTMGO, are presented. Furthermore, its basic version, a Gene Ontology browser based on semantic similarity search, is also introduced.

Feature Subset Selection Using Ant Colony Optimization

Feature selection is an important step in many pattern classification problems. It is applied to select a subset of features from a much larger set, such that the selected subset is sufficient to perform the classification task. Because of its importance, the problem of feature selection has been investigated by many researchers. In this paper, a novel feature subset search procedure that utilizes Ant Colony Optimization (ACO) is presented. ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
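
A compact sketch of ACO-based feature subset search is given below. Each ant builds a subset of k features, choosing feature f with probability proportional to τ_f^α η_f^β (pheromone times heuristic desirability); pheromone then evaporates at rate ρ and is reinforced on the iteration-best subset. The fixed subset size, the update rule and the caller-supplied fitness function (in practice, for example, classifier accuracy on the subset) are assumptions and not necessarily the procedure proposed in the paper.

```python
import random

def aco_feature_selection(n_features, k, fitness, heuristic, n_ants=10, n_iter=20,
                          alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Return (best_subset, best_fitness) found by a simple ant colony search.
    heuristic[f] must be > 0; fitness(subset) is assumed to be >= 0."""
    rng = random.Random(seed)
    tau = [1.0] * n_features  # pheromone per feature
    best_subset, best_fit = None, float("-inf")
    for _ in range(n_iter):
        ants = []
        for _ in range(n_ants):
            subset, candidates = [], list(range(n_features))
            while len(subset) < k:
                # Transition rule: desirability tau^alpha * eta^beta.
                weights = [tau[f] ** alpha * heuristic[f] ** beta for f in candidates]
                chosen = rng.choices(candidates, weights=weights)[0]
                subset.append(chosen)
                candidates.remove(chosen)
            subset = frozenset(subset)
            fit = fitness(subset)
            ants.append((fit, subset))
            if fit > best_fit:
                best_subset, best_fit = subset, fit
        # Evaporate everywhere, then reinforce the iteration-best subset.
        tau = [(1.0 - rho) * t for t in tau]
        iter_fit, iter_subset = max(ants, key=lambda ant: ant[0])
        for f in iter_subset:
            tau[f] += rho * max(iter_fit, 0.0)
    return best_subset, best_fit
```

The heuristic term supplies the "local heuristics" and the pheromone trail the "previous knowledge" that the abstract mentions; here both are simple per-feature scores.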

EEG Indices to Time-On-Task Effects and to a Workload Manipulation (Cueing)

The aim of this study was to evaluate the sensitivity of a range of EEG indices to time-on-task effects and to a workload manipulation (cueing) during performance of a resource-limited vigilance task. Effects of task period and cueing on performance and subjective state response were consistent with previous vigilance studies and with resource theory. Two EEG indices, the Task Load Index (TLI) and global lower frequency (LF) alpha power, showed effects of task period and cueing similar to those seen with correct detections. Across four successive task periods, the TLI declined and LF alpha power increased. Cueing increased TLI and decreased LF alpha. Other indices, namely the Engagement Index (EI), frontal theta and upper frequency (UF) alpha, failed to show these effects. However, EI and frontal theta were sensitive to interactive effects of task period and cueing, which may correspond to a stronger anxiety response to the uncued task.
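
For reference, the two ratio indices are commonly defined in the EEG literature as EI = beta / (alpha + theta) and TLI = frontal theta / parietal alpha; the sketch below computes them from periodogram band powers. The band limits, the single-channel EI and the periodogram estimator are assumptions, since the abstract does not give the study's exact electrodes or bands, and the LF/UF alpha split is not shown.

```python
import numpy as np

# Assumed band limits in Hz; studies differ on exact edges.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_power(signal, fs, low, high):
    """Periodogram power |FFT|^2 / N summed over bins with low <= f < high."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return spectrum[(freqs >= low) & (freqs < high)].sum()

def engagement_index(signal, fs):
    """EI = beta / (alpha + theta), here computed from a single channel."""
    p = {name: band_power(signal, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
    return p["beta"] / (p["alpha"] + p["theta"])

def task_load_index(frontal, parietal, fs):
    """TLI = frontal theta power / parietal alpha power."""
    return band_power(frontal, fs, *BANDS["theta"]) / band_power(parietal, fs, *BANDS["alpha"])
```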

Prediction of Slump in Concrete using Artificial Neural Networks

High Strength Concrete (HSC) is defined as concrete that meets a special combination of performance and uniformity requirements that cannot be achieved routinely using conventional constituents and normal mixing, placing, and curing procedures. It is a highly complex material, which makes modeling its behavior a very difficult task. This paper aims to show the possible applicability of Neural Networks (NN) to predict the slump of HSC. Neural network models are constructed, trained and tested using the available test data of 349 different HSC mix designs gathered from a particular Ready Mix Concrete (RMC) batching plant. The most versatile neural network model is selected to predict slump. The data used in the neural network models are arranged as eight input parameters: Cement, Fly Ash, Sand, Coarse Aggregate (10 mm), Coarse Aggregate (20 mm), Water, Super-Plasticizer and Water/Binder ratio. Furthermore, to test its accuracy in predicting slump, the final selected model is applied to the data of 40 different HSC mix designs taken from another batching plant. The results are compared on the basis of an error function (or performance function).