A Survey of Semantic Integration Approaches in Bioinformatics

Technological advances of computer science and data analysis are helping to provide continuously huge volumes of biological data, which are available on the web. Such advances involve and require powerful techniques for data integration to extract pertinent knowledge and information for a specific question. Biomedical exploration of these big data often requires the use of complex queries across multiple autonomous, heterogeneous and distributed data sources. Semantic integration is an active area of research in several disciplines, such as databases, information-integration, and ontology. We provide a survey of some approaches and techniques for integrating biological data, we focus on those developed in the ontology community.

A Spatial Hypergraph Based Semi-Supervised Band Selection Method for Hyperspectral Imagery Semantic Interpretation

Hyperspectral imagery (HSI) typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution. Therefore, the semantic interpretation is a challenging task of HSI analysis. We focused in this paper on object classification as HSI semantic interpretation. However, HSI classification still faces some issues, among which are the following: The spatial variability of spectral signatures, the high number of spectral bands, and the high cost of true sample labeling. Therefore, the high number of spectral bands and the low number of training samples pose the problem of the curse of dimensionality. In order to resolve this problem, we propose to introduce the process of dimensionality reduction trying to improve the classification of HSI. The presented approach is a semi-supervised band selection method based on spatial hypergraph embedding model to represent higher order relationships with different weights of the spatial neighbors corresponding to the centroid of pixel. This semi-supervised band selection has been developed to select useful bands for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared to other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared to many existing dimensionality reduction methods for HSI classification.

A Proposed Approach for Emotion Lexicon Enrichment

Document Analysis is an important research field that aims to gather the information by analyzing the data in documents. As one of the important targets for many fields is to understand what people actually want, sentimental analysis field has been one of the vital fields that are tightly related to the document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. The aim of this research is to detect the emotions from text documents based on enriching the lexicon with adapting their content based on semantic patterns extraction. The proposed approach has been presented, and different experiments are applied by different perspectives to reveal the positive impact of the proposed approach on the classification results.

Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

A Study on Bilingual Semantic Processing: Category Effects and Age Effects

The present study addressed the nature of bilingual semantic processing in Mandarin Chinese and Southern Min and examined category effects and age effects. Nineteen bilingual adults of Mandarin Chinese and Southern Min, nine monolingual seniors of Mandarin Chinese, and ten monolingual seniors of Southern Min in Taiwan individually completed two semantic tasks: Picture naming and category fluency tasks. The instruments for the naming task were sixty black-and-white pictures, including thirty-five object pictures and twenty-five action pictures. The category fluency task also consisted of two semantic categories – objects (or nouns) and actions (or verbs). The reaction time for each picture/question was additionally calculated and analyzed. Oral productions in Mandarin Chinese and in Southern Min were compared and discussed to examine the category effects and age effects. The results of the category fluency task indicated that the content of information of these seniors was comparatively deteriorated, and thus they produced a smaller number of semantic-lexical items. Significant group differences were also found in the reaction time results. Category effects were significant for both adults and seniors in the semantic fluency task. The findings of the present study will help characterize the nature of the bilingual semantic processing of adults and seniors, and contribute to the fields of contrastive and corpus linguistics.

A State-Of-The-Art Review on Web Services Adaptation

Web service adaptation involves the creation of adapters that solve Web services incompatibilities known as mismatches. Since the importance of Web services adaptation is increasing because of the frequent implementation and use of online Web services, this paper presents a literature review of web services to investigate the main methods of adaptation, their theoretical underpinnings and the metrics used to measure adapters performance. Eighteen publications were reviewed independently by two researchers. We found that adaptation techniques are needed to solve different types of problems that may arise due to incompatibilities in Web service interfaces, including protocols, messages, data and semantics that affect the interoperability of the services. Although adapters are non-invasive methods that can improve Web services interoperability and there are current approaches for service adaptation; there is, however, not yet one solution that fits all types of mismatches. Our results also show that only a few research projects incorporate theoretical frameworks and that metrics to measure adapters’ performance are very limited. We conclude that further research on software adaptation should improve current adaptation methods in different layers of the service interoperability and that an adaptation theoretical framework that incorporates a theoretical underpinning and measures of qualitative and quantitative performance needs to be created.

Resources-Based Ontology Matching to Access Learning Resources

Nowadays, ontologies are used for achieving a common understanding within a user community and for sharing domain knowledge. However, the de-centralized nature of the web makes indeed inevitable that small communities will use their own ontologies to describe their data and to index their own resources. Certainly, accessing to resources from various ontologies created independently is an important challenge for answering end user queries. Ontology mapping is thus required for combining ontologies. However, mapping complete ontologies at run time is a computationally expensive task. This paper proposes a system in which mappings between concepts may be generated dynamically as the concepts are encountered during user queries. In this way, the interaction itself defines the context in which small and relevant portions of ontologies are mapped. We illustrate application of the proposed system in the context of Technology Enhanced Learning (TEL) where learners need to access to learning resources covering specific concepts.

Ontology-Based Approach for Temporal Semantic Modeling of Social Networks

Social networks have recently gained a growing interest on the web. Traditional formalisms for representing social networks are static and suffer from the lack of semantics. In this paper, we will show how semantic web technologies can be used to model social data. The SemTemp ontology aligns and extends existing ontologies such as FOAF, SIOC, SKOS and OWL-Time to provide a temporal and semantically rich description of social data. We also present a modeling scenario to illustrate how our ontology can be used to model social networks.

Online Topic Model for Broadcasting Contents Using Semantic Correlation Information

This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is a broadcasting script, which is a series of texts including directions and dialogues. The other is blogposts, which possesses relatively abstracted contents, stories, and diverse information of broadcasting contents. Although two texts range over similar broadcasting contents, words in blogposts and broadcasting script are different. When unseen words appear, it needs a method to reflect to existing topic. In this paper, we introduce a semantic vocabulary expansion method to reflect unseen words. We expand topics of the broadcasting script by incorporating the words in blogposts. Each word in blogposts is added to the most semantically correlated topics. We use word2vec to get the semantic correlation between words in blogposts and topics of scripts. The vocabularies of topics are updated and then posterior inference is performed to rearrange the topics. In experiments, we verified that the proposed method can discover more salient topics for broadcasting contents.

Opponent Color and Curvelet Transform Based Image Retrieval System Using Genetic Algorithm

In order to retrieve images efficiently from a large database, a unique method integrating color and texture features using genetic programming has been proposed. Opponent color histogram which gives shadow, shade, and light intensity invariant property is employed in the proposed framework for extracting color features. For texture feature extraction, fast discrete curvelet transform which captures more orientation information at different scales is incorporated to represent curved like edges. The recent scenario in the issues of image retrieval is to reduce the semantic gap between user’s preference and low level features. To address this concern, genetic algorithm combined with relevance feedback is embedded to reduce semantic gap and retrieve user’s preference images. Extensive and comparative experiments have been conducted to evaluate proposed framework for content based image retrieval on two databases, i.e., COIL-100 and Corel-1000. Experimental results clearly show that the proposed system surpassed other existing systems in terms of precision and recall. The proposed work achieves highest performance with average precision of 88.2% on COIL-100 and 76.3% on Corel, the average recall of 69.9% on COIL and 76.3% on Corel. Thus, the experimental results confirm that the proposed content based image retrieval system architecture attains better solution for image retrieval.

Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Scripts are one of the basic text resources to understand broadcasting contents. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scene segments consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics by statistical learning method. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, more accurate topical representations lead to get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. By iteratively inferring topics and determining semantically neighborhood scene segments, we draw a topic space represents broadcasting contents well. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars, and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Schema and Data Migration of a Relational Database RDB to the Extensible Markup Language XML

This article discusses the passage of RDB to XML documents (schema and data) based on metadata and semantic enrichment, which makes the RDB under flattened shape and is enriched by the object concept. The integration and exploitation of the object concept in the XML uses a syntax allowing for the verification of the conformity of the document XML during the creation. The information extracted from the RDB is therefore analyzed and filtered in order to adjust according to the structure of the XML files and the associated object model. Those implemented in the XML document through a SQL query are built dynamically. A prototype was implemented to realize automatic migration, and so proves the effectiveness of this particular approach.

Ontology for Semantic Enrichment of Radio Frequency Identification Systems

Radio Frequency Identification (RFID) has become a key technology in the emerging concept of Internet of Things (IoT). Naturally, business applications would require the deployment of various RFID systems developed by different vendors that use different data formats and structures. This heterogeneity poses a challenge in developing real-life IoT systems with RFID, as integration is becoming very complex and challenging. Semantic integration is a key approach to deal with this challenge. To do so, ontology for RFID systems need to be developed in order to annotated semantically RFID systems, and hence, facilitate their integration. Accordingly, in this paper, we propose ontology for RFID systems. The proposed ontology can be used to semantically enrich RFID systems, and hence, improve their usage and reasoning.

An Approach to Integrate Ontologies of Open Educational Resources in Knowledge Based Management Systems

There are real needs to integrate types of Open Educational Resources (OER) with an intelligent system to extract information and knowledge in the semantic searching level. The needs came because most of current learning standard adopted web based learning and the e-learning systems do not always serve all educational goals. Semantic Web systems provide educators, students, and researchers with intelligent queries based on a semantic knowledge management learning system. An ontology-based learning system is an advanced system, where ontology plays the core of the semantic web in a smart learning environment. The objective of this paper is to discuss the potentials of ontologies and mapping different kinds of ontologies; heterogeneous or homogenous to manage and control different types of Open Educational Resources. The important contribution of this research is that it uses logical rules and conceptual relations to map between ontologies of different educational resources. We expect from this methodology to establish an intelligent educational system supporting student tutoring, self and lifelong learning system.

Secure Bio Semantic Computing Scheme

In this paper, the secure BioSemantic Scheme is presented to bridge biological/biomedical research problems and computational solutions via semantic computing. Due to the diversity of problems in various research fields, the semantic capability description language (SCDL) plays and important role as a common language and generic form for problem formalization. SCDL is expected the essential for future semantic and logical computing in Biosemantic field. We show several example to Biomedical problems in this paper. Moreover, in the coming age of cloud computing, the security problem is considered to be crucial issue and we presented a practical scheme to cope with this problem.

A Framework for Semantics Preserving SPARQL-to-SQL Translation

The enormous amount of information stored on the web increases from one day to the next, exposing the web currently faced with the inevitable difficulties of research pertinent information that users really want. The problem today is not limited to expanding the size of the information highways, but to design a system for intelligent search. The vast majority of this information is stored in relational databases, which in turn represent a backend for managing RDF data of the semantic web. This problem has motivated us to write this paper in order to establish an effective approach to support semantic transformation algorithm for SPARQL queries to SQL queries, more precisely SPARQL SELECT queries; by adopting this method, the relational database can be questioned easily with SPARQL queries maintaining the same performance.

An Automatic Model Transformation Methodology Based on Semantic and Syntactic Comparisons and the Granularity Issue Involved

Model transformation, as a pivotal aspect of Modeldriven engineering, attracts more and more attentions both from researchers and practitioners. Many domains (enterprise engineering, software engineering, knowledge engineering, etc.) use model transformation principles and practices to serve to their domain specific problems; furthermore, model transformation could also be used to fulfill the gap between different domains: by sharing and exchanging knowledge. Since model transformation has been widely used, there comes new requirement on it: effectively and efficiently define the transformation process and reduce manual effort that involved in. This paper presents an automatic model transformation methodology based on semantic and syntactic comparisons, and focuses particularly on granularity issue that existed in transformation process. Comparing to the traditional model transformation methodologies, this methodology serves to a general purpose: crossdomain methodology. Semantic and syntactic checking measurements are combined into a refined transformation process, which solves the granularity issue. Moreover, semantic and syntactic comparisons are supported by software tool; manual effort is replaced in this way.

A Validation Technique for Integrated Ontologies

Ontology validation is an important part of web applications’ development, where knowledge integration and ontological reasoning play a fundamental role. It aims to ensure the consistency and correctness of ontological knowledge and to guarantee that ontological reasoning is carried out in a meaningful way. Existing approaches to ontology validation address more or less specific validation issues, but the overall process of validating web ontologies has not been formally established yet. As the size and the number of web ontologies continue to grow, more web applications’ developers will rely on the existing repository of ontologies rather than develop ontologies from scratch. If an application utilizes multiple independently created ontologies, their consistency must be validated and eventually adjusted to ensure proper interoperability between them. This paper presents a validation technique intended to test the consistency of independent ontologies utilized by a common application.

Semantic Indexing Approach of a Corpora Based On Ontology

The growth in the volume of text data such as books and articles in libraries for centuries has imposed to establish effective mechanisms to locate them. Early techniques such as abstraction, indexing and the use of classification categories have marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (corpus) to allow a user to find the relevant ones for him, that is to say, the contents which matches with the information needs of the user. This paper presents a new semantic indexing approach of a documentary corpus. The indexing process starts first by a term weighting phase to determine the importance of these terms in the documents. Then the use of a thesaurus like Wordnet allows moving to the conceptual level. Each candidate concept is evaluated by determining its level of representation of the document, that is to say, the importance of the concept in relation to other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology, the documents of the corpus in which these concepts are found.