Abstract: The internet has become an attractive avenue for
global e-business, e-learning, knowledge sharing, etc. Due to the
continuous increase in the volume of web content, it is impractical
for a user to extract information by browsing and integrating
data from the huge number of web sources retrieved by existing
search engines. Semantic web technology advances
information extraction by providing a suite of tools to integrate
data from different sources. To take full advantage of the semantic web,
existing web pages must be annotated so that they become semantic web
pages. This research develops a tool, named OWIE (Ontology-based
Web Information Extraction), for semantic web annotation using
domain-specific ontologies. The tool automatically extracts
information from HTML pages with the help of predefined ontologies
and gives it a semantic representation. Two case studies have been
conducted to analyze the accuracy of OWIE.
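As a rough illustration of ontology-driven extraction from HTML, the following Python sketch strips markup with the standard-library html.parser module and tags a page with every concept whose keywords occur in its visible text. The toy ontology (concept names mapped to keyword sets), the mentionsConcept predicate and the example URL are hypothetical; OWIE's actual ontologies and extraction rules are not described here and are likely far richer.

```python
from html.parser import HTMLParser

# Hypothetical domain ontology: concept -> keywords that signal it.
ONTOLOGY = {
    "Course": {"course", "lecture", "syllabus"},
    "Instructor": {"professor", "instructor", "lecturer"},
}


class TextCollector(HTMLParser):
    """Collects visible text from an HTML page, skipping <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def annotate(url, html, ontology=ONTOLOGY):
    """Return sorted (subject, predicate, object) triples for concepts found on the page."""
    parser = TextCollector()
    parser.feed(html)
    words = {w.strip(".,;:!?()").lower() for w in " ".join(parser.chunks).split()}
    return sorted(
        (url, "mentionsConcept", concept)  # hypothetical predicate name
        for concept, keywords in ontology.items()
        if words & keywords
    )


page = "<html><body><h1>CS101 Course</h1><p>Taught by Professor Lee.</p></body></html>"
print(annotate("http://example.org/cs101", page))
```

Keyword matching is the simplest possible stand-in for ontology-based recognition; a fuller tool would also extract attribute values and relations between the recognized instances.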
Abstract: Semantic Web services will enable the semiautomatic
and automatic annotation, advertisement, discovery,
selection, composition, and execution of inter-organization business
logic, making the Internet a common global platform where
organizations and individuals communicate with each other to carry
out various commercial activities and to provide value-added
services. There is a growing consensus that Web services alone will
not be sufficient to develop valuable solutions due to the degree of
heterogeneity, autonomy, and distribution of the Web. This paper
deals with two of the most active R&D and technology areas currently
associated with the Web: Web services and the Semantic Web. It
presents the synergies that can be created between Web services and
Semantic Web technologies to provide a new generation of e-services.
Abstract: The growing volume of information on the
internet creates an increasing need for new (semi)automatic
methods that retrieve documents and rank them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated under various
circumstances. Our approach combines conceptual,
statistical and linguistic methods; this combination preserves the
precision of ranking without losing speed. The approach
uses natural language processing techniques to extract phrases
from the documents and the query and to stem words. An
ontology-based conceptual method is then used to annotate
documents and expand the query. For query expansion, the spreading
activation algorithm is improved so that the expansion can be performed
flexibly and in various respects. The annotated documents and the
expanded query are then processed with statistical methods to compute
their relevance degree. The outstanding features of our
approach are (1) combining conceptual, statistical and linguistic
features of documents, (2) expanding the query with its related
concepts before comparing it to documents, (3) extracting and using
both words and phrases to compute the relevance degree, (4) improving
the spreading activation algorithm to perform the expansion based on a
weighted combination of different conceptual relationships and (5)
allowing variable document vector dimensions. A ranking system
called ORank was developed to implement and test the proposed
model. The test results are reported at the end of the paper.
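The query expansion and ranking steps can be sketched as follows. This is a minimal illustration under stated assumptions, not ORank itself: the concept graph, the per-relationship weights in REL_WEIGHTS, the decay factor and the activation threshold are all hypothetical, phrase extraction and stemming are omitted, and relevance is a plain cosine similarity between sparse term-weight dictionaries, which lets each document vector have its own set of dimensions.

```python
import math

# Hypothetical ontology fragment: concept -> list of (related concept, relation type).
GRAPH = {
    "car": [("vehicle", "is_a"), ("engine", "part_of")],
    "vehicle": [("transport", "related_to")],
    "engine": [("fuel", "related_to")],
}
# Hypothetical weight for each kind of conceptual relationship.
REL_WEIGHTS = {"is_a": 0.8, "part_of": 0.6, "related_to": 0.4}


def spread_activation(query_terms, graph=GRAPH, weights=REL_WEIGHTS,
                      decay=0.9, threshold=0.2, max_steps=3):
    """Expand a query: each newly activated concept passes
    activation * decay * relation weight to its neighbours; a concept keeps
    the strongest activation it receives, and weak signals below `threshold` stop."""
    activation = {t: 1.0 for t in query_terms}
    frontier = dict(activation)
    for _ in range(max_steps):
        next_frontier = {}
        for concept, value in frontier.items():
            for neighbour, rel in graph.get(concept, []):
                passed = value * decay * weights[rel]
                if passed >= threshold and passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        if not next_frontier:
            break
        frontier = next_frontier
    return activation


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (dimensions may differ)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_terms, docs):
    """Rank documents (name -> term-weight dict) against the expanded query."""
    q = spread_activation(query_terms)
    return sorted(docs, key=lambda name: cosine(q, docs[name]), reverse=True)


docs = {
    "d1": {"engine": 2.0, "fuel": 1.0},
    "d2": {"transport": 1.0, "bus": 3.0},
    "d3": {"car": 1.0, "vehicle": 1.0},
}
print(spread_activation(["car"]))
print(rank(["car"], docs))
```

With these toy weights the query car activates vehicle (0.72), engine (0.54) and transport (about 0.26), while fuel (about 0.19) falls below the threshold; d3 ranks first, then d1, then d2.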
Abstract: Generally, administrative systems in an academic
environment are disjoint and support independent queries. The
objective in this work is to semantically connect these independent
systems to support queries run on the integrated platform.
The proposed framework, by enriching educational material in the
legacy systems, provides a value-added semantics layer where
activities such as annotation, query and reasoning can be carried out
to support management requirements. We discuss the development of
this ontology framework with a case study of UAE University
program administration to show how semantic web technologies can
be used by administration to develop student profiles for better
academic program management.
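As a rough illustration of how a shared semantic layer lets one query span systems that were previously disjoint, the sketch below lifts records from two hypothetical legacy systems into subject-predicate-object triples and answers a student-profile query over the merged graph. The registration and advising tables, their field names, the predicates and the ex: prefix are all assumptions, not the UAE University schema, and plain Python tuples stand in for an RDF store and a query language such as SPARQL.

```python
# Records from two hypothetical, previously disconnected legacy systems.
REGISTRATION = [
    {"student_id": "S1", "course": "CS101", "grade": "A"},
    {"student_id": "S1", "course": "MA201", "grade": "B"},
    {"student_id": "S2", "course": "CS101", "grade": "C"},
]
ADVISING = [
    {"sid": "S1", "program": "Computer Science", "advisor": "Dr. X"},
    {"sid": "S2", "program": "Mathematics", "advisor": "Dr. Y"},
]


def lift(registration, advising):
    """Map both systems onto one shared vocabulary of (subject, predicate, object) triples."""
    triples = set()
    for row in registration:
        student = "ex:" + row["student_id"]
        enrolment = f"ex:{row['student_id']}_{row['course']}"
        triples |= {
            (student, "ex:hasEnrolment", enrolment),
            (enrolment, "ex:course", "ex:" + row["course"]),
            (enrolment, "ex:grade", row["grade"]),
        }
    for row in advising:
        student = "ex:" + row["sid"]  # the shared identifier scheme links the two systems
        triples |= {
            (student, "ex:inProgram", row["program"]),
            (student, "ex:advisedBy", row["advisor"]),
        }
    return triples


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def student_profile(triples, student):
    """Answer a query that neither legacy system could answer on its own."""
    courses = {}
    for enrolment in objects(triples, student, "ex:hasEnrolment"):
        course = objects(triples, enrolment, "ex:course")[0]
        courses[course] = objects(triples, enrolment, "ex:grade")[0]
    return {
        "program": objects(triples, student, "ex:inProgram"),
        "advisor": objects(triples, student, "ex:advisedBy"),
        "courses": courses,
    }


graph = lift(REGISTRATION, ADVISING)
print(student_profile(graph, "ex:S1"))
```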
Abstract: Understanding the cell's large-scale organization is an
interesting task in computational biology, and protein-protein
interactions can reveal important aspects of the organization and
function of the cell. Here, we investigated the correspondence between
protein interactions and function in yeast. We computed the correlations
among the set of proteins and then clustered these correlations using
both hierarchical clustering and biclustering methods. Detailed analyses
of the proteins in each cluster were carried out using their
functional annotations. As a result, we found that some functional
classes appear together in almost all biclusters, whereas in
hierarchical clustering one functional class is observed to dominate.
In brief, going from interaction data to function, we noticed some
correlated results about the relationship between interaction and
function, which might give clues about the organization of the
proteins.
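The correlate-then-cluster pipeline can be sketched in plain Python. The abstract does not fix these details, so the following are assumptions: each protein is represented by a binary interaction profile (1 if it interacts with a given partner), correlation is Pearson's r between profiles, and clustering is average-linkage agglomerative clustering on the distance 1 - r, stopped at a hypothetical cut-off. The profiles are invented, and the biclustering step is not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def average_linkage(profiles, cut=0.5):
    """Agglomerative clustering with average linkage on d = 1 - r.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `cut`; returns clusters as sorted lists of names.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(a, b):
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return sorted(sorted(c) for c in clusters)


# Hypothetical interaction profiles over five partner proteins.
profiles = {
    "P1": [1, 1, 0, 0, 1],
    "P2": [1, 1, 0, 0, 0],
    "P3": [0, 0, 1, 1, 0],
    "P4": [0, 0, 1, 1, 1],
}
print(average_linkage(profiles))
```

The functional analysis would then compare the annotation classes of the proteins inside each resulting cluster.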
Abstract: When programming in languages such as C, Java, etc.,
it is difficult to reconstruct the programmer's ideas from the
program code alone. This occurs mainly because much of the programmer's
reasoning behind the implementation is not recorded in the code. For
example, physical aspects of computation such as spatial structures,
activities, and the meaning of variables are not required as
instructions to the computer and are often omitted. This makes later
reconstruction of the original ideas difficult. AIDA, a
multimedia programming language based on the cyberFilm model,
addresses these problems by allowing the ideas behind programs to be
described using advanced annotation methods as a natural extension of
programming. In this paper, a development environment that
implements the AIDA language is presented with a focus on the
annotation methods. In particular, a real scientific numerical
computation program is created and the effects of the annotation methods
are analyzed.
Abstract: Machine-understandable data, when strongly
interlinked, constitutes the basis for the Semantic Web. Annotating
web documents is one of the major techniques for creating metadata
on the Web. Annotating websites describes the data they contain in a
form suitable for interpretation by machines. In this paper,
we present an approach, improved over the previous one [1], for
annotating the texts of websites based on a knowledge base.
Abstract: Prediction of bacterial virulent protein sequences can
assist in the identification and characterization of novel
virulence-associated factors and in the discovery of drug/vaccine targets
among proteins indispensable to pathogenicity. Gene Ontology (GO)
annotation, which describes the functions of genes and gene products
using a controlled vocabulary of terms, has been shown to be effective
for a variety of tasks such as gene expression studies, GO annotation
prediction, and protein subcellular localization. In this study, we
propose a sequence-based method, Virulent-GO, which mines informative
GO terms as features for predicting bacterial virulent proteins.
Each protein in the datasets used by the existing method
VirulentPred is annotated using BLAST to obtain its homologues
with known accession numbers, from which GO terms are retrieved. After
investigating various popular classifiers using the same five-fold
cross-validation scheme, Virulent-GO, using only one kind of feature
(GO terms), achieves an accuracy of 82.5%, slightly better than
VirulentPred's 81.8% using five kinds of sequence-based features.
On an independent test set, Virulent-GO also yields better
results (82.0%) than VirulentPred (80.7%). When each single
kind of feature is evaluated with an SVM, the GO term feature performs
much better than each of the five kinds of sequence-based features.
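The feature idea can be sketched by representing each protein as the set of GO terms retrieved for its BLAST homologues, which is equivalent to a binary GO-term feature vector, and scoring a classifier with five-fold cross-validation. Two substitutions are made for brevity and are not the paper's method: a 1-nearest-neighbour classifier using Jaccard similarity stands in for the SVM and the other classifiers, and the labelled term sets use invented placeholder IDs such as v1 and n1, so the accuracy this code prints says nothing about the reported 82.5%.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of GO term IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(train, go_terms):
    """1-nearest-neighbour label (ties go to the earliest training example)."""
    best = max(train, key=lambda item: jaccard(item[0], go_terms))
    return best[1]


def k_fold_accuracy(data, k=5):
    """Accuracy under k-fold cross-validation; fold i holds every k-th sample from i."""
    correct = 0
    for i in range(k):
        test = data[i::k]
        train = [d for j, d in enumerate(data) if j % k != i]
        correct += sum(predict(train, terms) == label for terms, label in test)
    return correct / len(data)


# Invented (GO-term set, label) pairs: 1 = virulent, 0 = non-virulent.
GO_DATA = [
    ({"v1", "v2", "x1"}, 1),
    ({"n1", "n2"}, 0),
    ({"v1", "v2"}, 1),
    ({"n1", "n2", "x1"}, 0),
    ({"v1", "x2"}, 1),
    ({"n1", "n2", "x2"}, 0),
    ({"v1", "v2", "x3"}, 1),
    ({"n2", "x3"}, 0),
    ({"v1", "v2"}, 1),
    ({"n1", "n2"}, 0),
]
print(k_fold_accuracy(GO_DATA))
```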
Abstract: The MATCH project [1] entails the development of an
automatic diagnosis system that aims to support the treatment of colon
cancer by discovering mutations that occur in tumour
suppressor genes (TSGs) and contribute to the development of
cancerous tumours. The system is based on a)
colon cancer clinical data and b) biological information that will be
derived by data mining techniques from genomic and proteomic
sources. The core mining module will consist of popular, well-tested
hybrid feature extraction methods and new combined
algorithms designed especially for the project. Elements of rough
sets, evolutionary computing, cluster analysis, self-organizing maps
and association rules will be used to discover the annotations
between genes and their influence on tumours [2]-[11].
The methods used to process the data have to address its high
complexity, potential inconsistency and missing values.
They must integrate all the useful information
necessary to answer the expert's question. For this purpose, the system
has to learn from data, or allow a domain specialist to specify
interactively, the part of the knowledge structure it needs to answer a
given query. The program should also take into account the
importance/rank of the particular parts of the data it analyses and
adjust the algorithms used accordingly.
Abstract: We have developed a database of membrane protein functions, which contains more than 3000 experimental data entries on functionally important amino acid residues in membrane proteins, along with sequence, structure and literature information. Further, we have proposed different methods for identifying membrane proteins based on their functions: (i) discriminating membrane transport proteins from other globular and membrane proteins and classifying them into channels/pores, electrochemical and active transporters, and (ii) identifying the β-signal for the insertion of mitochondrial β-barrel outer membrane proteins and potential targets. Our method showed an accuracy of 82% in discriminating transport proteins and 68% in classifying them into the three types of transporters. In addition, we have identified a motif for the targeting β-signal and potential candidates for mitochondrial β-barrel membrane proteins. Our methods can be used as effective tools for genome-wide annotation.
Abstract: The third phase of the Web, the Semantic Web, requires many web pages annotated with metadata. Thus, a crucial question is where to acquire this metadata. In this paper we propose a semi-automatic method that annotates the texts of documents and web pages and employs a fairly comprehensive knowledge base to categorize instances with regard to an ontology. The approach is evaluated against manual annotations and against one of the most popular annotation tools that works in the same way as ours. The resulting annotation tool for the Semantic Web is implemented in the .NET Framework and uses WordNet as its knowledge base.
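Categorizing an instance with a lexical knowledge base can be sketched as walking up hypernym ("is a kind of") links until a class of the target ontology is reached. In the sketch below a small hand-written dictionary stands in for WordNet, and both it and the three ontology classes are hypothetical; no WordNet API is used. Because the method is semi-automatic, the output is a list of proposals for a human annotator to confirm or reject.

```python
# Hypothetical stand-in for a lexical knowledge base such as WordNet:
# word -> more general word (hypernym).
HYPERNYMS = {
    "paris": "city",
    "city": "location",
    "toyota": "company",
    "company": "organization",
    "einstein": "physicist",
    "physicist": "scientist",
    "scientist": "person",
}
# Classes of the target ontology (hypothetical).
ONTOLOGY_CLASSES = {"location", "organization", "person"}


def categorize(word, max_depth=10):
    """Follow hypernym links until an ontology class is reached; None if none is found."""
    current = word.lower()
    for _ in range(max_depth):
        if current in ONTOLOGY_CLASSES:
            return current
        current = HYPERNYMS.get(current)
        if current is None:
            return None
    return None


def propose_annotations(text):
    """Propose (token, class) pairs for a human annotator to confirm or reject."""
    proposals = []
    for token in text.split():
        word = token.strip(".,;:!?")
        cls = categorize(word)
        # Skip words that are themselves class names: they are not instances.
        if cls is not None and cls != word.lower():
            proposals.append((word, cls))
    return proposals


print(propose_annotations("Einstein lectured in Paris; Toyota sponsored the event."))
```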
Abstract: UML is a collection of notations for capturing a software system specification. These notations have a specific syntax defined by the Object Management Group (OMG), but many of their constructs have only informal semantics. They are primarily graphical, with textual annotation. The inadequacies of standard UML as a vehicle for the complete specification and implementation of real-time embedded systems have led to a variety of competing and complementary proposals. The Real-time UML profile (UML-RT), developed and standardized by the OMG, defines a unified framework to express the time, scheduling and performance aspects of a system. In this paper we present a framework approach aimed at deriving a complete specification of a real-time system. To this end, we combine two methods: a semiformal one, UML-RT, which allows the visual modeling of a real-time system, and a formal one, CSP+T, which is a design language that includes the specification of real-time requirements. To show the applicability of the approach, a correct design of a real-time system with hard real-time constraints is obtained by applying a set of mapping rules.
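One common way to give a UML state machine a formal reading is to map each state to a CSP process, each outgoing transition to an event prefix, and alternative transitions to external choice. The sketch below applies exactly those three rules to a hypothetical sensor controller and emits CSPM-style ASCII notation. It is an illustration, not the paper's set of mapping rules, and it ignores the timing annotations that CSP+T adds to plain CSP.

```python
from collections import defaultdict


def to_csp(transitions, initial):
    """Map a flat state machine to CSP process equations (CSPM-style ASCII).

    Rules: each state becomes a process named in upper case; each outgoing
    transition becomes a prefix `event -> TARGET`; several transitions from one
    state are combined with external choice `[]`; a state with no outgoing
    transitions becomes STOP.
    """
    outgoing = defaultdict(list)
    states = {initial}
    for source, event, target in transitions:
        outgoing[source].append((event, target))
        states |= {source, target}
    lines = []
    # Initial state first, then the remaining states alphabetically.
    for state in [initial] + sorted(states - {initial}):
        branches = [f"{event} -> {target.upper()}" for event, target in outgoing[state]]
        body = " [] ".join(branches) if branches else "STOP"
        lines.append(f"{state.upper()} = {body}")
    return "\n".join(lines)


# Hypothetical controller: (source state, event, target state).
TRANSITIONS = [
    ("Idle", "sample", "Measuring"),
    ("Measuring", "done", "Idle"),
    ("Measuring", "fault", "Halted"),
]
print(to_csp(TRANSITIONS, "Idle"))
```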
Abstract: With the advent of emerging personal computing paradigms such as ubiquitous and mobile computing, Web contents are becoming accessible from a wide range of mobile devices. Since these devices do not have the same rendering capabilities, Web contents need to be adapted for transparent access from a variety of client agents. Such content adaptation is applied either to an individual element or to a set of consecutive elements in a Web document and results in better rendering and faster delivery to the client device. However, Web content adaptation poses new challenges for semantic markup. This paper presents an advanced component platform, called SMC, that enables the development of mobility applications and services according to a channel model based on the principles of Service-Oriented Architecture (SOA). It then describes the potential for integration with the Semantic Web through a novel framework of external semantic annotation, which prescribes a scheme for representing semantic markup files and a way of associating Web documents with these external annotations. The role of semantic annotation in this framework is to describe the contents of individual documents themselves, ensuring that the semantics are preserved while content rendering is adapted. Semantic Web content adaptation adds value to Web contents and facilitates their repurposing (enhanced browsing, Web Services location and access, etc.).
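The idea of keeping annotations outside the document so that they survive adaptation can be sketched as follows: annotations are stored separately and keyed by element identifier, the document is split into small-screen pages, and each page carries the annotations of exactly the elements it contains. The Element type, the annotation dictionary, the character-based page limit and the example content are hypothetical; SMC's actual markup file scheme is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Element:
    element_id: str
    text: str


# Hypothetical external annotation file, kept apart from the document:
# element id -> semantic markup (here simple property/value pairs).
EXTERNAL_ANNOTATIONS = {
    "title": {"type": "NewsArticle"},
    "price": {"type": "Offer", "currency": "EUR"},
}


def adapt(elements, annotations, max_chars):
    """Split a document into pages of at most `max_chars` characters.

    An element longer than `max_chars` gets a page of its own (it is not cut).
    Each page keeps the external annotations of exactly the elements it holds,
    so the semantics survive the change in rendering.
    """
    pages, current, size = [], [], 0
    for el in elements:
        if current and size + len(el.text) > max_chars:
            pages.append(current)
            current, size = [], 0
        current.append(el)
        size += len(el.text)
    if current:
        pages.append(current)
    return [
        {
            "elements": [el.element_id for el in page],
            "annotations": {el.element_id: annotations[el.element_id]
                            for el in page if el.element_id in annotations},
        }
        for page in pages
    ]


doc = [
    Element("title", "Local market opens"),
    Element("body", "The new market opens on Monday with forty stalls."),
    Element("price", "Entry: 2 EUR"),
]
for page in adapt(doc, EXTERNAL_ANNOTATIONS, max_chars=70):
    print(page)
```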
Abstract: This paper applies Bayesian Networks to support
information extraction from unstructured, ungrammatical, and
incoherent data sources for semantic annotation. A tool has been
developed that combines ontologies, machine learning,
information extraction, and probabilistic reasoning techniques to
support the extraction process. Data acquisition is performed with the
aid of knowledge specified in the form of an ontology. Because the
amount of information available varies across data sources, it is
often the case that the extracted data contains missing values for
certain variables of interest. In such situations it is desirable to
predict the missing values. The methodology presented in this paper
first learns a Bayesian network from the training data and then uses it
to predict missing data and to resolve conflicts. Experiments have
been conducted to analyze the performance of the presented
methodology. The results look promising, as the methodology
achieves a high degree of precision and recall for information
extraction and reasonably good accuracy for predicting missing
values.
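Predicting a missing value with a Bayesian network can be sketched as follows, under simplifying assumptions. The network structure is fixed by hand rather than learned (structure learning is not shown), the conditional probability tables are estimated by counting with Laplace smoothing, and exactly one variable is missing per record. The variables, their values and the training rows are hypothetical. With every other variable observed, P(x | rest) is proportional to P(x | parents of x) times P(child | parents of child) for each child of x, and the sketch picks the value that maximizes this product.

```python
from collections import Counter

# Hypothetical network structure, fixed by hand for this sketch.
PARENTS = {"make": (), "fuel": (), "price": ("make", "fuel")}
VALUES = {"make": ["bmw", "toyota"], "fuel": ["diesel", "petrol"], "price": ["high", "low"]}


def learn_cpts(data, parents, values, alpha=1.0):
    """Estimate P(var | parents) from complete training rows, with Laplace smoothing."""
    counts = {var: Counter() for var in parents}
    for row in data:
        for var, pas in parents.items():
            counts[var][(tuple(row[p] for p in pas), row[var])] += 1

    def prob(var, value, row):
        key = tuple(row[p] for p in parents[var])
        total = sum(counts[var][(key, v)] for v in values[var])
        return (counts[var][(key, value)] + alpha) / (total + alpha * len(values[var]))

    return prob


def predict_missing(row, var, parents, values, prob, candidates=None):
    """Most probable value of `var` given all other (observed) variables in `row`.

    A candidate v is scored by P(var=v | parents) times P(child | child's parents)
    for each child of var, i.e. the factors of the joint distribution that mention var.
    Passing `candidates` restricts the choice, e.g. to conflicting extracted values.
    """
    children = [c for c, pas in parents.items() if var in pas]
    best, best_score = None, -1.0
    for v in candidates or values[var]:
        filled = {**row, var: v}
        score = prob(var, v, filled)
        for c in children:
            score *= prob(c, filled[c], filled)
        if score > best_score:
            best, best_score = v, score
    return best


TRAINING = (
    [{"make": "bmw", "fuel": "diesel", "price": "high"}] * 2
    + [{"make": "bmw", "fuel": "petrol", "price": "high"}]
    + [{"make": "toyota", "fuel": "petrol", "price": "low"}] * 2
    + [{"make": "toyota", "fuel": "diesel", "price": "low"}]
)
prob = learn_cpts(TRAINING, PARENTS, VALUES)
print(predict_missing({"make": "bmw", "fuel": "petrol"}, "price", PARENTS, VALUES, prob))
print(predict_missing({"fuel": "petrol", "price": "low"}, "make", PARENTS, VALUES, prob))
```

Restricting `candidates` to the values that different sources extracted for the same field gives a simple form of conflict resolution with the same scoring function.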