Abstract: The paper addresses the main methodological issues of the Corpus of Spoken Lithuanian, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus comprises three main stages: collecting the data, transcribing the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a steady increase in studies on spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, variation in inflectional paradigms, the distribution of fillers, the syntactic functions of adjectives, and the mean length of utterances.
Abstract: Sentiment analysis (SA) has received growing
attention in Arabic language research. However, few studies have
directly applied SA to Arabic because of the lack of publicly available
datasets for this language. This paper partially bridges this gap by
focusing on one Arabic dialect, the Saudi dialect. It
presents an annotated data set of 4700 for Saudi dialect sentiment
analysis with (K = 0.807). Our next step is to extend this corpus and
create a large-scale lexicon for the Saudi dialect from the corpus.
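
As an illustration of how an agreement figure such as K = 0.807 could be obtained, the sketch below computes Cohen's kappa for two annotators. This assumes that K denotes Cohen's kappa, which the abstract does not state; the function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the abstract's K is Cohen's kappa; this is a sketch,
    not the authors' evaluation code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (not data from the paper).
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.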
Abstract: The growing volume of information on the
internet creates an increasing need for new (semi)automatic
methods for retrieving documents and ranking them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated in various
circumstances. Our approach is a combination of conceptual,
statistical and linguistic methods. This combination preserves the
precision of ranking without losing speed. Our approach
exploits natural language processing techniques for extracting
phrases and stemming words. Then an ontology-based conceptual
method is used to annotate documents and expand the query.
To expand a query, the spreading activation algorithm is improved so
that the expansion can be done in various aspects. The annotated
documents and the expanded query are processed to compute
the relevance degree using statistical methods. The outstanding
features of our approach are (1) combining conceptual, statistical
and linguistic features of documents, (2) expanding the query with
its related concepts before comparing it to documents, (3) extracting
and using both words and phrases to compute the relevance degree, (4)
improving the spreading activation algorithm to do the expansion based
on a weighted combination of different conceptual relationships and
(5) allowing variable document vector dimensions. A ranking
system called ORank is developed to implement and test the
proposed model. The test results are included at the end of the
paper.
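
The query expansion step described above can be sketched as a generic spreading activation over an ontology with per-relation weights. This is a minimal illustration under stated assumptions, not ORank's actual algorithm: the function name, the `decay`, `threshold` and `max_hops` parameters, the relation names and the toy ontology are all hypothetical.

```python
def expand_query(query_concepts, ontology, relation_weights,
                 decay=0.5, threshold=0.1, max_hops=2):
    """Spread activation from the query concepts through the ontology.

    ontology maps a concept to a list of (relation, neighbour) pairs.
    relation_weights maps a relation name to a weight in [0, 1].
    Returns a dict mapping each reached concept to its activation.
    All parameters are illustrative assumptions, not ORank's settings.
    """
    activation = {concept: 1.0 for concept in query_concepts}
    frontier = dict(activation)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, energy in frontier.items():
            for relation, neighbour in ontology.get(concept, []):
                # Energy passed on shrinks with the decay factor and
                # with the weight of the relation being followed.
                spread = energy * decay * relation_weights.get(relation, 0.0)
                if spread < threshold:
                    continue
                # Keep only the strongest activation seen for a concept.
                if spread > activation.get(neighbour, 0.0):
                    activation[neighbour] = spread
                    next_frontier[neighbour] = spread
        frontier = next_frontier
    return activation


# Hypothetical toy ontology and relation weights.
toy_ontology = {
    "car": [("is_a", "vehicle"), ("has_part", "engine")],
    "vehicle": [("is_a", "machine")],
}
toy_weights = {"is_a": 0.8, "has_part": 0.4}
expanded = expand_query(["car"], toy_ontology, toy_weights)
```

Giving each relation type its own weight is one simple way to realize a weighted combination of different conceptual relationships: strongly weighted relations carry activation further from the original query terms.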
Abstract: The standard investigational method for obstructive
sleep apnea syndrome (OSAS) diagnosis is polysomnography (PSG),
which consists of a simultaneous, usually overnight recording of
multiple electro-physiological signals related to sleep and
wakefulness. This is an expensive, cumbersome and not readily
repeatable protocol, and therefore there is a need for simpler and easily
implemented screening and detection techniques. Identification of
apnea/hypopnea events in the screening recordings is the key factor
for the diagnosis of OSAS. The analysis of a single-lead
electrocardiographic (ECG) signal alone for OSAS diagnosis, which may
be done with portable devices at the patient's home, has been a challenge
in recent years. A novel artificial neural network (ANN) based
approach for feature extraction and automatic identification of
respiratory events in ECG signals is presented in this paper. A
nonlinear principal component analysis (NLPCA) method was
considered for feature extraction and a support vector machine for
classification/recognition. An alternative representation of the
respiratory events by means of a Kohonen-type neural network is
discussed. Our prospective study was based on male and female OSAS
patients of the Clinical Hospital of Pneumology in Iaşi, Romania,
as well as on non-OSAS subjects. Our
computed analysis includes a learning phase based on cross-signal
PSG annotation.
Abstract: In this paper, we present an approach to soccer video
editing using multimodal annotation. We propose to associate with
each video sequence of a soccer match a textual document to be used
for further exploitation such as search, browsing and abstract editing.
The textual document contains video metadata, match metadata, and
match data. This document, generated automatically while the video
is analyzed, segmented and classified, can be enriched
semi-automatically according to the user type and/or a specialized
recommendation system.
Abstract: The growing volume of information on the
internet creates an increasing need for new (semi)automatic
methods for retrieving documents and ranking them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated in various
circumstances. Our approach is a combination of conceptual,
statistical and linguistic methods. This combination preserves the
precision of ranking without losing speed. Our approach
exploits natural language processing techniques to extract phrases
from documents and the query and to stem words. Then
an ontology-based conceptual method is used to annotate
documents and expand the query. To expand a query, the spreading
activation algorithm is improved so that the expansion can be done
flexibly and in various aspects. The annotated documents and the
expanded query are processed to compute the relevance degree
using statistical methods. The outstanding features of our
approach are (1) combining conceptual, statistical and linguistic
features of documents, (2) expanding the query with its related
concepts before comparing it to documents, (3) extracting and using
both words and phrases to compute the relevance degree, (4) improving
the spreading activation algorithm to do the expansion based on a
weighted combination of different conceptual relationships and (5)
allowing variable document vector dimensions. A ranking system
called ORank is developed to implement and test the proposed
model. The test results are included at the end of the paper.
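
One way to picture relevance scoring with variable document vector dimensions is to store each document and the expanded query as a sparse mapping from terms (words or phrases) to weights, then compare them with cosine similarity. The abstract only says that statistical methods are used, so cosine similarity is a stand-in chosen for illustration; the function names and toy vectors are hypothetical.

```python
import math


def cosine_relevance(doc_vec, query_vec):
    """Cosine similarity between two sparse term-weight dicts.

    Each dict may hold a different number of terms, so the vectors
    have no fixed dimension. Cosine similarity is an assumed stand-in
    for the paper's unspecified statistical relevance measure.
    """
    shared_terms = doc_vec.keys() & query_vec.keys()
    dot = sum(doc_vec[t] * query_vec[t] for t in shared_terms)
    norm_doc = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_query = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_doc == 0.0 or norm_query == 0.0:
        return 0.0
    return dot / (norm_doc * norm_query)


def rank_documents(docs, query_vec):
    """Return document names ordered from most to least relevant."""
    return sorted(docs,
                  key=lambda name: cosine_relevance(docs[name], query_vec),
                  reverse=True)


# Hypothetical toy vectors mixing single words and a phrase.
toy_docs = {
    "d1": {"ontology": 1.0, "query expansion": 1.0},
    "d2": {"soccer": 1.0},
}
toy_query = {"ontology": 1.0}
ranking = rank_documents(toy_docs, toy_query)
```

Because phrases are simply additional keys, words and phrases contribute to the score in the same way, and a concept added by query expansion can enter the query vector with a weight below 1.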
Abstract: UML is a collection of notations for capturing a software system specification. These notations have a specific syntax defined by the Object Management Group (OMG), but many of their constructs have only informal semantics. They are primarily graphical, with textual annotation. The inadequacies of standard UML as a vehicle for complete specification and implementation of real-time embedded systems have led to a variety of competing and complementary proposals. The Real-time UML profile (UML-RT), developed and standardized by OMG, defines a unified framework to express the time, scheduling and performance aspects of a system. We present in this paper a framework approach aimed at deriving a complete specification of a real-time system. To this end, we combine two methods: a semiformal one, UML-RT, which allows the visual modeling of a real-time system, and a formal one, CSP+T, which is a design language including the specification of real-time requirements. To show the applicability of the approach, a correct design of a real-time system with hard real-time constraints is obtained by applying a set of mapping rules.
Abstract: This paper applies Bayesian Networks to support
information extraction from unstructured, ungrammatical, and
incoherent data sources for semantic annotation. A tool has been
developed that combines ontologies, machine learning,
information extraction, and probabilistic reasoning techniques to
support the extraction process. Data acquisition is performed with the
aid of knowledge specified in the form of an ontology. Because the
amount of information available varies across data sources, it is
often the case that the extracted data contains missing values for
certain variables of interest. It is desirable in such situations to
predict the missing values. The methodology presented in this paper
first learns a Bayesian network from the training data and then uses it
to predict missing data and to resolve conflicts. Experiments have
been conducted to analyze the performance of the presented
methodology. The results look promising, as the methodology
achieves a high degree of precision and recall for information
extraction and reasonably good accuracy for predicting missing
values.
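
The idea of learning a network from training data and using it to predict a missing value can be sketched with the simplest Bayesian network structure, in which the target variable is the sole parent of every other variable (a naive Bayes structure). The structure here is fixed by assumption rather than learned, and the function names, variable names and toy records are hypothetical; the paper's tool is not reproduced.

```python
from collections import Counter, defaultdict


def fit(records, target):
    """Estimate P(target) and P(feature | target) from complete records.

    Assumed structure: target -> every other variable (naive Bayes).
    """
    prior = Counter(r[target] for r in records)
    cond = defaultdict(Counter)   # (feature, target value) -> value counts
    values = defaultdict(set)     # feature -> set of observed values
    for r in records:
        for feature, value in r.items():
            if feature != target:
                cond[(feature, r[target])][value] += 1
                values[feature].add(value)
    return {"target": target, "prior": prior, "cond": cond,
            "values": values, "n": len(records)}


def predict_missing(model, record, alpha=1.0):
    """Return the most probable target value given the observed features.

    alpha is a Laplace smoothing constant (an illustrative choice).
    """
    best_value, best_score = None, -1.0
    for t, count in model["prior"].items():
        score = count / model["n"]
        for feature, value in record.items():
            if (feature == model["target"] or value is None
                    or feature not in model["values"]):
                continue  # skip the target itself and unobserved features
            counts = model["cond"][(feature, t)]
            k = len(model["values"][feature])
            score *= (counts[value] + alpha) / (sum(counts.values()) + alpha * k)
        if score > best_score:
            best_value, best_score = t, score
    return best_value


# Hypothetical training records extracted from data sources.
training = [
    {"brand": "acme", "type": "laptop", "category": "computers"},
    {"brand": "acme", "type": "laptop", "category": "computers"},
    {"brand": "zeta", "type": "phone", "category": "mobile"},
    {"brand": "zeta", "type": "phone", "category": "mobile"},
]
model = fit(training, target="category")
guess = predict_missing(model, {"brand": "zeta", "type": "phone", "category": None})
```

A full Bayesian network would also learn which variables depend on which; fixing the structure keeps the sketch short while still showing how learned conditional probabilities fill in a missing value.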