Abstract: The novelty proposed in this study is twofold: a new color similarity metric based on the human visual system and a new color indexing scheme based on a textual approach. The proposed similarity metric is grounded in the color perception of the human visual system, so the results returned by the indexing system can fulfill user expectations as much as possible. We developed a web application to collect users' judgments about the similarities between colors, and these judgments are used to estimate the proposed metric. To index an image's colors, we used a text indexing engine, which facilitates the integration of visual features into a database of text documents. The textual signature is built by weighting the image's colors according to their occurrence in the image. The use of a textual indexing engine provides a simple, fast and robust solution for indexing images. A typical use of the proposed system is the development of applications whose data are both visual and textual. To evaluate the proposed method we chose a price comparison engine as a case study, collecting a series of commercial offers, each containing a textual description and an image representing the offer.
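As an illustration of the textual-signature idea, the following sketch (the token naming and quantization step are hypothetical, not taken from the paper) quantizes an image's pixels into coarse palette bins and weights each resulting color token by its relative frequency, producing a weighted "document" a text indexing engine can ingest:

```python
from collections import Counter

def color_token(r, g, b, step=64):
    """Quantize an RGB pixel into a coarse palette bin named as a text token."""
    return f"c{r // step}_{g // step}_{b // step}"

def textual_signature(pixels, step=64):
    """Weight each quantized color by its relative frequency in the image.

    Returns (token, weight) pairs sorted by descending weight, ready to be
    fed to a text indexing engine as a weighted textual signature.
    """
    counts = Counter(color_token(r, g, b, step) for r, g, b in pixels)
    total = sum(counts.values())
    return sorted(((tok, n / total) for tok, n in counts.items()),
                  key=lambda kv: -kv[1])

# Example: a tiny 4-pixel "image", three red-ish pixels and one blue-ish one.
pixels = [(250, 10, 10), (240, 20, 5), (255, 0, 0), (10, 10, 250)]
sig = textual_signature(pixels)
```

The dominant red bin receives weight 0.75 and the blue bin 0.25, so the red token would dominate the indexed signature.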
Abstract: The work presented here deals with a new scope of application of information and communication technologies: the improvement of the election process in a biased environment. We introduce a new concept for constructing an information-communication system for an election participant. It consists of four main components: software, physical infrastructure, structured information and trained staff. The structured information is the basis of the whole system; it is the collection of all possible events (irregularities among them) at the polling stations, structured in special templates and forms and integrated in mobile devices. The software is a package of analytic modules that operates on a dynamic database. The application of modern communication technologies facilitates the immediate exchange of information and relevant documents between the polling stations and the participant's server. No less important is the training of the staff for the proper functioning of the system; an e-training system with various modules should be applied in this respect. The presented methodology is primarily focused on election processes in countries of emerging democracies. It can be regarded as a tool for the monitoring of the election process by political organizations and as one of the instruments to foster the spread of democracy in these countries.
Abstract: Most Question Answering systems are composed of three main modules: question processing, document processing and answer processing. The question processing module plays an important role in QA systems; if it does not work properly, it causes problems for the other modules. Moreover, answer processing is an emerging topic in Question Answering, where systems are often required to rank and validate candidate answers. These techniques, aimed at finding short and precise answers, are often based on semantic classification. This paper discusses a new model for question answering that improves the two main modules, question processing and answer processing. Two components form the basis of question processing: question classification, which specifies the types of the question and of the expected answer, and reformulation, which converts the user's question into a form the QA system can understand in a specific domain. The answer processing module consists of candidate answer filtering and candidate answer ordering components, and also has a validation section for interacting with the user; this makes it better suited to finding the exact answer. In this paper we describe the question and answer processing modules and model, implement and evaluate the system. The system was implemented in two versions. Results show that 'Version No.1' gave correct answers to 70% of the questions (30 correct answers to 50 asked questions) and 'Version No.2' gave correct answers to 94% of the questions (47 correct answers to 50 asked questions).
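The question-classification component described above can be sketched as a minimal rule-based mapping from interrogative cues to expected answer types; the cue list and type labels below are illustrative assumptions, not the paper's actual taxonomy:

```python
# Hypothetical mapping from question cue words to expected answer types.
ANSWER_TYPES = {
    "who": "PERSON", "where": "LOCATION", "when": "DATE",
    "how many": "NUMBER", "how much": "QUANTITY", "why": "REASON",
}

def classify_question(question):
    """Return the expected answer type for a natural-language question."""
    q = question.lower().strip()
    # Check longer cues first so "how many" wins over shorter prefixes.
    for cue in sorted(ANSWER_TYPES, key=len, reverse=True):
        if q.startswith(cue):
            return ANSWER_TYPES[cue]
    return "DEFINITION"  # fallback for 'what is ...' style questions
```

A real classifier would also use the question's head noun and domain lexicon, but this shows how the module commits the downstream answer-processing stage to a particular answer type.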
Abstract: Due to the tremendous amount of information provided by the World Wide Web (WWW), developing methods for mining the structure of web-based documents is of considerable interest. In this paper we present a similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings whose components represent structural properties of the graph. The similarity of two graphs is then defined via the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to solve a novel and challenging problem: measuring the structural similarity of generalized trees. In other words, we first transform our graphs, considered as high-dimensional objects, into linear structures, and then derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem to develop an efficient graph similarity measure. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based document structures.
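A sketch of the alignment step, assuming a standard Needleman-Wunsch global alignment with illustrative match/mismatch/gap scores (the paper's actual property strings and scoring scheme are not reproduced here):

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score of two integer property strings."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[n][m]

# Degree sequences of two small trees, used as illustrative property strings.
s1 = [2, 1, 1, 3]
s2 = [2, 1, 3]
score = align_score(s1, s2)
```

The optimal alignment matches 2, 1 and 3 and pays one gap penalty for the extra 1, so the score is 2; aligning a string with itself gives the maximum score, which is the basis for normalizing alignment scores into similarity values.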
Abstract: The last two decades witnessed advances in the development of Arabic character recognition (CR) systems. Arabic CR faces technical problems not encountered in any other language; these make Arabic CR systems achieve relatively low accuracy and retard their establishment as market products. We propose the basic stages of a system that attacks the problem of recognizing online Arabic cursive handwriting. Rule-based methods are used to perform simultaneous segmentation and recognition of word portions in an unconstrained cursively handwritten document using dynamic programming. The output of these stages is a ranked list of possible decisions. A new technique for text line separation is also used.
Abstract: In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation, manipulating the membership degrees of the training data and the degree values of a test web page. Six measures are used and compared in this study: the Einstein, Algebraic, Hamacher, MinMax, Special-case fuzzy and Bounded Difference approaches. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the compared measures. An analysis of these measures and concluding remarks are given.
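The named measures are commonly identified with standard fuzzy conjunction (t-norm) formulas. Assuming those textbook definitions (an assumption, since the paper's exact formulas are not reproduced here, and the Special-case fuzzy measure is omitted as paper-specific), they can be sketched for membership degrees a, b in [0, 1]:

```python
def einstein(a, b):
    """Einstein product t-norm."""
    return (a * b) / (2 - (a + b - a * b))

def algebraic(a, b):
    """Algebraic (probabilistic) product."""
    return a * b

def hamacher(a, b):
    """Hamacher product; defined as 0 at (0, 0)."""
    return 0.0 if a == b == 0 else (a * b) / (a + b - a * b)

def min_max(a, b):
    """Minimum t-norm (the 'MinMax' conjunction)."""
    return min(a, b)

def bounded_difference(a, b):
    """Lukasiewicz / bounded-difference t-norm."""
    return max(0.0, a + b - 1)
```

All five agree at the boundary (each returns a when b = 1) but differ in how strongly they penalize partial memberships, which is why their classification performance can diverge.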
Abstract: Text document categorization involves large amounts of data and features. The high dimensionality of the feature space is troublesome and can affect classification performance. Therefore, feature selection is considered one of the crucial parts of text document categorization. Selecting the best features to represent documents reduces the dimensionality of the feature space and hence increases performance. Many approaches have been implemented by various researchers to overcome this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also review state-of-the-art algorithms by several other researchers.
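The Information Gain half of the hybrid scores each candidate feature by how much knowing its presence reduces class uncertainty. The following is the standard IG formulation rather than the paper's exact implementation, with a tiny invented document set for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(docs, labels, term):
    """IG(term) = H(C) - H(C | term present/absent) over a labeled corpus.

    docs: list of token sets; labels: parallel list of class labels.
    """
    n = len(docs)
    classes = set(labels)
    h_c = entropy([labels.count(c) / n for c in classes])
    h_cond = 0.0
    for present in (True, False):
        subset = [lab for doc, lab in zip(docs, labels) if (term in doc) == present]
        if subset:
            h_cond += (len(subset) / n) * entropy(
                [subset.count(c) / len(subset) for c in classes])
    return h_c - h_cond

# Invented 4-document corpus with two classes.
docs = [{"ball", "goal"}, {"ball", "team"}, {"vote", "law"}, {"law", "court"}]
labels = ["sport", "sport", "politics", "politics"]
```

Here "ball" perfectly separates the classes (IG = 1 bit), while "goal" appears in only one sport document and is a weaker feature; an ACO search would then explore subsets of the high-IG candidates.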
Abstract: In this paper, we present symbolic recognition models to extract knowledge characterized by document structures. Focussing on the extraction and the meticulous exploitation of the semantic structure of documents, we obtain a meaningful contextual tagging corresponding to different unit types (title, chapter, section, enumeration, etc.).
Abstract: Script identification is one of the challenging steps in the development of an optical character recognition system for bilingual or multilingual documents. In this paper an attempt is made to identify English numerals at word level in Punjabi documents using Gabor features. The support vector machine (SVM) classifier with five-fold cross-validation is used to classify the word images. The results obtained are quite encouraging: average accuracy with RBF, polynomial and linear kernel functions comes out to be greater than 99%.
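Gabor features are typically extracted by convolving the word image with a bank of oriented Gabor filters. A minimal sketch of generating one such kernel, using the standard Gabor formulation with illustrative parameter values (the paper's filter-bank settings are not given here):

```python
import numpy as np

def gabor_kernel(size=9, theta=0.0, lam=4.0, sigma=2.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine wave.

    theta = orientation, lam = wavelength, sigma = envelope width,
    gamma = spatial aspect ratio, psi = phase offset.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam + psi))

k = gabor_kernel()
```

A filter bank varies `theta` and `lam` over several orientations and scales; the filter responses (e.g., their means and variances per image region) form the feature vector handed to the SVM.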
Abstract: We propose an information filtering system that uses index word selection from a document set based on the topics included in the set. The method narrows the words down to those particularly characteristic of the document set; the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented by a vector whose elements correspond to the weights of the index words, and the dimension of this vector grows as the number of documents increases. It is therefore possible that words useless as index words for information filtering are included; to address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in the document set, which we obtain by applying Sparse Non-negative Matrix Factorization to the set. The filtering is carried out based on the centroid of the learning document set, which is regarded as the user's interest; the centroid is represented by a document vector whose elements consist of the weights of the selected index words. Using the English test collection MEDLINE, we confirm the effectiveness of our proposal: it improves recommendation accuracy over previous methods when the appropriate number of index words is selected. In addition, we examine the selected index words and find that our proposal is able to select index words covering some minor topics included in the document set.
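A rough sketch of the topic-based selection idea, using plain multiplicative-update NMF (the paper uses a sparse NMF variant; the tiny term matrix, vocabulary and selection rule below are illustrative assumptions):

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Plain multiplicative-update NMF: V ≈ W @ H, all entries non-negative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def select_index_words(H, vocab, top=2):
    """Keep the most heavily weighted words of each topic as index words."""
    chosen = set()
    for topic in H:
        for j in np.argsort(topic)[::-1][:top]:
            chosen.add(vocab[j])
    return chosen

# Hypothetical 4-document, 4-word term matrix with two obvious topics.
vocab = ["heart", "artery", "neuron", "brain"]
V = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 3, 2],
              [0, 0, 2, 3]], dtype=float)
W, H = nmf(V, k=2)
words = select_index_words(H, vocab, top=2)
```

Selecting per topic rather than globally is what lets minor topics contribute index words even when their overall term frequencies are low.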
Abstract: Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on the Vector Space Model (VSM) suffer to some degree from the limitation of the Bag of Words (BOW) representation, which ignores the semantic relationships among words. Enriching document representation with background knowledge from Wikipedia has proven to be an effective way to address this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement that goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. Experimental results on two different datasets show that our approach significantly improves on VSM-based methods in both text clustering and classification.
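For contrast, the VSM baseline the paper improves on can be sketched as plain cosine similarity over term counts, which scores semantically related but lexically disjoint texts at zero; this is exactly the BOW limitation that background knowledge is meant to fix:

```python
import math
from collections import Counter

def cosine_sim(doc_a, doc_b):
    """Bag-of-words cosine similarity between two whitespace-tokenized texts."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Here `cosine_sim("car engine", "automobile motor")` is 0.0 despite the obvious semantic affinity, because no surface term is shared.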
Abstract: A variety of new technology-based services have emerged with the development of Information and Communication Technologies (ICTs). Since technology-based services have technology-driven characteristics, identifying the relationships between technology-based services and ICTs would yield meaningful implications. Thus, this paper proposes an approach for identifying these relationships by analyzing patent documents. First, business model (BM) patents are classified into relevant service categories. Second, patent citation analysis is conducted to investigate the technological linkage and impacts between technology-based services and ICTs at the macro level. Third, as a micro-level analysis, patent co-classification analysis is employed to identify the technological linkage and coverage. The proposed approach could guide and help managers and designers of technology-based services to discover opportunities for developing new technology-based services in emerging service sectors.
Abstract: The main purpose of the study reported here was to investigate the extent to which the form of school governance (particularly decision-making) had an impact upon the effectiveness of the school with reference to parental involvement, planning and budgeting, professional development of teachers, school facilities and resources, and student outcomes. Particular attention was given to decision-making within the governance arrangements. The study was based on four case studies of high schools in New South Wales, Australia: one government school, one independent Christian community school, one independent Catholic school, and one Catholic systemic school. The focus of the research was the principals, teachers, parents, and students of the four schools with their varying governance structures. To gain greater insight into the issues, the researchers collected information by questionnaire, semi-structured interview, and review of key school documents. The study found that it was not so much the structure but the centrality of the school Principal, and the way the Principal perceived his or her roles in relation to others, that impacted most on school governance.
Abstract: The digital signature is a useful primitive for attaining integrity and authenticity in various wired or wireless communications. The proxy signature is one type of digital signature: it enables a proxy signer to sign messages on behalf of the original signer, which is very useful when the original signer (e.g. the president of a company) is not available to sign a specific document. If the original signer cannot forge valid proxy signatures by impersonating the proxy signer, the scheme is robust in a virtual environment, since the original signer cannot shift any illegal action she initiated onto the proxy signer. In this paper, we propose a new proxy signature scheme that prevents the original signer from impersonating the proxy signer to sign messages. The proposed scheme is based on the regular ElGamal signature. In addition, fair privacy of the proxy signer is maintained: the privacy of the proxy signer is preserved, and can be revealed when necessary.
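Since the scheme builds on the regular ElGamal signature, a textbook sketch of ElGamal signing and verification may help; the toy parameters below are for illustration only and this is not the paper's proxy construction:

```python
import math

# Toy parameters only; real ElGamal uses large primes and signs a
# cryptographic hash of the message, not the message itself.
p, g = 467, 2
x = 127                  # signer's private key
y = pow(g, x, p)         # public key

def sign(m, k):
    """Textbook ElGamal signature (r, s) on message m with one-time nonce k."""
    assert math.gcd(k, p - 1) == 1, "nonce must be invertible mod p-1"
    r = pow(g, k, p)
    s = (m - x * r) * pow(k, -1, p - 1) % (p - 1)
    return r, s

def verify(m, r, s):
    """Accept iff g^m == y^r * r^s (mod p)."""
    return 0 < r < p and pow(g, m, p) == pow(y, r, p) * pow(r, s, p) % p

r, s = sign(100, k=213)
```

A proxy scheme layers a delegation step on top of this: the original signer issues a warrant-bound delegation value from which the proxy signer derives a signing key, and the cited impersonation attacks concern exactly how that derivation binds the proxy key to the proxy signer alone.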
Abstract: The information on the Web increases tremendously. A number of search engines have been developed for searching Web information and retrieving relevant documents that satisfy inquirers' needs. Search engines present inquirers with irrelevant documents among the search results, since the search is text-based rather than semantic-based. The information retrieval research area has presented a number of approaches and methodologies, such as profiling, feedback, query modification and human-computer interaction, for improving search results. Moreover, information retrieval has employed artificial intelligence techniques and strategies, such as machine learning heuristics, tuning mechanisms, user and system vocabularies and logical theory, for capturing users' preferences and using them to guide the search based on semantic rather than syntactic analysis. Although valuable improvements in search results have been recorded, surveys have shown that search engine users are still not really satisfied with their results. Using ontologies for semantic-based searching is likely the key solution. Adopting a profiling approach and using ontology-based characteristics, this work proposes a strategy for finding the exact meaning of the query terms in order to retrieve relevant information according to user needs. The evaluation of the conducted experiments shows the effectiveness of the suggested methodology, and a conclusion is presented.
Abstract: In this paper, an intelligent algorithm for optimal document archiving is presented. It is known that electronic archives are very important for information system management, and minimizing the size of the stored data in an electronic archive is a main issue in reducing the physical storage area. Here, the effect of different types of Arabic fonts on electronic archive size is discussed. Simulation results show that PDF is the best file format for storing Arabic documents in an electronic archive. Furthermore, fast information detection in a given PDF file is introduced. The approach uses fast neural networks (FNNs) implemented in the frequency domain; the operation of these networks relies on performing cross-correlation in the frequency domain rather than the spatial one. It is proved mathematically and practically that the number of computation steps required by the presented FNNs is less than that needed by conventional neural networks (CNNs). Simulation results using MATLAB confirm the theoretical computations.
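The frequency-domain trick the FNNs rely on is the cross-correlation theorem: correlation computed via the FFT costs O(n log n) instead of the O(n²) of direct computation. A minimal one-dimensional sketch with illustrative signals (not the paper's networks):

```python
import numpy as np

def xcorr_freq(a, b):
    """Circular cross-correlation via the frequency domain:
    corr[k] = IFFT(FFT(a) * conj(FFT(b)))[k] = sum_i a[i+k] * b[i]."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))))

def xcorr_direct(a, b):
    """Naive O(n^2) circular cross-correlation, for comparison."""
    n = len(a)
    return np.array([sum(a[(i + k) % n] * b[i] for i in range(n))
                     for k in range(n)])

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.0, 1.0, 0.5, 0.0])
```

Both routines produce the same correlation sequence; for the two-dimensional document images in the paper the same identity applies with the 2-D FFT.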
Abstract: With the advance of multimedia and diagnostic imaging technologies, the number of radiographic images is increasing constantly. The medical field demands sophisticated systems for search and retrieval of the multimedia documents produced. This paper presents ongoing research that focuses on the semantic content of radiographic image documents to facilitate semantic-based radiographic image indexing and retrieval. The proposed model divides a radiographic image document, based on its semantic content, into a logical structure and a semantic structure. The logical structure represents the overall organization of information. The semantic structure, which is bound to the logical structure, is composed of semantic objects with interrelationships in the various spaces of the radiographic image.
Abstract: Countries in recession, Croatia among them, have lower tax revenues as a result of the unfavorable economic situation, that is, decreased economic activity and unemployment. The global tax base has decreased. In order to generate larger state revenues, states use the institute of the tax authorities. By controlling transfer pricing in international companies and using certain techniques, tax authorities can create greater tax obligations for companies in a short period of time.
Abstract: XML is a markup language that is becoming the standard format for information representation and data exchange. A major purpose of XML is the explicit representation of the logical structure of a document. Much research has been performed to exploit the logical structure of documents in information retrieval in order to precisely extract the user's information need from large collections of XML documents. In this paper, we describe an XML information retrieval weighting scheme that tries to find the most relevant elements in XML documents in response to a user query. We present this weighting model for information retrieval systems that utilize plausible inferences to infer the relevance of elements in XML documents. We also add to this model the Dempster-Shafer theory of evidence, to express the uncertainty in plausible inferences, and the Dempster-Shafer rule of combination, to combine evidence derived from different inferences.
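Dempster-Shafer combination of evidence from two inferences can be sketched as follows; the mass functions over a two-hypothesis frame (relevant / irrelevant element) are invented for illustration:

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Masses are dicts mapping frozenset focal elements to mass values;
    conflicting mass (empty intersections) is discarded and the rest
    renormalized.
    """
    combined, conflict = {}, 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    k = 1.0 - conflict
    return {s: v / k for s, v in combined.items()}

# Two sources of evidence about whether an element is relevant; mass on the
# full frame {rel, irr} expresses each inference's residual uncertainty.
m1 = {frozenset({"rel"}): 0.6, frozenset({"rel", "irr"}): 0.4}
m2 = {frozenset({"rel"}): 0.5, frozenset({"rel", "irr"}): 0.5}
fused = combine(m1, m2)
```

Fusing the two inferences raises the mass committed to "relevant" from 0.6 and 0.5 individually to 0.8, while 0.2 remains uncommitted, illustrating how combination sharpens belief without forcing a probability on each hypothesis.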
Abstract: Compression algorithms reduce the redundancy in data representation to decrease the storage required for that data. Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv (LZ) family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. Decompression is then required to retrieve the original data by lossless means. We present a compression scheme for text files coupled with the principle of dynamic decompression, which decompresses only the section of the compressed text file required by the user instead of decompressing the entire file. Dynamically decompressed files offer better disk space utilization due to higher compression ratios compared to most currently available text file formats.
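As a concrete example of one of the listed techniques, here is a minimal Huffman encoder; this is a generic textbook sketch, not the paper's scheme:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix-free Huffman code for the symbols of `text`."""
    freq = Counter(text)
    if len(freq) == 1:                 # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[c] for c in "abracadabra")
```

For "abracadabra" the most frequent symbol 'a' gets a 1-bit code and the whole string encodes in 23 bits instead of the 88 bits of its 8-bit byte encoding; dynamic decompression would additionally index the compressed stream so that a requested section can be decoded without expanding the whole file.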