Abstract: Most existing skyline query algorithms focus on querying static points in static databases. However, with the growing number of sensors, wireless communications and mobile applications, the demand for continuous skyline queries has increased. Unlike traditional skyline queries, which consider only static attributes, continuous skyline queries include dynamic attributes as well as static ones. Because skyline computation is based on checking the domination of skyline points over all dimensions, both static and dynamic attributes must be considered together, without separation. In this paper, we present an efficient algorithm for computing continuous skyline queries without discriminating between static and dynamic attributes. In brief, our algorithm proceeds as follows. First, it excludes the points that will not be in the initial skyline result; this pruning phase reduces the required number of comparisons. Second, the association between the spatial positions of data points is examined; this phase indicates where changes in the result might occur and consequently enables us to update the skyline result efficiently (continuous update) rather than computing the skyline from scratch. Finally, an experimental evaluation demonstrates the accuracy, performance and efficiency of our algorithm compared with existing approaches.
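The domination check that skyline computation rests on can be illustrated with a minimal sketch. This is a naive baseline, not the algorithm proposed in the paper: it assumes that smaller values are better in every dimension, and the function names `dominates` and `skyline` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: smaller is better).

    p dominates q when p is no worse than q in every dimension
    and strictly better in at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive skyline: keep every point that no other point dominates."""
    result: List[Point] = []
    for p in points:
        # Skip p if a point already in the result dominates it.
        if any(dominates(s, p) for s in result):
            continue
        # Drop points that p dominates, then keep p.
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(skyline(data))
```

A continuous variant would update `result` incrementally as dynamic attributes change instead of calling `skyline` again; that update step is what the abstract describes and is not shown here.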
Abstract: Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. These methods enable end users to program their own devices and extend the functionality of an existing system without any knowledge of programming languages. In this paper, we describe the Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on an inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, end users can query the data using natural language. During the evaluation, 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also using Excel without the ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM. The work uses natural language for end user software engineering to overcome the present bottleneck of professional developers.
Abstract: With the emergence and development of Information
and Communications Technologies (ICTs), Higher Education is
experiencing rapid changes, not only in its teaching strategies but
also in students’ learning skills. However, we have noticed that
students often have difficulty when seeking innovative, useful, and
interesting learning resources for their work. This is due to the
lack of supervision in the selection of good query tools. This paper
presents AINA, an Information Retrieval (IR) computer system aimed
at providing motivating and stimulating content to both students
and teachers working on different areas and at different educational
levels. In particular, our proposal consists of an open virtual resource
environment oriented to the vast universe of Disney comics and
cartoons. Our test suite includes Disney’s long and short films,
and we have performed some activities based on the Just In Time
Teaching (JiTT) methodology. More specifically, it has been tested
by groups of university and secondary school students.
Abstract: Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach that is unaffected by the features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The evaluation shows significantly improved performance over the independently evaluated baselines of previously proposed feature fusion approaches.
Abstract: When using Information Retrieval Systems (IRS), users often present search queries made of ad-hoc keywords. It is then up to the IRS to obtain a precise representation of the user’s information need and the context of the information. This paper investigates the optimization of IRS to individual information needs in order of relevance. The study addresses the development of algorithms that optimize the ranking of documents retrieved from IRS, and describes a Document Ranking Optimization (DROPT) algorithm for information retrieval (IR) in an Internet-based or designated-database environment. As the volume of information available online and in designated databases is growing continuously, ranking algorithms can play a major role in the context of search results. In this paper, a DROPT technique for documents retrieved from a corpus is developed with respect to document index keywords and the query vectors. This is based on calculating the weight (
Abstract: Databases comprise the foundation of most software systems. System developers inevitably write code to query these databases. The de facto language for querying is SQL and this, consequently, is the default language taught by higher education institutions. There is evidence that learners find it hard to master SQL, harder than mastering other programming languages such as Java. Educators do not agree about explanations for this seeming anomaly. Further investigation may well reveal the reasons. In this paper, we report on our investigations into how novices learn SQL, the actual problems they experience when writing SQL, as well as the differences between expert and novice SQL query writers. We conclude by presenting a model of SQL learning that should inform the instructional material design process better to support the SQL learning process.
Abstract: The Information Retrieval community is facing the problem of effectively representing Web search results. When web search results are organized into clusters, it becomes easy for users to browse through them quickly. Traditional search engines organize search results into clusters for ambiguous queries, with each cluster representing one meaning of the query. The clusters are obtained according to the topical similarity of the retrieved search results, but it is possible for results to be totally dissimilar and still correspond to the same meaning of the query. People search is also one of the most common tasks on the Web nowadays, but when a particular person’s name is queried, search engines return web pages related to different persons who share the queried name. This places the burden of disambiguating and collecting the pages relevant to a particular person on the user. In this paper, we have developed an approach that clusters web pages based on the association of the web pages with different people, and clusters that are based on generic entity search.
Abstract: Natural Language Interfaces typically support a restricted language and also have scopes and limitations that naïve users are unaware of, resulting in errors when users attempt to retrieve information from ontologies. To overcome this challenge, an auto-suggest feature is introduced into the querying process, in which users are guided by an interactive query construction system. Guiding users to formulate their queries, while providing them with an unconstrained (or almost unconstrained) way to query the ontology, results in better interpretation of the query and ultimately leads to an effective search. The approach described in this paper is unobtrusive and subtly guides users, so that they have a choice of either selecting from the suggestion list or typing in full. Users are not coerced into accepting system suggestions and can express themselves using fragments or full sentences.
Abstract: Nowadays, the dissemination of information extends to the distributed world, where selecting the servers relevant to a user request is an important problem in distributed information retrieval. During the last decade, several research studies have addressed this issue in search of optimal solutions, and many collection selection approaches have been proposed. In this paper, we propose a new collection selection approach that takes into consideration the number of documents in a collection that contain terms of the query and the weights of those terms in these documents. We tested our method, and our studies show that this technique can compete with the other state-of-the-art algorithms that we chose for the performance comparison.
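The two signals named in the abstract (how many documents in a collection contain a query term, and how heavily those documents weight it) can be combined in many ways; the abstract does not give the formula. The sketch below is therefore only an illustration under stated assumptions: the term weight is taken to be normalized term frequency, the score is the sum of those weights over all matching documents, and `score_collection` and `rank_collections` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence

Document = Sequence[str]  # a document as a list of tokens


def score_collection(query_terms: Iterable[str], documents: Sequence[Document]) -> float:
    """Assumed score: sum of normalized term frequencies of the query
    terms over every document that contains them.

    The score grows both with the number of matching documents and
    with the weight of the query terms inside those documents.
    """
    terms = set(query_terms)
    score = 0.0
    for doc in documents:
        counts = Counter(doc)
        for term in terms:
            if term in counts:
                # Normalized term frequency stands in for the term weight.
                score += counts[term] / len(doc)
    return score


def rank_collections(query_terms: Iterable[str],
                     collections: Dict[str, Sequence[Document]]) -> List[str]:
    """Return collection names ordered from highest to lowest score."""
    terms = list(query_terms)
    return sorted(collections,
                  key=lambda name: score_collection(terms, collections[name]),
                  reverse=True)


if __name__ == "__main__":
    collections = {
        "A": [["sql", "query", "sql"], ["java"]],
        "B": [["sql", "java", "code"]],
    }
    print(rank_collections(["sql"], collections))
```

In a distributed setting each server would typically expose summary statistics rather than full documents; the in-memory dictionary here is a simplification for illustration.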
Abstract: This paper presents an approach for the model-driven
generation of Rich Internet Applications (RIAs), focusing on the
graphical aspect. We used well known Model-Driven Engineering
(MDE) frameworks and technologies, such as Eclipse Modeling
Framework (EMF), Graphical Modeling Framework (GMF), Query
View Transformation (QVTo) and Acceleo to enable the design and
automatic code generation of the RIA. During the development of
the approach, we focused on the graphical aspect of the application
in terms of interfaces while opting for the Model View Presenter
pattern that is designed for graphical interfaces. The paper describes
the process followed to define the approach and the supporting tool, and
presents the results from a case study.
Abstract: Nowadays, ontologies are used for achieving a
common understanding within a user community and for sharing
domain knowledge. However, the de-centralized nature of the web
makes it inevitable that small communities will use their own
ontologies to describe their data and to index their own resources.
Accessing resources from various ontologies created
independently is an important challenge for answering end user
queries. Ontology mapping is thus required for combining ontologies.
However, mapping complete ontologies at run time is a
computationally expensive task. This paper proposes a system in
which mappings between concepts may be generated dynamically as
the concepts are encountered during user queries. In this way, the
interaction itself defines the context in which small and relevant
portions of ontologies are mapped. We illustrate application of the
proposed system in the context of Technology Enhanced Learning
(TEL) where learners need to access learning resources covering
specific concepts.
Abstract: Seeking and sharing knowledge on online forums
has made them popular in recent years. Although online forums are
valuable sources of information, the variety of message sources
makes retrieving reliable threads with high quality content an
issue. The majority of existing information retrieval systems ignore
the quality of retrieved documents, particularly, in the field of thread
retrieval. In this research, we present an approach that employs
various quality features in order to investigate the quality of retrieved
threads. Different aspects of content quality, including completeness,
comprehensiveness, and politeness, are assessed using these features,
which leads to finding threads that are not only textually but also
conceptually relevant to a user query within a forum. To analyse the
influence of the features, we used an adapted version of voting model
thread search as a retrieval system. We equipped it with each feature
individually and also with various combinations of features in turn during multiple
runs. The results show that incorporating the quality features
enhances the effectiveness of the utilised retrieval system
significantly.
Abstract: This article discusses the migration of relational databases
(RDB) to XML documents (schema and data) based on metadata and semantic
enrichment, which puts the RDB into a flattened form that is
enriched by the object concept. The integration and exploitation of
the object concept in XML uses a syntax that allows the
conformity of the XML document to be verified during its
creation. The information extracted from the RDB is therefore
analyzed and filtered in order to fit the structure of
the XML files and the associated object model. The elements implemented
in the XML document are built dynamically through an SQL query. A
prototype was implemented to perform the migration automatically, which
demonstrates the effectiveness of this approach.
Abstract: The aim of this paper is to propose a general
framework for storing, analyzing, and extracting knowledge from
two-dimensional echocardiographic images, color Doppler images,
non-medical images, and general data sets. A number of high
performance data mining algorithms have been used to carry out this
task. Our framework encompasses four layers, namely physical
storage, object identification, knowledge discovery and user level.
Techniques such as active contour model to identify the cardiac
chambers, pixel classification to segment the color Doppler echo
image, universal model for image retrieval, Bayesian method for
classification, parallel algorithms for image segmentation, etc., were
employed. Using the feature vector database that has been
efficiently constructed, one can perform various data mining tasks
like clustering, classification, etc. with efficient algorithms along
with image mining given a query image. All these facilities are
included in the framework that is supported by state-of-the-art user
interface (UI). The algorithms were tested with actual patient data
and the Coral image database, and the results show that their performance
is better than previously reported results.
Abstract: Predicting earnings management is vital for capital
market participants, financial analysts and managers. This
research aims to answer the following question: Is there a significant
difference between the regression model and neural network
models in predicting earnings management, and which one leads to a
superior prediction? To address this question, a Linear
Regression (LR) model was compared with two neural networks
including Multi-Layer Perceptron (MLP), and Generalized
Regression Neural Network (GRNN). The population of this study
includes 94 companies listed on the Tehran Stock Exchange (TSE)
from 2003 to 2011. After the results of all models were
acquired, ANOVA was applied to test the hypotheses. In general, the
summary of statistical results showed that the precision of GRNN did
not exhibit a significant difference in comparison with MLP. In
addition, the mean square error of the MLP and GRNN showed a
significant difference from the multivariable LR model. These
findings support the notion of nonlinear behavior of earnings
management. Therefore, it is more appropriate for capital market
participants to analyze earnings management based upon neural
network techniques rather than linear regression models.
Abstract: Web mining aims to discover and extract useful
information. Different users may have different search goals when
they formulate queries and submit them to a search engine.
The inference and analysis of user search goals can be very useful for
providing a better result for a user search query. In this project,
we propose a novel approach to infer user search goals by analyzing
search engine query logs. First, feedback sessions are constructed
from user click-through logs; they efficiently reflect the
information needs of users. Second, we
propose a preprocessing technique to clean unnecessary data
from the web log file (feedback session). Third, we propose a technique
to generate pseudo-documents that represent feedback sessions
for clustering. Finally, we implement the k-medoids clustering algorithm
to discover different user search goals and to provide a better
result for a search query based on the user's feedback sessions.
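The final clustering step can be sketched with a minimal k-medoids loop. This is an illustration only, not the authors' implementation: it assumes each pseudo-document has already been turned into a numeric feature vector, uses Euclidean distance, starts from the first `k` points as medoids, and the names `assign` and `k_medoids` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def assign(points: Sequence[Vector], medoids: List[int]) -> Dict[int, List[int]]:
    """Assign each point index to the index of its nearest medoid."""
    clusters: Dict[int, List[int]] = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points: Sequence[Vector], k: int, max_iter: int = 100) -> Dict[int, List[int]]:
    """Alternate between assignment and medoid update until stable.

    Returns a mapping from medoid index to the indices of its members.
    Assumption: the initial medoids are simply the first k points.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        updated = []
        for m, members in clusters.items():
            if not members:
                updated.append(m)  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of the same cluster.
            best = min(members,
                       key=lambda c: sum(math.dist(points[c], points[j]) for j in members))
            updated.append(best)
        if set(updated) == set(medoids):
            break
        medoids = updated
    return assign(points, medoids)


if __name__ == "__main__":
    vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for medoid, members in k_medoids(vectors, k=2).items():
        print(medoid, members)
```

Unlike k-means, each cluster centre is an actual data point, so every discovered search goal can be represented by a real feedback session rather than an averaged vector.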
Abstract: The enormous amount of information stored on the
web increases from one day to the next, and the web is now
faced with inevitable difficulties in retrieving the pertinent information
that users really want. The problem today is not only to expand
the size of the information highways, but also to design a system for
intelligent search. The vast majority of this information is stored in
relational databases, which in turn represent a backend for managing
RDF data of the semantic web. This problem has motivated us to
write this paper in order to establish an effective approach that supports
a semantic transformation algorithm from SPARQL queries to SQL
queries, more precisely SPARQL SELECT queries. By adopting this
method, the relational database can be queried easily with
SPARQL queries while maintaining the same performance.
Abstract: Thousands of organisations store important and
confidential information related to them, their customers, and their
business partners in databases all across the world. The stored data
ranges from less sensitive (e.g. first name, last name, date of birth) to
more sensitive data (e.g. password, pin code, and credit card
information). Losing data, disclosing confidential information or
even changing the value of data are the severe kinds of damage that a
Structured Query Language injection (SQLi) attack can cause on a
given database. It is a code injection technique where malicious SQL
statements are inserted into a given SQL database by simply using a
web browser. In this paper, we propose an effective pattern
recognition neural network model for detection and classification of
SQLi attacks. The proposed model is built from three main elements:
a Uniform Resource Locator (URL) generator in order to generate
thousands of malicious and benign URLs, a URL classifier in order
to: 1) classify each generated URL as either a benign URL or a
malicious URL and 2) classify the malicious URLs into different
SQLi attack categories, and a NN model in order to: 1) detect whether a
given URL is a malicious URL or a benign URL and 2) identify the
type of SQLi attack for each malicious URL. The model is first
trained and then evaluated by employing thousands of benign and
malicious URLs. The results of the experiments are presented in
order to demonstrate the effectiveness of the proposed approach.
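To make the injection mechanism described above concrete, the following sketch shows how a crafted input changes the meaning of a query that is built by string concatenation. It illustrates the attack only and is not the neural network detector the abstract proposes; the `users` table, the payload, and the function names are assumptions chosen for illustration, using Python's standard `sqlite3` module.

```python
import sqlite3
from typing import List


def make_db() -> sqlite3.Connection:
    """Create a small in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> List[str]:
    """Vulnerable: user input is concatenated directly into the SQL text."""
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn: sqlite3.Connection, name: str) -> List[str]:
    """Input is passed as a bound parameter and is never parsed as SQL."""
    rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # a classic tautology-based payload
    print(find_user_unsafe(conn, payload))  # condition becomes always true
    print(find_user_safe(conn, payload))    # payload treated as a plain string
```

A URL carrying such a payload in a query parameter is the kind of input a detector of the sort described in the abstract would need to flag as malicious.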
Abstract: This paper describes the tradeoffs and the design from
scratch of a self-contained, easy-to-use health dashboard software
system that provides customizable data tracking for patients in smart
homes. The system is made up of different software modules and
comprises a front-end and a back-end component. Built with HTML,
CSS, and JavaScript, the front-end allows adding users, logging into
the system, selecting metrics, and specifying health goals. The back-end
consists of a NoSQL Mongo database, a Python script, and a
SimpleHTTPServer written in Python. The database stores user
profiles and health data in JSON format. The Python script makes use
of the PyMongo driver library to query the database and displays
formatted data as a daily snapshot of user health metrics against
target goals. Any number of standard and custom metrics can be
added to the system, and corresponding health data can be fed
automatically via sensor APIs, or manually as text or picture data
files. A real-time METAR request API permits correlating weather
data with patient health, and an advanced query system is
implemented to allow trend analysis of selected health metrics over
custom time intervals. Available on the GitHub repository system,
the project is free to use for academic purposes of learning and
experimenting, or practical purposes by building on it.
Abstract: Image search engines rely on the surrounding textual
keywords for the retrieval of images. It is a tedious task for
search engines like Google and Bing to interpret the user’s search
intention and to provide the desired results. Recent research
also states that the Google image search engine does not work well on
all images. Consequently, this has led to the emergence of efficient
image retrieval techniques that interpret the user’s search intention
and show the desired results. In order to accomplish this task, an
efficient image re-ranking framework is required. In this paper, a new
image re-ranking framework is evaluated experimentally. The
framework provides better image retrieval from the image dataset by
re-ranking the retrieved images based on the
user’s desired images. It works in two sections: an
offline section and an online section. In the offline section, the re-ranking
framework learns different reference classes (semantic
spaces) for diverse user query keywords. The semantic signatures are
generated by combining the textual and visual features of the images.
In the online section, images are re-ranked by comparing the
semantic signatures that are obtained from the reference classes with
the user specified image query keywords. This re-ranking
methodology increases image retrieval efficiency, and the
result is more useful to the user.