Abstract: Text mining techniques are generally applied for
classifying the text, finding fuzzy relations and structures in data
sets. This research provides plenty text mining capabilities. One
common application is text classification and event extraction,
which encompass deducing specific knowledge concerning incidents
referred to in texts. The main contribution of this paper is the
clarification of a concept graph generation mechanism, which is based
on a text classification and optimal fuzzy relationship extraction.
Furthermore, the work presented in this paper explains the application
of fuzzy relationship extraction and branch and bound (BB) method
to simplify the texts.
Abstract: The use of eXtensible Markup Language (XML) in
web, business and scientific databases lead to the development of
methods, techniques and systems to manage and analyze XML data.
Semi-structured documents suffer due to its heterogeneity and
dimensionality. XML structure and content mining represent
convergence for research in semi-structured data and text mining. As
the information available on the internet grows drastically, extracting
knowledge from XML documents becomes a harder task. Certainly,
documents are often so large that the data set returned as answer to a
query may also be very big to convey the required information. To
improve the query answering, a Semantic Tree Based Association
Rule (STAR) mining method is proposed. This method provides
intentional information by considering the structure, content and the
semantics of the content. The method is applied on Reuter’s dataset
and the results show that the proposed method outperforms well.
Abstract: Ontologies offer a means for representing and sharing
information in many domains, particularly in complex domains. For
example, it can be used for representing and sharing information
of System Requirement Specification (SRS) of complex systems
like the SRS of ERTMS/ETCS written in natural language. Since
this system is a real-time and critical system, generic ontologies,
such as OWL and generic ERTMS ontologies provide minimal
support for modeling temporal information omnipresent in these SRS
documents. To support the modeling of temporal information, one
of the challenges is to enable representation of dynamic features
evolving in time within a generic ontology with a minimal redesign
of it. The separation of temporal information from other information
can help to predict system runtime operation and to properly design
and implement them. In addition, it is helpful to provide a reasoning
and querying techniques to reason and query temporal information
represented in the ontology in order to detect potential temporal
inconsistencies. To address this challenge, we propose a lightweight
3-layer temporal Quality of Service (QoS) ontology for representing,
reasoning and querying over temporal and non-temporal information
in a complex domain ontology. Representing QoS entities in separated
layers can clarify the distinction between the non QoS entities
and the QoS entities in an ontology. The upper generic layer of
the proposed ontology provides an intuitive knowledge of domain
components, specially ERTMS/ETCS components. The separation of
the intermediate QoS layer from the lower QoS layer allows us to
focus on specific QoS Characteristics, such as temporal or integrity
characteristics. In this paper, we focus on temporal information that
can be used to predict system runtime operation. To evaluate our
approach, an example of the proposed domain ontology for handover
operation, as well as a reasoning rule over temporal relations in this
domain-specific ontology, are presented.
Abstract: Structured Query Language (SQL) is the standard de facto language to access and manipulate data in a relational database. Although SQL is a language that is simple and powerful, most novice users will have trouble with SQL syntax. Thus, we are presenting SQL generator tool which is capable of translating actions and displaying SQL commands and data sets simultaneously. The tool was developed based on Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take full advantage of it to reduce the complexity in architectural design and to increase flexibility and reuse of code. In addition, we use White-Box testing for the code verification in the Model module.
Abstract: Search engine plays an important role in internet, to
retrieve the relevant documents among the huge number of web
pages. However, it retrieves more number of documents, which are
all relevant to your search topics. To retrieve the most meaningful
documents related to search topics, ranking algorithm is used in
information retrieval technique. One of the issues in data miming is
ranking the retrieved document. In information retrieval the ranking
is one of the practical problems. This paper includes various Page
Ranking algorithms, page segmentation algorithms and compares
those algorithms used for Information Retrieval. Diverse Page Rank
based algorithms like Page Rank (PR), Weighted Page Rank (WPR),
Weight Page Content Rank (WPCR), Hyperlink Induced Topic
Selection (HITS), Distance Rank, Eigen Rumor, Distance Rank Time
Rank, Tag Rank, Relational Based Page Rank and Query Dependent
Ranking algorithms are discussed and compared.
Abstract: Existing methods of data mining cannot be applied on
spatial data because they require spatial specificity consideration, as
spatial relationships.
This paper focuses on the classification with decision trees, which
are one of the data mining techniques. We propose an extension of
the C4.5 algorithm for spatial data, based on two different approaches
Join materialization and Querying on the fly the different tables.
Similar works have been done on these two main approaches, the
first - Join materialization - favors the processing time in spite of
memory space, whereas the second - Querying on the fly different
tables- promotes memory space despite of the processing time.
The modified C4.5 algorithm requires three entries tables: a target
table, a neighbor table, and a spatial index join that contains the
possible spatial relationship among the objects in the target table and
those in the neighbor table. Thus, the proposed algorithms are applied
to a spatial data pattern in the accidentology domain.
A comparative study of our approach with other works of
classification by spatial decision trees will be detailed.
Abstract: Due to the large amount of information in the World
Wide Web (WWW, web) and the lengthy and usually linearly
ordered result lists of web search engines that do not indicate
semantic relationships between their entries, the search for topically
similar and related documents can become a tedious task. Especially,
the process of formulating queries with proper terms representing
specific information needs requires much effort from the user. This
problem gets even bigger when the user's knowledge on a subject and
its technical terms is not sufficient enough to do so. This article
presents the new and interactive search application DocAnalyser that
addresses this problem by enabling users to find similar and related
web documents based on automatic query formulation and state-ofthe-
art search word extraction. Additionally, this tool can be used to
track topics across semantically connected web documents.
Abstract: The system is designed to show images which are
related to the query image. Extracting color, texture, and shape
features from an image plays a vital role in content-based image
retrieval (CBIR). Initially RGB image is converted into HSV color
space due to its perceptual uniformity. From the HSV image, Color
features are extracted using block color histogram, texture features
using Haar transform and shape feature using Fuzzy C-means
Algorithm. Then, the characteristics of the global and local color
histogram, texture features through co-occurrence matrix and Haar
wavelet transform and shape are compared and analyzed for CBIR.
Finally, the best method of each feature is fused during similarity
measure to improve image retrieval effectiveness and accuracy.
Abstract: Color Histogram is considered as the oldest method
used by CBIR systems for indexing images. In turn, the global
histograms do not include the spatial information; this is why the
other techniques coming later have attempted to encounter this
limitation by involving the segmentation task as a preprocessing step.
The weak segmentation is employed by the local histograms while
other methods as CCV (Color Coherent Vector) are based on strong
segmentation. The indexation based on local histograms consists of
splitting the image into N overlapping blocks or sub-regions, and
then the histogram of each block is computed. The dissimilarity
between two images is reduced, as consequence, to compute the
distance between the N local histograms of the both images resulting
then in N*N values; generally, the lowest value is taken into account
to rank images, that means that the lowest value is that which helps to
designate which sub-region utilized to index images of the collection
being asked. In this paper, we make under light the local histogram
indexation method in the hope to compare the results obtained against
those given by the global histogram. We address also another
noteworthy issue when Relying on local histograms namely which
value, among N*N values, to trust on when comparing images, in
other words, which sub-region among the N*N sub-regions on which
we base to index images. Based on the results achieved here, it seems
that relying on the local histograms, which needs to pose an extra
overhead on the system by involving another preprocessing step
naming segmentation, does not necessary mean that it produces better
results. In addition to that, we have proposed here some ideas to
select the local histogram on which we rely on to encode the image
rather than relying on the local histogram having lowest distance with
the query histograms.
Abstract: Web search engines are designed to retrieve and
extract the information in the web databases and to return dynamic
web pages. The Semantic Web is an extension of the current web in
which it includes semantic content in web pages. The main goal of
semantic web is to promote the quality of the current web by
changing its contents into machine understandable form. Therefore,
the milestone of semantic web is to have semantic level information
in the web. Nowadays, people use different keyword- based search
engines to find the relevant information they need from the web.
But many of the words are polysemous. When these words are
used to query a search engine, it displays the Search Result Records
(SRRs) with different meanings. The SRRs with similar meanings are
grouped together based on Word Sense Disambiguation (WSD). In
addition to that semantic annotation is also performed to improve the
efficiency of search result records. Semantic Annotation is the
process of adding the semantic metadata to web resources. Thus the
grouped SRRs are annotated and generate a summary which
describes the information in SRRs. But the automatic semantic
annotation is a significant challenge in the semantic web. Here
ontology and knowledge based representation are used to annotate
the web pages.
Abstract: The emergence of the Semantic Web technology
increases day by day due to the rapid growth of multiple web pages.
Many standard formats are available to store the semantic web data.
The most popular format is the Resource Description Framework
(RDF). Querying large RDF graphs becomes a tedious procedure
with a vast increase in the amount of data. The problem of query
optimization becomes an issue in querying large RDF graphs.
Choosing the best query plan reduces the amount of query execution
time. To address this problem, nature inspired algorithms can be used
as an alternative to the traditional query optimization techniques. In
this research, the optimal query plan is generated by the proposed
SAPSO algorithm which is a hybrid of Simulated Annealing (SA)
and Particle Swarm Optimization (PSO) algorithms. The proposed
SAPSO algorithm has the ability to find the local optimistic result
and it avoids the problem of local minimum. Experiments were
performed on different datasets by changing the number of predicates
and the amount of data. The proposed algorithm gives improved
results compared to existing algorithms in terms of query execution
time.
Abstract: In general, algorithms to find continuous k-nearest neighbors have been researched on the location based services, monitoring periodically the moving objects such as vehicles and mobile phone. Those researches assume the environment that the number of query points is much less than that of moving objects and the query points are not moved but fixed. In gaming environments, this problem is when computing the next movement considering the neighbors such as flocking, crowd and robot simulations. In this case, every moving object becomes a query point so that the number of query point is same to that of moving objects and the query points are also moving. In this paper, we analyze the performance of the existing algorithms focused on location based services how they operate under gaming environments.
Abstract: The topic of enhancing security in XML databases is important as it includes protecting sensitive data and providing a secure environment to users. In order to improve security and provide dynamic access control for XML databases, we presented XLog file to calculate user trust values by recording users’ bad transaction, errors and query severities. Severity-aware trust-based access control for XML databases manages the access policy depending on users' trust values and prevents unauthorized processes, malicious transactions and insider threats. Privileges are automatically modified and adjusted over time depending on user behaviour and query severity. Logging in database is an important process and is used for recovery and security purposes. In this paper, the Xlog file is presented as a dynamic and temporary log file for XML databases to enhance the level of security.
Abstract: Decision Support System (DSS), a query-based system meant to help decision makers to use a variety of information for decision making, plays a very vital role in sustainable growth of any country. For this very purpose it is essential to analyze the educational system because education is the only way through which people can be made aware as to how to sustain our planet. The purpose of this paper is to prepare a decision support system for efficiency evaluation of education system with the help of Distributed Geographical Information System.
Abstract: In pattern clustering, nearest neighborhood point computation is a challenging issue for many applications in the area of research such as Remote Sensing, Computer Vision, Pattern Recognition and Statistical Imaging. Nearest neighborhood
computation is an essential computation for providing sufficient classification among the volume of pixels (voxels) in order to localize the active-region-of-interests (AROI). Furthermore, it is needed to compute spatial metric relationships of diverse area of imaging based on the applications of pattern recognition. In this paper, we propose a new methodology for finding the nearest neighbor point, depending on making a virtually grid of a hexagon cells, then locate every point beneath them. An algorithm is suggested for minimizing the computation and increasing the turnaround time of the process. The nearest neighbor query points Φ are fetched by seeking fashion of hexagon holistic. Seeking will be repeated until an AROI Φ is to be expected. If any point Υ is located then searching starts in the nearest hexagons in a circular way. The First hexagon is considered be level 0 (L0) and the surrounded hexagons is level 1 (L1). If Υ is located in L1, then search starts in the next level (L2) to ensure that Υ is the nearest neighbor for Φ. Based on the result and experimental results, we found that the proposed method has an advantage over the traditional methods in terms of minimizing the time complexity required for searching the neighbors, in turn, efficiency of classification will be improved sufficiently.
Abstract: Information retrieval has become an important field of study and research under computer science due to explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects such as document representation, similarity measure and query expansion.
Abstract: Considering the complexities involved in Cloud computing, there are still plenty of issues that affect the privacy of data in cloud environment. Unless these problems get solved, we think that the problem of preserving privacy in cloud databases is still open. In tokenization and homomorphic cryptography based solutions for privacy preserving cloud database querying, there is possibility that by colluding with service provider adversary may run brute force attacks that will reveal the attribute values.
In this paper we propose a solution by defining the variant of K –means clustering algorithm that effectively detects such brute force attacks and enhances privacy of cloud database querying by preventing this attacks.
Abstract: The objective of the present study was to evaluate the novel nanomagnetic beads based–latex agglutination assay (NMB-LAT) as a simple test for diagnosis of S. haematobium as well as standardize the novel nanomagnetic beads based –ELISA (NMB-ELISA). According to urine examination this study included 85 S. haematobium infected patients, 30 other parasites infected patients and 25 negative control samples. The sensitivity of novel NMB-LAT was 82.4% versus 96.5% and 88.2% for NMB-ELISA and currently used sandwich ELISA respectively. The specificity of NMB-LAT was 83.6% versus 96.3% and 87.3% for NMB-ELISA and currently used sandwich ELISA respectively. In conclusion, the novel NMB-ELISA is a valuable applicable diagnostic technique for diagnosis of human schistosomiasis haematobium. The novel NMB-ELISA assay is a suitable applicable diagnostic method in field survey especially when followed by ELISA as a confirmatory test in query false negative results. Trials are required to increase the sensitivity and specificity of NMB-ELISA assay.
Abstract: Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper tries to evaluate the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measures were conducted and two corpora with different external resources like Arabic WordNet (AWN) and the Arabic Dictionary (thesaurus) of Meaning (ADM) were used. Examination of the obtained results will allow us to measure the real contribution of this reformulation technique in improving the IRS performance.
Abstract: This paper explains how the New Jersey Institute of Technology surveying student team members designed and created an interactive GIS map, the purpose of which is to be useful to the land surveyor and engineer for project management. This was achieved by building a research and storage database that can be easily integrated into any land surveyor’s current operations through the use of ArcGIS 10, Arc Catalog, and AutoCAD. This GIS database allows for visual representation and information querying for multiple job sites, and simple access to uploaded data, which is geospatially referenced to each individual job site or project. It can also be utilized by engineers to determine design criteria, or to store important files. This cost-effective approach to a surveying map not only saves time, but saves physical storage space and paper resources.