Optimal Classifying and Extracting Fuzzy Relationship from Query Using Text Mining Techniques

Text mining techniques are generally applied for classifying the text, finding fuzzy relations and structures in data sets. This research provides plenty text mining capabilities. One common application is text classification and event extraction, which encompass deducing specific knowledge concerning incidents referred to in texts. The main contribution of this paper is the clarification of a concept graph generation mechanism, which is based on a text classification and optimal fuzzy relationship extraction. Furthermore, the work presented in this paper explains the application of fuzzy relationship extraction and branch and bound (BB) method to simplify the texts.

A Tree Based Association Rule Approach for XML Data with Semantic Integration

The use of eXtensible Markup Language (XML) in web, business and scientific databases lead to the development of methods, techniques and systems to manage and analyze XML data. Semi-structured documents suffer due to its heterogeneity and dimensionality. XML structure and content mining represent convergence for research in semi-structured data and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Certainly, documents are often so large that the data set returned as answer to a query may also be very big to convey the required information. To improve the query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intentional information by considering the structure, content and the semantics of the content. The method is applied on Reuter’s dataset and the results show that the proposed method outperforms well.

A Temporal QoS Ontology for ERTMS/ETCS

Ontologies offer a means for representing and sharing information in many domains, particularly in complex domains. For example, it can be used for representing and sharing information of System Requirement Specification (SRS) of complex systems like the SRS of ERTMS/ETCS written in natural language. Since this system is a real-time and critical system, generic ontologies, such as OWL and generic ERTMS ontologies provide minimal support for modeling temporal information omnipresent in these SRS documents. To support the modeling of temporal information, one of the challenges is to enable representation of dynamic features evolving in time within a generic ontology with a minimal redesign of it. The separation of temporal information from other information can help to predict system runtime operation and to properly design and implement them. In addition, it is helpful to provide a reasoning and querying techniques to reason and query temporal information represented in the ontology in order to detect potential temporal inconsistencies. To address this challenge, we propose a lightweight 3-layer temporal Quality of Service (QoS) ontology for representing, reasoning and querying over temporal and non-temporal information in a complex domain ontology. Representing QoS entities in separated layers can clarify the distinction between the non QoS entities and the QoS entities in an ontology. The upper generic layer of the proposed ontology provides an intuitive knowledge of domain components, specially ERTMS/ETCS components. The separation of the intermediate QoS layer from the lower QoS layer allows us to focus on specific QoS Characteristics, such as temporal or integrity characteristics. In this paper, we focus on temporal information that can be used to predict system runtime operation. To evaluate our approach, an example of the proposed domain ontology for handover operation, as well as a reasoning rule over temporal relations in this domain-specific ontology, are presented.

SQL Generator Based On MVC Pattern

Structured Query Language (SQL) is the standard de facto language to access and manipulate data in a relational database. Although SQL is a language that is simple and powerful, most novice users will have trouble with SQL syntax. Thus, we are presenting SQL generator tool which is capable of translating actions and displaying SQL commands and data sets simultaneously. The tool was developed based on Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take full advantage of it to reduce the complexity in architectural design and to increase flexibility and reuse of code. In addition, we use White-Box testing for the code verification in the Model module.

Comparative Analysis of Different Page Ranking Algorithms

Search engine plays an important role in internet, to retrieve the relevant documents among the huge number of web pages. However, it retrieves more number of documents, which are all relevant to your search topics. To retrieve the most meaningful documents related to search topics, ranking algorithm is used in information retrieval technique. One of the issues in data miming is ranking the retrieved document. In information retrieval the ranking is one of the practical problems. This paper includes various Page Ranking algorithms, page segmentation algorithms and compares those algorithms used for Information Retrieval. Diverse Page Rank based algorithms like Page Rank (PR), Weighted Page Rank (WPR), Weight Page Content Rank (WPCR), Hyperlink Induced Topic Selection (HITS), Distance Rank, Eigen Rumor, Distance Rank Time Rank, Tag Rank, Relational Based Page Rank and Query Dependent Ranking algorithms are discussed and compared.

Spatial Data Mining by Decision Trees

Existing methods of data mining cannot be applied on spatial data because they require spatial specificity consideration, as spatial relationships. This paper focuses on the classification with decision trees, which are one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches Join materialization and Querying on the fly the different tables. Similar works have been done on these two main approaches, the first - Join materialization - favors the processing time in spite of memory space, whereas the second - Querying on the fly different tables- promotes memory space despite of the processing time. The modified C4.5 algorithm requires three entries tables: a target table, a neighbor table, and a spatial index join that contains the possible spatial relationship among the objects in the target table and those in the neighbor table. Thus, the proposed algorithms are applied to a spatial data pattern in the accidentology domain. A comparative study of our approach with other works of classification by spatial decision trees will be detailed.

On the Interactive Search with Web Documents

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. Especially, the process of formulating queries with proper terms representing specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge on a subject and its technical terms is not sufficient enough to do so. This article presents the new and interactive search application DocAnalyser that addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-ofthe- art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.

Image Retrieval Using Fused Features

The system is designed to show images which are related to the query image. Extracting color, texture, and shape features from an image plays a vital role in content-based image retrieval (CBIR). Initially RGB image is converted into HSV color space due to its perceptual uniformity. From the HSV image, Color features are extracted using block color histogram, texture features using Haar transform and shape feature using Fuzzy C-means Algorithm. Then, the characteristics of the global and local color histogram, texture features through co-occurrence matrix and Haar wavelet transform and shape are compared and analyzed for CBIR. Finally, the best method of each feature is fused during similarity measure to improve image retrieval effectiveness and accuracy.

Selecting the Best Sub-Region Indexing the Images in the Case of Weak Segmentation Based On Local Color Histograms

Color Histogram is considered as the oldest method used by CBIR systems for indexing images. In turn, the global histograms do not include the spatial information; this is why the other techniques coming later have attempted to encounter this limitation by involving the segmentation task as a preprocessing step. The weak segmentation is employed by the local histograms while other methods as CCV (Color Coherent Vector) are based on strong segmentation. The indexation based on local histograms consists of splitting the image into N overlapping blocks or sub-regions, and then the histogram of each block is computed. The dissimilarity between two images is reduced, as consequence, to compute the distance between the N local histograms of the both images resulting then in N*N values; generally, the lowest value is taken into account to rank images, that means that the lowest value is that which helps to designate which sub-region utilized to index images of the collection being asked. In this paper, we make under light the local histogram indexation method in the hope to compare the results obtained against those given by the global histogram. We address also another noteworthy issue when Relying on local histograms namely which value, among N*N values, to trust on when comparing images, in other words, which sub-region among the N*N sub-regions on which we base to index images. Based on the results achieved here, it seems that relying on the local histograms, which needs to pose an extra overhead on the system by involving another preprocessing step naming segmentation, does not necessary mean that it produces better results. In addition to that, we have proposed here some ideas to select the local histogram on which we rely on to encode the image rather than relying on the local histogram having lowest distance with the query histograms.

Categorizing Search Result Records Using Word Sense Disambiguation

Web search engines are designed to retrieve and extract the information in the web databases and to return dynamic web pages. The Semantic Web is an extension of the current web in which it includes semantic content in web pages. The main goal of semantic web is to promote the quality of the current web by changing its contents into machine understandable form. Therefore, the milestone of semantic web is to have semantic level information in the web. Nowadays, people use different keyword- based search engines to find the relevant information they need from the web. But many of the words are polysemous. When these words are used to query a search engine, it displays the Search Result Records (SRRs) with different meanings. The SRRs with similar meanings are grouped together based on Word Sense Disambiguation (WSD). In addition to that semantic annotation is also performed to improve the efficiency of search result records. Semantic Annotation is the process of adding the semantic metadata to web resources. Thus the grouped SRRs are annotated and generate a summary which describes the information in SRRs. But the automatic semantic annotation is a significant challenge in the semantic web. Here ontology and knowledge based representation are used to annotate the web pages.

A Hybrid Nature Inspired Algorithm for Generating Optimal Query Plan

The emergence of the Semantic Web technology increases day by day due to the rapid growth of multiple web pages. Many standard formats are available to store the semantic web data. The most popular format is the Resource Description Framework (RDF). Querying large RDF graphs becomes a tedious procedure with a vast increase in the amount of data. The problem of query optimization becomes an issue in querying large RDF graphs. Choosing the best query plan reduces the amount of query execution time. To address this problem, nature inspired algorithms can be used as an alternative to the traditional query optimization techniques. In this research, the optimal query plan is generated by the proposed SAPSO algorithm which is a hybrid of Simulated Annealing (SA) and Particle Swarm Optimization (PSO) algorithms. The proposed SAPSO algorithm has the ability to find the local optimistic result and it avoids the problem of local minimum. Experiments were performed on different datasets by changing the number of predicates and the amount of data. The proposed algorithm gives improved results compared to existing algorithms in terms of query execution time.

Searching k-Nearest Neighbors to be Appropriate under Gamming Environments

In general, algorithms to find continuous k-nearest neighbors have been researched on the location based services, monitoring periodically the moving objects such as vehicles and mobile phone. Those researches assume the environment that the number of query points is much less than that of moving objects and the query points are not moved but fixed. In gaming environments, this problem is when computing the next movement considering the neighbors such as flocking, crowd and robot simulations. In this case, every moving object becomes a query point so that the number of query point is same to that of moving objects and the query points are also moving. In this paper, we analyze the performance of the existing algorithms focused on location based services how they operate under gaming environments.

Enhance Security in XML Databases: XLog File for Severity-Aware Trust-Based Access Control

The topic of enhancing security in XML databases is important as it includes protecting sensitive data and providing a secure environment to users. In order to improve security and provide dynamic access control for XML databases, we presented XLog file to calculate user trust values by recording users’ bad transaction, errors and query severities. Severity-aware trust-based access control for XML databases manages the access policy depending on users' trust values and prevents unauthorized processes, malicious transactions and insider threats. Privileges are automatically modified and adjusted over time depending on user behaviour and query severity. Logging in database is an important process and is used for recovery and security purposes. In this paper, the Xlog file is presented as a dynamic and temporary log file for XML databases to enhance the level of security.

Distributed GIS Based Decision Support System for Efficiency Evaluation of Education System: A Case Study of Primary School Education System of Bundelkhand Zone, Uttar Pradesh, India

Decision Support System (DSS), a query-based system meant to help decision makers to use a variety of information for decision making, plays a very vital role in sustainable growth of any country. For this very purpose it is essential to analyze the educational system because education is the only way through which people can be made aware as to how to sustain our planet. The purpose of this paper is to prepare a decision support system for efficiency evaluation of education system with the help of Distributed Geographical Information System.

A Robust Method for Finding Nearest-Neighbor using Hexagon Cells

In pattern clustering, nearest neighborhood point computation is a challenging issue for many applications in the area of research such as Remote Sensing, Computer Vision, Pattern Recognition and Statistical Imaging. Nearest neighborhood computation is an essential computation for providing sufficient classification among the volume of pixels (voxels) in order to localize the active-region-of-interests (AROI). Furthermore, it is needed to compute spatial metric relationships of diverse area of imaging based on the applications of pattern recognition. In this paper, we propose a new methodology for finding the nearest neighbor point, depending on making a virtually grid of a hexagon cells, then locate every point beneath them. An algorithm is suggested for minimizing the computation and increasing the turnaround time of the process. The nearest neighbor query points Φ are fetched by seeking fashion of hexagon holistic. Seeking will be repeated until an AROI Φ is to be expected. If any point Υ is located then searching starts in the nearest hexagons in a circular way. The First hexagon is considered be level 0 (L0) and the surrounded hexagons is level 1 (L1). If Υ is located in L1, then search starts in the next level (L2) to ensure that Υ is the nearest neighbor for Φ. Based on the result and experimental results, we found that the proposed method has an advantage over the traditional methods in terms of minimizing the time complexity required for searching the neighbors, in turn, efficiency of classification will be improved sufficiently.

A Review on Important Aspects of Information Retrieval

Information retrieval has become an important field of study and research under computer science due to explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects such as document representation, similarity measure and query expansion.

Enhancing Privacy-Preserving Cloud Database Querying by Preventing Brute Force Attacks

Considering the complexities involved in Cloud computing, there are still plenty of issues that affect the privacy of data in cloud environment. Unless these problems get solved, we think that the problem of preserving privacy in cloud databases is still open. In tokenization and homomorphic cryptography based solutions for privacy preserving cloud database querying, there is possibility that by colluding with service provider adversary may run brute force attacks that will reveal the attribute values. In this paper we propose a solution by defining the variant of K –means clustering algorithm that effectively detects such brute force attacks and enhances privacy of cloud database querying by preventing this attacks.

Novel Nanomagnetic Beads Based - Latex Agglutination Assay for Rapid Diagnosis of Human Schistosomiasis Haematobium

The objective of the present study was to evaluate the novel nanomagnetic beads based–latex agglutination assay (NMB-LAT) as a simple test for diagnosis of S. haematobium as well as standardize the novel nanomagnetic beads based –ELISA (NMB-ELISA). According to urine examination this study included 85 S. haematobium infected patients, 30 other parasites infected patients and 25 negative control samples. The sensitivity of novel NMB-LAT was 82.4% versus 96.5% and 88.2% for NMB-ELISA and currently used sandwich ELISA respectively. The specificity of NMB-LAT was 83.6% versus 96.3% and 87.3% for NMB-ELISA and currently used sandwich ELISA respectively. In conclusion, the novel NMB-ELISA is a valuable applicable diagnostic technique for diagnosis of human schistosomiasis haematobium. The novel NMB-ELISA assay is a suitable applicable diagnostic method in field survey especially when followed by ELISA as a confirmatory test in query false negative results. Trials are required to increase the sensitivity and specificity of NMB-ELISA assay.

Query Reformulation Guided by External Resource for Information Retrieval

Reformulating the user query is a technique that aims to improve the performance of an Information Retrieval System (IRS) in terms of precision and recall. This paper tries to evaluate the technique of query reformulation guided by an external resource for Arabic texts. To do this, various precision and recall measures were conducted and two corpora with different external resources like Arabic WordNet (AWN) and the Arabic Dictionary (thesaurus) of Meaning (ADM) were used. Examination of the obtained results will allow us to measure the real contribution of this reformulation technique in improving the IRS performance.

The Digital Filing Cabinet–A GIS Based Management Solution Tool for the Land Surveyor and Engineer

This paper explains how the New Jersey Institute of Technology surveying student team members designed and created an interactive GIS map, the purpose of which is to be useful to the land surveyor and engineer for project management. This was achieved by building a research and storage database that can be easily integrated into any land surveyor’s current operations through the use of ArcGIS 10, Arc Catalog, and AutoCAD. This GIS database allows for visual representation and information querying for multiple job sites, and simple access to uploaded data, which is geospatially referenced to each individual job site or project. It can also be utilized by engineers to determine design criteria, or to store important files. This cost-effective approach to a surveying map not only saves time, but saves physical storage space and paper resources.