A Formatting Method for Transforming XML Data into HTML

In this paper, we propose a fixed formatting method of PPX(Pretty Printer for XML). PPX is a query language for XML database which has extensive formatting capability that produces HTML as the result of a query. The fixed formatting method is to completely specify the combination of variables and layout specification operators within the layout expression of the GENERATE clause of PPX. In the experiment, a quick comparison shows that PPX requires far less description compared to XSLT or XQuery programs doing the same tasks.

Sulfamonomethoxine-Induced Urinary Calculiin Pigs

The authors report a case of swine urolithiasis caused by improper administration of sulfamonomethoxine and which was diagnosed by examination of urinary sediments and analyzing the composition of the uroliths. The chemical composition of urinary calculi obtained from affected pigs with urolithiasis was further confimed as sulfamonomethoxine by fourier transform infrared (FTIR). It is suggested that appearance of typical fanlike or wheat bunchy crystals in urinary sediments under observation of lightmicroscope and determination by FTIR for the crystals are helpful in diagnosing sulfa calculi causced swine urolithiasis.

A Web Text Mining Flexible Architecture

Text Mining is an important step of Knowledge Discovery process. It is used to extract hidden information from notstructured o semi-structured data. This aspect is fundamental because much of the Web information is semi-structured due to the nested structure of HTML code, much of the Web information is linked, much of the Web information is redundant. Web Text Mining helps whole knowledge mining process to mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multiorganization environment. The Web Text Mining process is based on flexible architecture and is implemented by four steps able to examine web content and to extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the recovery of Web job offers in which, through a Text Mining process, useful information for fast classification of the same are drawn out, these information are, essentially, job offer place and skills.

Information Extraction from Unstructured and Ungrammatical Data Sources for Semantic Annotation

The internet has become an attractive avenue for global e-business, e-learning, knowledge sharing, etc. Due to continuous increase in the volume of web content, it is not practically possible for a user to extract information by browsing and integrating data from a huge amount of web sources retrieved by the existing search engines. The semantic web technology enables advancement in information extraction by providing a suite of tools to integrate data from different sources. To take full advantage of semantic web, it is necessary to annotate existing web pages into semantic web pages. This research develops a tool, named OWIE (Ontology-based Web Information Extraction), for semantic web annotation using domain specific ontologies. The tool automatically extracts information from html pages with the help of pre-defined ontologies and gives them semantic representation. Two case studies have been conducted to analyze the accuracy of OWIE.

A Hybrid Ontology Based Approach for Ranking Documents

Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and doing stemming on words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done flexible and in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.

Web Application Security, Attacks and Mitigation

Today’s technology is heavily dependent on web applications. Web applications are being accepted by users at a very rapid pace. These have made our work efficient. These include webmail, online retail sale, online gaming, wikis, departure and arrival of trains and flights and list is very long. These are developed in different languages like PHP, Python, C#, ASP.NET and many more by using scripts such as HTML and JavaScript. Attackers develop tools and techniques to exploit web applications and legitimate websites. This has led to rise of web application security; which can be broadly classified into Declarative Security and Program Security. The most common attacks on the applications are by SQL Injection and XSS which give access to unauthorized users who totally damage or destroy the system. This paper presents a detailed literature description and analysis on Web Application Security, examples of attacks and steps to mitigate the vulnerabilities.

The Usefulness of Logical Structure in Flexible Document Categorization

This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns a HTML document to one or more categories (thesis, paper, call for papers, email, ...). Using a set of training documents, our approach generates a set of rules used to categorize new documents. The approach flexibility is carried out with rule weight association representing your importance in the discrimination between possible categories. This weight is dynamically modified at each new document categorization. The experimentation of the proposed approach provides satisfactory results.

An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching

Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet Security in recent times. Most of the existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, thus leading to a large number of false positives. This fuzziness is attributed to the use of highly flexible and at the same time, highly ambiguous HTML language. We introduce a new perspective against phishing, that tries to systematically prove, whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, indicative of any form of malicious alteration. The system design represents an intelligent system, employing dynamic assessment which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base, in educating the internet users against phishing.

A study of Cancer-related MicroRNAs through Expression Data and Literature Search

MicroRNAs (miRNAs) are a class of non-coding RNAs that hybridize to mRNAs and induce either translation repression or mRNA cleavage. Recently, it has been reported that miRNAs could possibly play an important role in human diseases. By integrating miRNA target genes, cancer genes, miRNA and mRNA expression profiles information, a database is developed to link miRNAs to cancer target genes. The database provides experimentally verified human miRNA target genes information, including oncogenes and tumor suppressor genes. In addition, fragile sites information for miRNAs, and the strength of the correlation of miRNA and its target mRNA expression level for nine tissue types are computed, which serve as an indicator for suggesting miRNAs could play a role in human cancer. The database is freely accessible at http://ppi.bioinfo.asia.edu.tw/mirna_target/index.html.

Compression of Semistructured Documents

EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit list contains URL and title of the hits, and also some snippet which tries to shortly show a match. The snippet can be almost always assembled by an algorithm that has a full knowledge of the original document (mostly HTML page). It implies that the search engine is required to store the full text of the documents as a part of the index. Such a requirement leads us to pick up an appropriate compression algorithm which would reduce the space demand. One of the solutions could be to use common compression methods, for instance gzip or bzip2, but it might be preferable if we develop a new method which would take advantage of the document structure, or rather, the textual character of the documents. There already exist a special compression text algorithms and methods for a compression of XML documents. The aim of this paper is an integration of the two approaches to achieve an optimal level of the compression ratio

Improving Performance of World Wide Web by Adaptive Web Traffic Reduction

The ever increasing use of World Wide Web in the existing network, results in poor performance. Several techniques have been developed for reducing web traffic by compressing the size of the file, saving the web pages at the client side, changing the burst nature of traffic into constant rate etc. No single method was adequate enough to access the document instantly through the Internet. In this paper, adaptive hybrid algorithms are developed for reducing web traffic. Intelligent agents are used for monitoring the web traffic. Depending upon the bandwidth usage, user-s preferences, server and browser capabilities, intelligent agents use the best techniques to achieve maximum traffic reduction. Web caching, compression, filtering, optimization of HTML tags, and traffic dispersion are incorporated into this adaptive selection. Using this new hybrid technique, latency is reduced to 20 – 60 % and cache hit ratio is increased 40 – 82 %.

A Proposal of an Automatic Formatting Method for Transforming XML Data

PPX(Pretty Printer for XML) is a query language that offers a concise description method of formatting the XML data into HTML. In this paper, we propose a simple specification of formatting method that is a combination description of automatic layout operators and variables in the layout expression of the GENERATE clause of PPX. This method can automatically format irregular XML data included in a part of XML with layout decision rule that is referred to DTD. In the experiment, a quick comparison shows that PPX requires far less description compared to XSLT or XQuery programs doing same tasks.

Richtmyer-Meshkov Instability and Gas-Particle Interaction of Contoured Shock-Tube Flows: A Numerical Study

In this paper, computational fluid dynamics (CFD) is utilized to characterize a prototype biolistic delivery system, the biomedical device based on the contoured-shock-tube design (CST), with the aim at investigating shocks induced flow instabilities within the contoured shock tube. The shock/interface interactions, the growth of perturbation at an interface between two fluids of different density are interrogated. The key features of the gas dynamics and gas-particle interaction are discussed