Abstract: In this paper, we propose a fixed formatting method for PPX (Pretty Printer for XML). PPX is a query language for XML databases with an extensive formatting capability that produces HTML as the result of a query. The fixed formatting method completely specifies the combination of variables and layout specification operators within the layout expression of the GENERATE clause of PPX. In the experiment, a quick comparison shows that PPX requires far less description than XSLT or XQuery programs performing the same tasks.
Abstract: The authors report a case of swine urolithiasis caused by improper administration of sulfamonomethoxine, diagnosed by examining urinary sediments and analyzing the composition of the uroliths. The chemical composition of urinary calculi obtained from pigs affected with urolithiasis was further confirmed to be sulfamonomethoxine by Fourier transform infrared spectroscopy (FTIR). It is suggested that the appearance of typical fan-like or wheat-bunch-shaped crystals in urinary sediments under light-microscope observation, together with FTIR determination of the crystals, is helpful in diagnosing swine urolithiasis caused by sulfa calculi.
Abstract: Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from non-structured or semi-structured data. This aspect is fundamental because much of the Web's information is semi-structured due to the nested structure of HTML code, much of it is linked, and much of it is redundant. Web Text Mining supports the whole knowledge mining process through the mining, extraction and integration of useful data, information and knowledge from Web page contents.
In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multi-organization environment. The Web Text Mining process is based on a flexible architecture and is implemented in four steps that examine Web content and extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the retrieval of Web job offers; through a Text Mining process, information useful for their fast classification is extracted, essentially the job offer's location and the required skills.
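The final extraction step the abstract describes, pulling a location and skill list out of a free-text job offer, can be sketched as follows. The skill lexicon, the location pattern, and the sample offer are invented for illustration and are not the prototype's actual rules:

```python
import re

# Hypothetical skill lexicon and location pattern, invented for this sketch.
SKILL_LEXICON = {"java", "sql", "html", "xml", "python"}
PLACE_PATTERN = re.compile(r"\blocation:\s*([A-Za-z ]+)", re.IGNORECASE)

def extract_offer_info(text):
    """Return (place, skills) found in a job-offer text."""
    match = PLACE_PATTERN.search(text)
    place = match.group(1).strip() if match else None
    # Tokenize the offer and keep only tokens that appear in the lexicon.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    skills = sorted(tokens & SKILL_LEXICON)
    return place, skills

offer = "Web developer wanted. Location: Milan. Required: HTML, SQL and Java."
place, skills = extract_offer_info(offer)
```

A real system would draw the lexicon and patterns from the mining steps rather than hard-coding them.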
Abstract: The internet has become an attractive avenue for
global e-business, e-learning, knowledge sharing, etc. Due to
continuous increase in the volume of web content, it is not practically
possible for a user to extract information by browsing and integrating
data from a huge amount of web sources retrieved by the existing
search engines. The semantic web technology enables advancement
in information extraction by providing a suite of tools to integrate
data from different sources. To take full advantage of semantic web,
it is necessary to annotate existing web pages into semantic web
pages. This research develops a tool, named OWIE (Ontology-based
Web Information Extraction), for semantic web annotation using
domain-specific ontologies. The tool automatically extracts information from HTML pages with the help of pre-defined ontologies and gives it a semantic representation. Two case studies have been
conducted to analyze the accuracy of OWIE.
Abstract: The increasing volume of information on the Internet creates a growing need for new (semi-)automatic methods for retrieving documents and ranking them according to their relevance to the user query. In this paper, after a brief review of ranking models, a new ontology-based approach for ranking HTML documents is proposed and evaluated under various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination preserves the precision of ranking without losing speed. Our approach exploits natural language processing techniques to extract phrases from the documents and the query and to perform stemming on words. Then an ontology-based conceptual method is used to annotate documents and expand the query. To expand a query, the spread activation algorithm is improved so that the expansion can be done flexibly and along various aspects. The annotated documents and the
expanded query will be processed to compute the relevance degree
exploiting statistical methods. The outstanding features of our
approach are (1) combining conceptual, statistical and linguistic
features of documents, (2) expanding the query with its related
concepts before comparing to documents, (3) extracting and using
both words and phrases to compute relevance degree, (4) improving
the spread activation algorithm to do the expansion based on
weighted combination of different conceptual relationships and (5)
allowing variable document vector dimensions. A ranking system
called ORank is developed to implement and test the proposed
model. The test results will be included at the end of the paper.
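The weighted spread activation step the abstract describes can be illustrated with a small sketch. The toy ontology, the relation weights, and the activation threshold below are invented for illustration and are not ORank's actual parameters:

```python
# Toy ontology: concept -> [(neighbor, relation), ...]; all values invented.
ONTOLOGY = {
    "car": [("vehicle", "is-a"), ("engine", "part-of")],
    "vehicle": [("transport", "is-a")],
}
# Weighted combination of conceptual relationships, as the abstract mentions.
RELATION_WEIGHTS = {"is-a": 0.8, "part-of": 0.5}

def expand_query(terms, threshold=0.3):
    """Spread activation from the query terms; keep concepts above threshold."""
    activation = {t: 1.0 for t in terms}
    frontier = list(terms)
    while frontier:
        concept = frontier.pop()
        for neighbor, relation in ONTOLOGY.get(concept, []):
            energy = activation[concept] * RELATION_WEIGHTS[relation]
            if energy > activation.get(neighbor, 0.0) and energy >= threshold:
                activation[neighbor] = energy
                frontier.append(neighbor)
    return activation

expanded = expand_query(["car"])
```

Each concept passes a fraction of its activation along each outgoing relation, so "transport" is reached through "vehicle" with energy 0.8 x 0.8 = 0.64; the threshold stops the spread from flooding the whole ontology.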
Abstract: Today’s technology is heavily dependent on web applications, which users are adopting at a very rapid pace and which have made our work efficient. They include webmail, online retail, online gaming, wikis, train and flight departure and arrival information, and much more. They are developed in languages such as PHP, Python, C# and ASP.NET, using markup and scripting such as HTML and JavaScript. Attackers develop tools and techniques to exploit web applications and legitimate websites. This has led to the rise of web application security, which can be broadly classified into Declarative Security and Program Security. The most common attacks on applications are SQL Injection and XSS, which give unauthorized users access and can severely damage or destroy the system. This paper presents a detailed literature review and analysis of Web Application Security, examples of attacks, and steps to mitigate the vulnerabilities.
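The two standard mitigations for the attacks named in the abstract can be sketched briefly: parameterized queries against SQL injection, and output escaping against XSS. The table, column, and sample inputs below are invented for illustration:

```python
import html
import sqlite3

# Invented in-memory table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user(name):
    # Placeholder binding: the driver treats `name` strictly as data, so
    # input like "' OR '1'='1" cannot alter the query structure.
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

def render_comment(comment):
    # Escaping turns <script> into &lt;script&gt; before it reaches the page.
    return "<p>" + html.escape(comment) + "</p>"

safe = find_user("' OR '1'='1")  # classic injection payload returns no rows
rendered = render_comment("<script>alert(1)</script>")
```

With string concatenation instead of the `?` placeholder, the same payload would match every row; with the escape removed, the script tag would execute in the visitor's browser.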
Abstract: This paper presents a new approach for automatic
document categorization. Exploiting the logical structure of the
document, our approach assigns a HTML document to one or more
categories (thesis, paper, call for papers, email, ...). Using a set of
training documents, our approach generates a set of rules used to
categorize new documents. The approach's flexibility is achieved by associating each rule with a weight representing its importance in discriminating between the possible categories. This weight is dynamically modified with each new document categorization. The
experimentation of the proposed approach provides satisfactory
results.
Abstract: Phishing, or stealing of sensitive information on the
web, has dealt a major blow to Internet Security in recent times. Most
of the existing anti-phishing solutions fail to handle the fuzziness
involved in phish detection, thus leading to a large number of false
positives. This fuzziness is attributed to the use of highly flexible and
at the same time, highly ambiguous HTML language. We introduce a
new perspective on phishing that tries to systematically determine whether a given page is phished or not, using the corresponding
original page as the basis of the comparison. It analyzes the layout of
the pages under consideration to determine the percentage distortion
between them, indicative of any form of malicious alteration. The
system design represents an intelligent system, employing dynamic
assessment which accurately identifies brand new phishing attacks
and will prove effective in reducing the number of false positives.
This framework could potentially be used as a knowledge base for educating Internet users about phishing.
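The layout comparison the abstract describes, measuring percentage distortion between an original page and a suspect page, can be sketched with a deliberately simple metric. The tag-sequence comparison and the sample pages below are illustrative assumptions, not the paper's actual algorithm:

```python
import re

def tag_sequence(page):
    # Opening-tag names only; closing tags are skipped by the pattern.
    return re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", page)

def distortion(original, suspect):
    """Percentage of tag positions where the two layouts differ."""
    a, b = tag_sequence(original), tag_sequence(suspect)
    length = max(len(a), len(b))
    if length == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * (length - same) / length

orig = "<html><body><form><input><input></form></body></html>"
phish = "<html><body><form><input><input><script></form></body></html>"
pct = distortion(orig, phish)
```

Here the injected script tag is the only layout difference, so the distortion is one position out of six; a threshold on this percentage would flag the malicious alteration.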
Abstract: MicroRNAs (miRNAs) are a class of non-coding
RNAs that hybridize to mRNAs and induce either translation
repression or mRNA cleavage. Recently, it has been reported that
miRNAs could possibly play an important role in human diseases. By
integrating miRNA target genes, cancer genes, miRNA and mRNA
expression profiles information, a database is developed to link
miRNAs to cancer target genes. The database provides experimentally
verified human miRNA target genes information, including oncogenes
and tumor suppressor genes. In addition, fragile sites information for
miRNAs, and the strength of the correlation of miRNA and its target
mRNA expression level for nine tissue types are computed, which
serve as an indicator for suggesting miRNAs could play a role in
human cancer. The database is freely accessible at
http://ppi.bioinfo.asia.edu.tw/mirna_target/index.html.
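The correlation strength the abstract mentions can be illustrated with a standard Pearson correlation between a miRNA's expression and its target mRNA's expression across tissues; a strongly negative value is consistent with miRNA-mediated repression. The nine expression values below are invented, not data from the database:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented expression profiles over nine tissue types: the mRNA is low
# wherever the miRNA is high, as expected under repression.
mirna = [5.1, 4.8, 0.9, 1.2, 4.5, 0.7, 5.0, 1.1, 4.9]
mrna = [0.8, 1.0, 6.2, 5.9, 1.1, 6.5, 0.9, 6.0, 1.0]
r = pearson(mirna, mrna)
```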
Abstract: EGOTHOR is a search engine that indexes the Web and allows users to search Web documents. Its hit list contains the URL and title of each hit, along with a snippet that briefly shows a match. The snippet can almost always be assembled by an algorithm that has full knowledge of the original document (mostly an HTML page). This implies that the search engine is required to store the full text of the documents as part of the index.
Such a requirement leads us to select an appropriate compression algorithm to reduce the space demand. One solution could be to use common compression methods, for instance gzip or bzip2, but it might be preferable to develop a new method that takes advantage of the document structure, or rather, the textual character of the documents.
There already exist special text compression algorithms and methods for the compression of XML documents. The aim of this paper is an integration of the two approaches to achieve an optimal compression ratio.
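The general-purpose baseline the abstract mentions can be sketched directly: compress a stored HTML document with gzip and bzip2 and compare ratios. A structure-aware method would aim to beat these numbers; the sample page below is invented:

```python
import bz2
import gzip

# Invented, highly repetitive HTML page standing in for a stored document.
page = ("<html><head><title>Sample</title></head><body>"
        + "<p>repeated paragraph text </p>" * 100
        + "</body></html>").encode("utf-8")

gz = gzip.compress(page)
bz = bz2.compress(page)
ratio_gz = len(gz) / len(page)  # compressed size / original size
ratio_bz = len(bz) / len(page)
```

On repetitive markup both compressors already achieve small ratios, which is why the paper's structure-aware combination has to exploit the textual and XML character of the documents to improve further.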
Abstract: The ever-increasing use of the World Wide Web on the existing network results in poor performance. Several techniques have been developed for reducing web traffic: compressing files, caching web pages at the client side, smoothing the bursty nature of traffic into a constant rate, etc. No single method is adequate for accessing documents instantly over the Internet. In this paper, adaptive hybrid algorithms are developed for reducing web traffic. Intelligent agents are used to monitor the web traffic. Depending upon the bandwidth usage, users' preferences, and server and browser capabilities, the intelligent agents select the best techniques to achieve maximum traffic reduction. Web caching, compression, filtering, optimization of HTML tags, and traffic dispersion are incorporated into this adaptive selection. Using this new hybrid technique, latency is reduced by 20-60% and the cache hit ratio is increased by 40-82%.
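One of the techniques the abstract lists, optimization of HTML tags, can be sketched as a simple minification pass that strips comments and collapses whitespace before a page is sent. This is an illustrative simplification, not the paper's adaptive algorithm:

```python
import re

def minify_html(page):
    """Shrink an HTML page without changing its rendered content."""
    page = re.sub(r"<!--.*?-->", "", page, flags=re.DOTALL)  # drop comments
    page = re.sub(r">\s+<", "><", page)  # remove gaps between adjacent tags
    page = re.sub(r"\s+", " ", page)     # collapse remaining whitespace runs
    return page.strip()

raw = "<html>  <!-- banner -->\n  <body>\n    <p>Hello   world</p>\n  </body>\n</html>"
small = minify_html(raw)
```

In the adaptive scheme, an agent would apply such a pass only when bandwidth is scarce, trading a little server CPU for fewer bytes on the wire.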
Abstract: PPX (Pretty Printer for XML) is a query language that offers a concise method of formatting XML data into HTML. In this paper, we propose a simple formatting-specification method based on combining automatic layout operators with variables in the layout expression of the GENERATE clause of PPX. This method can automatically format irregular XML data contained within part of an XML document, using layout decision rules derived from the DTD. In the experiment, a quick comparison shows that PPX requires far less description than XSLT or XQuery programs performing the same tasks.
Abstract: In this paper, computational fluid dynamics (CFD) is utilized to characterize a prototype biolistic delivery system, a biomedical device based on the contoured shock tube (CST) design, with the aim of investigating shock-induced flow instabilities within the contoured shock tube. The shock/interface interactions and the growth of perturbations at an interface between two fluids of different densities are interrogated. The key features of the gas dynamics and gas-particle interaction are discussed.