Abstract: Cluster analysis divides data into groups that are
meaningful, useful, or both. Analysis of biological data is creating a
new generation of epidemiologic, prognostic, diagnostic and
treatment modalities. Clustering of protein sequences is one of the
current research topics in the field of computer science. Linear
relation is valuable in rule discovery for a given data, such as if value
X goes up 1, value Y will go down 3", etc. The classical linear
regression models the linear relation of two sequences perfectly.
However, if we need to cluster a large repository of protein sequences
into groups where sequences have strong linear relationship with
each other, it is prohibitively expensive to compare sequences one by
one. In this paper, we propose a new technique named General
Regression Model Technique Clustering Algorithm (GRMTCA) to
benignly handle the problem of linear sequences clustering. GRMT
gives a measure, GR*, to tell the degree of linearity of multiple
sequences without having to compare each pair of them.
Abstract: The purpose of semantic web research is to transform
the Web from a linked document repository into a distributed knowledge base and application platform, thus allowing the vast range of available information and services to be more efficiently
exploited. As a first step in this transformation, languages such as
OWL have been developed. Although fully realizing the Semantic Web still seems some way off, OWL has already been very
successful and has rapidly become a defacto standard for ontology
development in fields as diverse as geography, geology, astronomy,
agriculture, defence and the life sciences. The aim of this paper is to classify key concepts of Semantic Web as well as introducing a new
practical approach which uses these concepts to outperform Word Wide Web.
Abstract: This paper describes the NEAR (Navigating Exhibitions, Annotations and Resources) panel, a novel interactive visualization technique designed to help people navigate and interpret groups of resources, exhibitions and annotations by revealing hidden relations such as similarities and references. NEAR is implemented on A•VI•RE, an extended online information repository. A•VI•RE supports a semi-structured collection of exhibitions containing various resources and annotations. Users are encouraged to contribute, share, annotate and interpret resources in the system by building their own exhibitions and annotations. However, it is hard to navigate smoothly and efficiently in A•VI•RE because of its high capacity and complexity. We present a visual panel that implements new navigation and communication approaches that support discovery of implied relations. By quickly scanning and interacting with NEAR, users can see not only implied relations but also potential connections among different data elements. NEAR was tested by several users in the A•VI•RE system and shown to be a supportive navigation tool. In the paper, we further analyze the design, report the evaluation and consider its usage in other applications.
Abstract: In the current Grid environment, efficient workload
management presents a significant challenge, for which there are
exorbitant de facto standards encompassing resource discovery,
brokerage, and data transfer, among others. In addition, the real-time
resource status, essential for an optimal resource allocation strategy,
is often not readily accessible. To address these issues and provide a
cleaner abstraction of the Grid with the potential of generalizing into
arbitrary resource-sharing environment, this paper proposes a new
Condor-based pilot mechanism applied in the PanDA architecture,
PanDA-PF WMS, with the goal of providing a more generic yet
efficient resource allocating strategy. In this architecture, the PanDA
server primarily acts as a repository of user jobs, responding to pilot
requests from distributed, remote resources. Scheduling decisions are
subsequently made according to the real-time resource information
reported by pilots. Pilot Factory is a Condor-inspired solution for a
scalable pilot dissemination and effectively functions as a resource
provisioning mechanism through which the user-job server, PanDA,
reaches out to the candidate resources only on demand.
Abstract: K-Modes is an extension of K-Means clustering algorithm, developed to cluster the categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. Weight of attribute values contribute much in clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency of attribute values in the cluster and in the data set. The new weighted measure is experimented with the data sets obtained from the UCI data repository. The results are compared with K-Modes and K-representative, which show that the new measure generates clusters with high purity.
Abstract: Web services provide significant new benefits for SOAbased
applications, but they also expose significant new security
risks. There are huge number of WS security standards and
processes. At present, there is still a lack of a comprehensive
approach which offers a methodical development in the construction
of secure WS-based SOA. Thus, the main objective of this paper is
to address this needs, presenting a comprehensive method for Web
Services Security guaranty in SOA. The proposed method defines
three stages, Initial Security Analysis, Architectural Security
Guaranty and WS Security Standards Identification. These facilitate,
respectively, the definition and analysis of WS-specific security
requirements, the development of a WS-based security architecture
and the identification of the related WS security standards that the
security architecture must articulate in order to implement the
security services.
Abstract: In this work a new platform for mobile-health systems is
presented. System target application is providing decision support to
rescue corps or military medical personnel in combat areas. Software
architecture relies on a distributed client-server system that manages a
wireless ad-hoc networks hierarchy in which several different types of
client operate. Each client is characterized for different hardware and
software requirements. Lower hierarchy levels rely in a network of
completely custom devices that store clinical information and patient
status and are designed to form an ad-hoc network operating in the
2.4 GHz ISM band and complying with the IEEE 802.15.4 standard
(ZigBee). Medical personnel may interact with such devices, that are
called MICs (Medical Information Carriers), by means of a PDA
(Personal Digital Assistant) or a MDA (Medical Digital Assistant),
and transmit the information stored in their local databases as well as
issue a service request to the upper hierarchy levels by using IEEE
802.11 a/b/g standard (WiFi). The server acts as a repository that
stores both medical evacuation forms and associated events (e.g., a
teleconsulting request). All the actors participating in the diagnostic
or evacuation process may access asynchronously to such repository
and update its content or generate new events. The designed system
pretends to optimise and improve information spreading and flow
among all the system components with the aim of improving both
diagnostic quality and evacuation process.
Abstract: In this paper a combined feature selection method is
proposed which takes advantages of sample domain filtering,
resampling and feature subset evaluation methods to reduce
dimensions of huge datasets and select reliable features. This method
utilizes both feature space and sample domain to improve the process
of feature selection and uses a combination of Chi squared with
Consistency attribute evaluation methods to seek reliable features.
This method consists of two phases. The first phase filters and
resamples the sample domain and the second phase adopts a hybrid
procedure to find the optimal feature space by applying Chi squared,
Consistency subset evaluation methods and genetic search.
Experiments on various sized datasets from UCI Repository of
Machine Learning databases show that the performance of five
classifiers (Naïve Bayes, Logistic, Multilayer Perceptron, Best First
Decision Tree and JRIP) improves simultaneously and the
classification error for these classifiers decreases considerably. The
experiments also show that this method outperforms other feature
selection methods.
Abstract: We try to give a solution of version control for
documents in web service, that-s why we propose a new approach
used specially for the XML documents. The new approach is applied
in a centralized repository, this repository coexist with other
repositories in a decentralized system. To achieve the activities of
this approach in a standard model we use the ECA active rules. We
also show how the Event-Condition-Action rules (ECA rules) have
been incorporated as a mechanism for the version control of
documents. The need to integrate ECA rules is that it provides a clear
declarative semantics and induces an immediate operational
realization in the system without the need for human intervention.
Abstract: Currently WWW is the first solution for scholars in
finding information. But, analyzing and interpreting this volume of
information will lead to researchers overload in pursuing their
research.
Trend detection in scientific publication retrieval systems helps
scholars to find relevant, new and popular special areas by
visualizing the trend of input topic.
However, there are few researches on trend detection in scientific
corpora while their proposed models do not appear to be suitable.
Previous works lack of an appropriate representation scheme for
research topics.
This paper describes a method that combines Semantic Web and
ontology to support advance search functions such as trend detection
in the context of scholarly Semantic Web system (SSWeb).
Abstract: The most reliable and accurate description of the actual behavior of a software system is its source code. However, not all questions about the system can be answered directly by resorting to this repository of information. What the reverse engineering methodology aims at is the extraction of abstract, goal-oriented “views" of the system, able to summarize relevant properties of the computation performed by the program. While concentrating on reverse engineering we had modeled the C++ files by designing the translator.
Abstract: A Data Warehouses is a repository of information
integrated from source data. Information stored in data warehouse is
the form of materialized in order to provide the better performance
for answering the queries. Deciding which appropriated views to be
materialized is one of important problem. In order to achieve this
requirement, the constructing search space close to optimal is a
necessary task. It will provide effective result for selecting view to be
materialized. In this paper we have proposed an approach to reoptimize
Multiple View Processing Plan (MVPP) by using global
common subexpressions. The merged queries which have query
processing cost not close to optimal would be rewritten. The
experiment shows that our approach can help to improve the total
query processing cost of MVPP and sum of query processing cost
and materialized view maintenance cost is reduced as well after views
are selected to be materialized.
Abstract: With the rapid growth in business size, today's businesses orient towards electronic technologies. Amazon.com and e-bay.com are some of the major stakeholders in this regard. Unfortunately the enormous size and hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such an everincreasing data is an extremely tedious task and is fast becoming critical towards the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from a seemingly impossible to search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we present a review of some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in the area specific to business user needs focusing both on the customer as well as the producer. The techniques we would be reviewing include, mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graphbased approach for the development of a web-crawler and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and relevance of the result they produced against a particular search.
Abstract: This paper presents initiatives of Knowledge
Management (KM) applied to Forensic Sciences field, especially
developed at the Forensic Science Institute of the Brazilian Federal
Police. Successful projects, related to knowledge sharing, drugs
analysis and environmental crimes, are reported in the KM
perspective. The described results are related to: a) the importance of
having an information repository, like a digital library, in such a
multidisciplinary organization; b) the fight against drug dealing and
environmental crimes, enabling the possibility to map the evolution
of crimes, drug trafficking flows, and the advance of deforestation in
Amazon rain forest. Perspectives of new KM projects under
development and studies are also presented, tracing an evolution line
of the KM view at the Forensic Science Institute.
Abstract: Transferring information developed by other peoples is an ordinary event that happens during daily conversations, for example when employees sea each other in the organization, or when they are having lunch together, or attending a meeting, they use to talk about their experience, and discuss about their current projects, and talk about their successes over some specific problems. Despite the potential value of leveraging organizational memory and expertise by using OMS and ER, still small organizations haven-t been able to capitalize on its promised value. Each organization has its internal knowledge management system, in some of organizations the system face the lack of expert people to save their experience in the repository and in another hand on some other organizations there are lots of expert people but the organization doesn-t have the maximum use of their knowledge.
Abstract: Software crisis refers to the situation in which the developers are not able to complete the projects within time and budget constraints and moreover these overscheduled and over budget projects are of low quality as well. Several methodologies have been adopted form time to time to overcome this situation and now in the focus is component based software engineering. In this approach, emphasis is on reuse of already existing software artifacts. But the results can not be achieved just by preaching the principles; they need to be practiced as well. This paper highlights some of the very basic elements of this approach, which has to be in place to get the desired goals of high quality, low cost with shorter time-to-market software products.
Abstract: Tool Tracker is a client-server based application. It is essentially a catalogue of various network monitoring and management tools that are available online. There is a database maintained on the server side that contains the information about various tools. Several clients can access this information simultaneously and utilize this information. The various categories of tools considered are packet sniffers, port mappers, port scanners, encryption tools, and vulnerability scanners etc for the development of this application. This application provides a front end through which the user can invoke any tool from a central repository for the purpose of packet sniffing, port scanning, network analysis etc. Apart from the tool, its description and the help files associated with it would also be stored in the central repository. This facility will enable the user to view the documentation pertaining to the tool without having to download and install the tool. The application would update the central repository with the latest versions of the tools. The application would inform the user about the availability of a newer version of the tool currently being used and give the choice of installing the newer version to the user. Thus ToolTracker provides any network administrator that much needed abstraction and ease-ofuse with respect to the tools that he can use to efficiently monitor a network.
Abstract: This article outlines conceptualization and
implementation of an intelligent system capable of extracting
knowledge from databases. Use of hybridized features of both the
Rough and Fuzzy Set theory render the developed system flexibility
in dealing with discreet as well as continuous datasets. A raw data set
provided to the system, is initially transformed in a computer legible
format followed by pruning of the data set. The refined data set is
then processed through various Rough Set operators which enable
discovery of parameter relationships and interdependencies. The
discovered knowledge is automatically transformed into a rule base
expressed in Fuzzy terms. Two exemplary cancer repository datasets
(for Breast and Lung Cancer) have been used to test and implement
the proposed framework.
Abstract: Nowadays, organizing a repository of documents and
resources for learning on a special field as Information Technology
(IT), together with search techniques based on domain knowledge or
document-s content is an urgent need in practice of teaching, learning
and researching. There have been several works related to methods of
organization and search by content. However, the results are still
limited and insufficient to meet user-s demand for semantic
document retrieval. This paper presents a solution for the
organization of a repository that supports semantic representation and
processing in search. The proposed solution is a model which
integrates components such as an ontology describing domain
knowledge, a database of document repository, semantic
representation for documents and a file system; with problems,
semantic processing techniques and advanced search techniques
based on measuring semantic similarity. The solution is applied to
build a IT learning materials management system of a university with
semantic search function serving students, teachers, and manager as
well. The application has been implemented, tested at the University
of Information Technology, Ho Chi Minh City, Vietnam and has
achieved good results.