Abstract: This paper proposes an auto-classification algorithm
for Web pages using data mining techniques. We consider the
problem of discovering association rules between terms in a set of
Web pages belonging to a category in a search engine database, and
present an auto-classification algorithm for solving this problem that
is fundamentally based on the Apriori algorithm. The proposed
technique has two phases. The first phase is a training phase where
human experts determine the categories of different Web pages, and
the supervised data mining algorithm combines these categories
with appropriate weighted index terms according to the highest
supported rules among the most frequent words. The second phase is
the categorization phase where a web crawler will crawl through the
World Wide Web to build a database categorized according to the
result of the data mining approach. This database contains URLs and
their categories.
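The training phase described above relies on Apriori-style mining of frequent term sets. The following is a minimal sketch of that idea, not the authors' implementation: each Web page is assumed to be reduced to a set of index terms, and the function names, the example terms, and the support threshold are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori) search for term sets whose support >= min_support.

    transactions: iterable of term collections, one per Web page (assumed input).
    Returns a dict mapping frozenset(terms) -> support in [0, 1].
    """
    pages = [frozenset(t) for t in transactions]
    n = len(pages)
    # Level 1 candidates: every single term that occurs in some page.
    candidates = {frozenset([term]) for page in pages for term in page}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates that occur in enough pages.
        level = {}
        for cand in candidates:
            support = sum(1 for page in pages if cand <= page) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-sets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from mined supports."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    return frequent[both] / frequent[antecedent]


# Hypothetical index terms for four pages filed under one category.
pages = [
    {"goal", "match", "team"},
    {"goal", "team"},
    {"match", "team", "score"},
    {"goal", "match", "team"},
]
itemsets = frequent_itemsets(pages, min_support=0.5)
print(rule_confidence(itemsets, {"goal"}, {"team"}))
```

In this toy category every page containing "goal" also contains "team", so the rule goal -> team is a strongly supported candidate for a weighted index term of the category.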
Abstract: An induced graphoidal cover of a graph G is a collection ψ of (not necessarily open) paths in G such that every path in ψ has at least two vertices, every vertex of G is an internal vertex of at most one path in ψ, every edge of G is in exactly one path in ψ and every member of ψ is an induced cycle or an induced path. The minimum cardinality of an induced graphoidal cover of G is called the induced graphoidal covering number of G and is denoted by ηi(G) or ηi. Here we find induced graphoidal covers for some classes of graphs.
Abstract: The paper presents a study of the synthetic transmit
aperture method applying Golay coded transmission for medical
ultrasound imaging. Longer coded excitation allows the total energy
of the transmitted signal to be increased without increasing the peak
pressure. Signal-to-noise ratio and penetration depth are improved
while maintaining high ultrasound image resolution.
In this work, a 128-element linear transducer array with 0.3 mm
inter-element spacing, excited by a single cycle and by 8- and 16-bit
Golay coded sequences at a nominal frequency of 4 MHz, was used.
A single-element transmit aperture was used to generate a spherical
wave covering the full image region and all the elements received the
echo signals. The comparison of 2D ultrasound images of the wire
phantom as well as of the tissue mimicking phantom is presented to
demonstrate the benefits of the coded transmission. The results were
obtained using the synthetic aperture algorithm with transmit and
receive signals correction based on a single element directivity
function.
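The benefit of Golay coded excitation rests on the complementary property of a Golay pair: the autocorrelations of the two sequences sum to a single peak with no range sidelobes. The sketch below illustrates this for 8- and 16-bit codes. The abstract does not say how its sequences were generated, so the recursive doubling construction and the function names used here are assumptions.

```python
def golay_pair(n_bits):
    """Build a Golay complementary pair of length n_bits (a power of two).

    Uses the standard doubling rule (a, b) -> (a|b, a|-b), starting from ([1], [1]).
    """
    if n_bits < 1 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of two")
    a, b = [1], [1]
    while len(a) < n_bits:
        a, b = a + b, a + [-x for x in b]
    return a, b


def autocorrelation(seq):
    """Aperiodic autocorrelation of seq for lags 0 .. len(seq) - 1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag)) for lag in range(n)]


def summed_autocorrelation(a, b):
    """Sum of the two autocorrelations; for a Golay pair only lag 0 is nonzero."""
    return [x + y for x, y in zip(autocorrelation(a), autocorrelation(b))]


for n_bits in (8, 16):
    a, b = golay_pair(n_bits)
    print(n_bits, summed_autocorrelation(a, b))
```

Each sequence alone has nonzero sidelobes, which is why two transmissions per aperture position are needed when complementary codes are used.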
Abstract: Recent medical studies have investigated the importance of enteral feeding and the use of feeding pumps for recovering patients unable to feed themselves or gain nourishment and nutrients by natural means. Most enteral feeding systems use a peristaltic tube pump. A peristaltic pump is a form of positive displacement pump in which a flexible tube is progressively squeezed externally to allow the resulting enclosed pillow of fluid to progress along it. The squeezing of the tube requires a precise and robust controller of the geared motor to overcome the parametric uncertainty of the pumping system, which arises from wide variations in friction and slip between tube and roller. This paper therefore proposes a fuzzy adaptive controller for the robust control of the peristaltic tube pump. This new adaptive controller uses a fuzzy multi-layered architecture which has several independent fuzzy controllers in parallel, each with a different robust stability area. Out of the several independent fuzzy controllers, the most suitable one is selected by a system identifier which observes variations in the controlled system parameter. This paper proposes a design procedure which can be carried out mathematically and systematically from the model of a controlled system. Finally, the good control performance, accurate dose rate and robust system stability of the developed feeding pump are confirmed through experimental and clinical testing.
Abstract: The set covering problem is a classical problem in
computer science and complexity theory. It has many applications,
such as airline crew scheduling, facility location,
vehicle routing, and assignment problems. In this paper, three
different techniques are applied to solve the set covering problem.
First, a mathematical model of the set covering problem is introduced
and solved using the optimization solver LINGO. Second, the
Genetic Algorithm Toolbox available in MATLAB is used to solve
the set covering problem. Finally, an ant colony optimization method
is programmed in the MATLAB programming language. Results
obtained from these methods are presented in tables. In order to
assess the performance of the techniques used in this project,
benchmark problems available in the open literature are used.
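To make the problem statement concrete: given a universe of elements and a family of subsets with costs, the task is to pick a minimum-cost selection of subsets whose union is the whole universe. The sketch below uses the classical greedy heuristic as a simple baseline. It is not one of the three techniques named in the abstract (LINGO, genetic algorithm, ant colony optimization), and the instance data is hypothetical.

```python
def greedy_set_cover(universe, subsets, costs):
    """Greedy baseline for weighted set covering.

    Repeatedly picks the subset with the lowest cost per newly covered element.
    Returns the list of chosen subset indices (not guaranteed to be optimal).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        useful = [j for j in range(len(subsets)) if subsets[j] & uncovered]
        if not useful:
            raise ValueError("the subsets do not cover the universe")
        best = min(useful, key=lambda j: costs[j] / len(subsets[j] & uncovered))
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: 5 elements, 4 unit-cost subsets.
universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
costs = [1, 1, 1, 1]
cover = greedy_set_cover(universe, subsets, costs)
print(cover, sum(costs[j] for j in cover))
```

A baseline of this kind is often useful as a reference point when comparing exact solvers and metaheuristics on benchmark instances.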
Abstract: Reverse Engineering is a very important process in
Software Engineering. It can be performed backwards from the system
development life cycle (SDLC) in order to get back the source data
or representations of a system through analysis of its structure,
function and operation. We use reverse engineering to introduce an
automatic tool to generate system requirements from program
source code. The tool is able to accept Cµ source
code, scan the source code line by line and pass the code to a
parser. Then, the engine of the tool generates system
requirements for that specific program to facilitate reuse and
enhancement of the program. The purpose of producing the tool is to
help recover the system requirements of any system when the
system requirements document (SRD) does not exist due to
undocumented support of the system.
Abstract: Data mining uses a variety of techniques each of which
is useful for some particular task. It is important to have a deep
understanding of each technique and be able to perform sophisticated
analysis. In this article we describe a tool built to simulate a variation
of the Kohonen network to perform unsupervised clustering and
support the entire data mining process up to results visualization. A
graphical representation helps the user to find a strategy to
optimize classification by adding, moving or deleting a neuron in order
to change the number of classes. The tool is able to automatically
suggest a strategy to optimize the number of classes, but
also supports both tree classifications and semi-lattice organizations of
the classes to give users the possibility of passing from one
class to the ones with which it has some aspects in common.
Examples of using tree and semi-lattice classifications are given to
illustrate advantages and problems. The tool is applied to classify
macroeconomic data that report the most developed countries' imports
and exports. It is possible to classify the countries based on their
economic behaviour and use the tool to characterize the commercial
behaviour of a country in a selected class from the analysis of
positive and negative features that contribute to class formation.
Possible interrelationships between the classes and their meaning are
also discussed.
Abstract: Frequent pattern discovery over a data stream is a hard
problem because the continuously generated nature of a stream does not
allow each data element to be revisited. Furthermore, the pattern discovery
process must be fast to produce timely results. Based on these
requirements, we propose an approximate approach to tackle the
problem of discovering frequent patterns over a continuous stream.
Our approximation algorithm is intended to be applied to process a
stream prior to the pattern discovery process. The results of
approximate frequent pattern discovery are reported in the
paper.
Abstract: In this paper, we present a method named Signal Level
Matrix (SLM) which can improve the accuracy and stability of an active
RFID indoor positioning system. Considering accuracy and cost,
we use a uniform distribution mode to set up and separate the
overlapped signal covering areas, in order to achieve preliminary
location setting. Then, based on the proposed SLM concept and the
characteristic of the signal strength value that attenuates as the
distance increases, this system cross-examines the distribution of
adjacent signals to locate the users more accurately. The experimental
results indicate that the adaptive positioning method proposed in this
paper could improve the accuracy and stability of the positioning
system effectively.
Abstract: Recently, Northeast Asia has become one of the three
largest trade areas, covering approximately 30% of the total trade
volume of the world. However, the distribution facilities are saturated
due to the increase in the transportation volume within the area and
with the European countries. In order to accommodate the increase in
the transportation volume, a transportation network linking the
major countries in Northeast Asia and Europe is necessary.
The Eurasian Logistics Network will develop into an international
passenger transportation network covering the Northeast Asian region
and an international freight transportation network spanning the
Eurasian continent. This paper surveys the changes and trends of the
distribution network in the Eurasian Region according to the political,
economic and environmental changes of the region, analyses the
distribution network according to the changes in the transportation
policies of the related countries, and provides the direction of the
development of composite transportation on the basis of the present
conditions of transportation means. The transportation means that are optimal
for the efficiency of the transportation system are suggested to be train
ferries, sea & rail, or sea & rail & sea. It is suggested that
diversified composite transportation means and routes be developed within the
framework of an international cooperation system.
Abstract: A Quantitative Structure-Activity Relationship (QSAR)
study aimed at discovering novel, more active Calanone derivatives as
anti-leukemia compounds has been conducted. Six
experimental activities of Calanone compounds against leukemia cell
L1210 were used as the material of the research. Calculation of
theoretical predictors (independent variables) was performed by the
AM1 semiempirical method. The QSAR equation was determined by
Principal Component Regression (PCR) analysis, with Log IC50 as the
dependent variable and atomic net
charges, dipole moment (μ), and the n-octanol/water partition
coefficient (Log P) as the independent variables. Three novel Calanone derivatives
obtained in this research have higher activity against leukemia cell
L1210 than pure Calanone.
Abstract: Covering-based rough sets is an extension of rough
sets and it is based on a covering instead of a partition of the
universe. Therefore it is more powerful in describing some practical
problems than rough sets. However, by extending the rough sets,
covering-based rough sets can increase the roughness of each model
in recognizing objects. How to obtain better approximations from
the models of covering-based rough sets is an important issue.
In this paper, two concepts, determinate elements and indeterminate
elements in a universe, are proposed and given precise definitions
respectively. This research makes a reasonable refinement of the
covering-element from a new viewpoint. And the refinement may
generate better approximations of covering-based rough sets models.
To validate the theory above, it is applied to eight major covering-based
rough sets models adapted from the literature.
The result is that, in all these models, the lower approximation increases.
Correspondingly, the upper approximation decreases in all models
except two in some special situations.
Therefore, the roughness of recognizing objects is reduced. This
research provides a new approach to the study and application of
covering-based rough sets.
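For readers unfamiliar with the approximations discussed above, the sketch below computes one commonly used pair of covering-based lower and upper approximations, together with a Pawlak-style roughness measure. The abstract does not list its eight models or define determinate and indeterminate elements, so this is only an illustration of the baseline notions under assumed definitions. The universe, covering, and target set are hypothetical.

```python
def lower_approximation(covering, target):
    """Union of all covering blocks that lie entirely inside the target set."""
    return set().union(*[block for block in covering if block <= target])


def upper_approximation(covering, target):
    """Union of all covering blocks that intersect the target set.

    This is one of several upper approximations used in the literature.
    """
    return set().union(*[block for block in covering if block & target])


def roughness(covering, target):
    """1 - |lower| / |upper|; 0 means the target is described exactly."""
    upper = upper_approximation(covering, target)
    if not upper:
        return 0.0
    return 1 - len(lower_approximation(covering, target)) / len(upper)


# Hypothetical covering of the universe {1, 2, 3, 4, 5}; blocks may overlap.
covering = [{1, 2}, {2, 3}, {4}, {4, 5}]
target = {1, 2, 4}
print(lower_approximation(covering, target))
print(upper_approximation(covering, target))
print(roughness(covering, target))
```

Because blocks in a covering may overlap, the gap between the two approximations is typically wider than with a partition, which is the roughness that the refinement proposed in the abstract aims to reduce.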
Abstract: A large amount of valuable information is available in
plain text clinical reports. New techniques and technologies are
applied to extract information from these reports. In this study, we
developed a domain-based software system to transform 600
Otorhinolaryngology discharge notes to a structured form for
extracting clinical data from the discharge notes. In order to decrease
the system processing time, discharge notes were transformed into a data
table after preprocessing. Several word lists were constituted to
identify common sections in the discharge notes, including patient
history, age, problems, and diagnosis. The N-gram method was used
for discovering term co-occurrences within each section. Using this
method, a dataset of concept candidates was generated for the
validation step, and then the Predictive Apriori algorithm for Association
Rule Mining (ARM) was applied to validate candidate concepts.
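The N-gram step can be pictured as counting adjacent term sequences inside one section of a note. The following is a minimal sketch under assumed preprocessing (lowercasing and simple alphanumeric tokenization); the function name and the sample text are hypothetical and not taken from the study's data.

```python
import re
from collections import Counter


def ngram_counts(section_text, n=2):
    """Count n-grams of consecutive tokens in one section of a discharge note.

    Tokenization here is a simplifying assumption: lowercase alphanumeric runs.
    Sentence boundaries are ignored, so n-grams may span two sentences.
    """
    tokens = re.findall(r"[a-z0-9]+", section_text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical diagnosis section.
text = "Chronic otitis media. Chronic otitis externa."
counts = ngram_counts(text, n=2)
for gram, freq in counts.most_common(3):
    print(" ".join(gram), freq)
```

N-grams that recur across many notes would then serve as concept candidates for the association rule validation step.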
Abstract: Achievement motivation is believed to promote
giftedness, attracting people to invest in many programs that adopt
gifted students and provide them with challenging activities.
Intellectual giftedness is founded on the fluid intelligence and
extends to more specific abilities through the growth and inputs from
the achievement motivation. Acknowledging the roles played by the
motivation in the development of giftedness leads to an effective
nurturing of gifted individuals. However, no study has investigated
the direct and indirect effects of the achievement motivation and
fluid intelligence on intellectual giftedness. Thus, this study
investigated the contribution of motivation factors to giftedness
development by conducting tests of fluid intelligence using the Cattell
Culture Fair Test (CCFT) and of analytical abilities using culture-reduced
test items covering problem solving, pattern recognition,
audio-logic, audio-matrices, and artificial language, together with a self-report
questionnaire for the motivational factors. A total of 180 high-scoring
students were selected using the CCFT from a leading university
in Malaysia. Structural equation modeling was employed using Amos
V.16 to determine the direct and indirect effects of achievement
motivation factors (self-confidence, success, perseverance,
competition, autonomy, responsibility, ambition, and locus of
control) on the intellectual giftedness. The findings showed that the
hypothesized model fitted the data, supporting the model postulates,
and showed significant and strong direct and indirect effects of the
motivation and fluid intelligence on the intellectual giftedness.
Abstract: Future space vehicles will require the use of non-toxic, cryogenic propellants, because of their performance advantages over toxic hypergolic propellants and also because of environmental and handling concerns. A prototypical capillary flow liquid acquisition device (LAD) for cryogenic propellants was fabricated with a mesh screen covering a rectangular flow channel with a cylindrical outlet tube, and was tested with liquid oxygen (LOX). In order to better understand the performance in various gravity environments and orientations with different submersion depths of the LAD, a series of computational fluid dynamics (CFD) simulations of LOX flow through the LAD screen channel, including horizontal and vertical submersions of the LAD channel assembly in a normal gravity environment, was conducted. Gravity effects on the flow field in the LAD channel are inspected and analyzed by comparing the simulations.
Abstract: This paper provides a framework to
incorporate reliability issues, as a sign of disruption in distribution
systems, and partial covering theory, as a response to limitations in
coverage radius and economic preferences, simultaneously into the
traditional literature on capacitated facility location problems. As a
result we develop a bi-objective model based on the discrete
scenarios for expected cost minimization and demands coverage
maximization through a three echelon supply chain network by
facilitating multi-capacity levels for provider side layers and
imposing gradual coverage function for distribution centers (DCs).
Additionally, besides aggregating the objectives to solve the model
with LINGO software, a branch of the LP-metric method called the Min-Max
approach is proposed, and different aspects of the corresponding
model are explored.
Abstract: In this paper we propose a new knowledge model using
the Dempster-Shafer evidence theory for image segmentation and
fusion. The proposed method is composed essentially of two steps.
First, mass distributions in Dempster-Shafer theory are obtained from
the membership degrees of each pixel covering the three image
components (R, G and B). Each membership degree is determined by
applying Fuzzy C-Means (FCM) clustering to the gray levels of the
three images. Second, the fusion process consists in defining three
discernment frames which are associated with the three images to be
fused, and then combining them to form a new frame of discernment.
The strategy used to define mass distributions in the combined
framework is discussed in detail. The proposed fusion method is
illustrated in the context of image segmentation. Experimental
investigations and comparative studies with other previous methods
are carried out, showing the robustness and superiority of the
proposed method in terms of image segmentation.
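The fusion step builds on Dempster's rule of combination, which merges mass distributions from independent sources and renormalizes by the non-conflicting mass. The sketch below applies the rule to one pixel with masses from three sources, mirroring the R, G and B components. The class labels and mass values are hypothetical, and the abstract's own strategy for defining masses in the combined frame is not reproduced here.

```python
from functools import reduce


def combine(m1, m2):
    """Dempster's rule for two mass functions given as {frozenset: mass} dicts."""
    combined = {}
    conflict = 0.0
    for focal_1, mass_1 in m1.items():
        for focal_2, mass_2 in m2.items():
            intersection = focal_1 & focal_2
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_1 * mass_2
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_1 * mass_2
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the combined masses sum to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Hypothetical masses for one pixel over two classes, c1 and c2.
C1, C2 = frozenset({"c1"}), frozenset({"c2"})
EITHER = C1 | C2  # ignorance: the pixel may belong to either class
m_red = {C1: 0.6, EITHER: 0.4}
m_green = {C2: 0.5, EITHER: 0.5}
m_blue = {C1: 0.7, EITHER: 0.3}

fused = reduce(combine, [m_red, m_green, m_blue])
decision = max(fused, key=fused.get)
print(sorted(decision), fused[decision])
```

In practice the masses would come from the FCM membership degrees of each pixel, as the abstract describes, and the pixel would be assigned to the hypothesis with the largest combined mass or belief.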
Abstract: The main goal of data mining is to extract accurate, comprehensible and interesting knowledge from databases that may be considered as large search spaces. In this paper, a new, efficient type of Genetic Algorithm (GA) called the uniform two-level GA is proposed as a search strategy to discover truly interesting, high-level prediction rules, a difficult and relatively little-researched problem, rather than discovering classification knowledge as is usual in the literature. The proposed method uses the advantage of the uniform population method and addresses the task of generalized rule induction, which can be regarded as a generalization of the task of classification. Although the task of generalized rule induction requires a lot of computation, a demand that ordinary algorithms usually cannot meet, it was demonstrated that this method increased the performance of GAs and rapidly found interesting rules.
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and changes continuously. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem, taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with test data of 100 to 1000 genes and 24 samples, and show promising results for applying this algorithm to clustering DNA chip data.
Abstract: Discovering new biological knowledge from high-throughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily, and thereby functionally, related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on the Fast Fourier Transform (FFT) is used. The proposed classifier uses a multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that the spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance, especially at the family level.