Abstract: In data mining, association rules are used to search for
relations among the items of a transaction database. Once the data
are collected and stored, valuable rules can be found through
association rule mining, helping managers carry out marketing
strategies and plan the market framework. In this paper, we apply
fuzzy partition methods and determine the membership functions of the
quantitative values of each transaction item. Managers can also
express the importance of items as linguistic terms, which are
transformed into fuzzy sets of weights. Next, fuzzy weighted frequent
pattern growth (FWFP-Growth) is used to complete the data mining
process. The method is expected to improve on the efficiency of the
Apriori algorithm in mining association rules. An example is given to
clearly illustrate the proposed approach.
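The fuzzy partition step mentioned above can be illustrated with a minimal sketch. It assumes triangular membership functions and three regions named Low, Middle and High; the shape, the region names and the breakpoints are illustrative assumptions and are not taken from the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical partition of a purchased quantity into three fuzzy regions.
REGIONS = {
    "Low": (-5.0, 0.0, 5.0),
    "Middle": (0.0, 5.0, 10.0),
    "High": (5.0, 10.0, 15.0),
}


def fuzzify(quantity):
    """Map a quantity to its membership degree in each fuzzy region."""
    return {name: triangular(quantity, *abc) for name, abc in REGIONS.items()}
```

With these assumed breakpoints, `fuzzify(4)` assigns a degree of about 0.2 to Low, 0.8 to Middle and 0.0 to High.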
Abstract: The HIV-1 genome is highly heterogeneous. Due to this
variation, the features of the HIV-1 genome span a wide range. For
this reason, the infectivity of the virus depends on which chemokine
receptors it uses. From this point of view, R5 HIV viruses use the
CCR5 coreceptor, X4 viruses use CXCR4, and R5X4 viruses can utilize
both coreceptors. Recently, in bioinformatics, the classification of
R5X4 viruses has been studied using experiments on the HIV-1 genome.
In this study, R5X4-type HIV viruses were classified using an
autoregressive (AR) model together with Artificial Neural Networks
(ANNs). The statistical data of R5X4, R5 and X4 viruses were
analyzed using signal processing methods and ANNs. The accessible
residues of these virus sequences were obtained and modeled with an AR
model, since the dimensions of the residue sequences are large and
differ from each other. Finally, the pre-processed data were used to
evolve various ANN structures for identifying R5X4 viruses.
Furthermore, ROC analysis was applied to the ANNs to show their real
performance. The results indicate that R5X4 viruses were successfully
classified, with high sensitivity and specificity values in the
training and testing ROC analysis for RBF, which gives the best
performance among the ANN structures.
Abstract: In this paper, we consider a two-neuron system with time-delayed connections between neurons. By analyzing the associated transcendental characteristic equation, its linear stability is investigated and Hopf bifurcation is demonstrated. Explicit formulae for determining the stability and direction of the periodic solutions bifurcating from the Hopf bifurcation are obtained using normal form theory and center manifold theory. Numerical simulation results are given to support the theoretical predictions. Finally, the main conclusions are given.
Abstract: The various types of frequent pattern discovery
problems, namely the frequent itemset, sequence and graph mining
problems, are solved in different ways which are, however, similar in
certain aspects. The main approaches to discovering such patterns can
be classified into two main classes, namely the level-wise
methods and the database projection-based methods.
The level-wise algorithms generally use clever indexing structures
for discovering the patterns. In this paper a new approach is proposed
for discovering frequent sequences and tree-like patterns efficiently
that is based on the level-wise approach. Because the level-wise
algorithms spend a lot of time on the subpattern testing problem, the
new approach introduces the idea of using automaton theory to solve
this problem.
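To make the subpattern testing idea concrete, here is a minimal sketch of one simple automaton for sequence patterns. It is not the construction proposed in the paper: the state is just the number of pattern items matched so far, and the function names are hypothetical.

```python
def build_subsequence_automaton(pattern):
    """Return the transition function of an automaton whose state is the
    number of pattern items matched so far; state len(pattern) accepts."""
    def step(state, item):
        if state < len(pattern) and item == pattern[state]:
            return state + 1
        return state
    return step


def contains_subsequence(sequence, pattern):
    """True if pattern occurs in sequence as a (possibly gapped) subsequence."""
    step = build_subsequence_automaton(pattern)
    state = 0
    for item in sequence:
        state = step(state, item)
    return state == len(pattern)


def support(database, pattern):
    """Number of database sequences that contain the candidate pattern."""
    return sum(contains_subsequence(seq, pattern) for seq in database)
```

For the toy database `["abcd", "acbd", "bd"]`, the candidate `"ad"` has support 2 and `"bc"` has support 1. Each sequence is scanned once per candidate, which is the cost a level-wise method pays at every level.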
Abstract: Ever since the industrial revolution began, our ecosystem
has changed, and the negatives outweigh the positives.
Industrial waste is usually released into all kinds of bodies of
water, such as rivers or the sea. Tempeh waste is one example of waste
that carries many hazardous and unwanted substances that affect the
surrounding environment. Tempeh is a popular fermented food in
Asia which is rich in nutrients and active substances. Tempeh liquid
waste, in particular, can cause air pollution, and if it penetrates
the soil, it contaminates the groundwater, making the water
unfit for consumption. Moreover, bacteria thrive in the polluted
water, and these are often responsible for causing
many kinds of diseases. The treatment used for this chemical waste is
biological treatment, such as constructed wetlands and activated
sludge. These kinds of treatment are able to reduce both physical and
chemical parameters, such as temperature, TSS, pH, BOD,
COD, NH3-N, NO3-N, and PO4-P. These treatments are implemented
before the waste is released into the water. The result is a
comparison between constructed wetland and activated sludge,
along with a determination of which method is better suited to
reducing the physical and chemical substances of the waste.
Abstract: Due to the tremendous amount of information provided
by the World Wide Web (WWW), developing methods for mining
the structure of web-based documents is of considerable interest. In
this paper we present a similarity measure for graphs representing
web-based hypertext structures. Our similarity measure is mainly
based on a novel representation of a graph as linear integer strings,
whose components represent structural properties of the graph. The
similarity of two graphs is then defined as the optimal alignment of
the underlying property strings. In this paper we apply the well-known
technique of sequence alignment to solve a novel and challenging
problem: measuring the structural similarity of generalized trees.
In other words, we first transform our graphs, considered as
high-dimensional objects, into linear structures. Then we derive
similarity values from the alignments of the property strings in order
to measure the structural similarity of generalized trees. Hence, we
transform a graph similarity problem into a string similarity problem
to develop an efficient graph similarity measure. We demonstrate that
our similarity measure captures important structural information by
applying it to two different test sets consisting of graphs representing
web-based document structures.
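The alignment step can be sketched with a standard Needleman-Wunsch global alignment over two integer property strings. The scoring function and the gap penalty below are illustrative assumptions, not the measure defined in the paper, and the property strings are taken as given lists of non-negative integers.

```python
def alignment_score(s, t, gap=-1.0):
    """Needleman-Wunsch global alignment score of two integer sequences."""
    def match(a, b):
        # Hypothetical scoring: 1 for equal values, less as they diverge.
        return 1.0 - abs(a - b) / max(a, b, 1)

    rows, cols = len(s) + 1, len(t) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # s prefix aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap  # t prefix aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + match(s[i - 1], t[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]
```

Under these assumptions, two identical strings of length 3 score 3.0, and aligning `[1, 2, 3]` with `[1, 2]` scores 1.0 (two matches and one gap).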
Abstract: As the number of networked computers grows,
intrusion detection becomes an essential component in keeping
networks secure. Various approaches to intrusion detection are
currently in use, each with its own merits and demerits. This
paper presents our work to test and improve the performance of a
new class of decision tree, the c-fuzzy decision tree, for detecting
intrusions. The work also includes identifying the best candidate
feature subset to build an efficient c-fuzzy decision tree based
Intrusion Detection System (IDS). We investigated the usefulness of
the c-fuzzy decision tree for developing an IDS with a data partition
based on horizontal fragmentation. Empirical results indicate the
usefulness of our approach in developing an efficient IDS.
Abstract: In this work, we study the problem of determining
the minimum scheduling length that can satisfy end-to-end (ETE)
traffic demand in scheduling-based multihop WSNs with a cooperative
multiple-input multiple-output (MIMO) transmission scheme. Specifically,
we present a cross-layer formulation for the joint routing,
scheduling and stream control problem by incorporating various
power and rate adaptation schemes, and taking into account an
antenna beam pattern model and the signal-to-interference-and-noise
ratio (SINR) constraint at the receiver. In this context, we also
propose column generation (CG) solutions to avoid the complexity of
enumerating all possible sets of scheduling links.
Abstract: For a given problem, finding an efficient algorithm has long been the main subject of study. However, an alternative approach, orthogonal to this one, has emerged, which is called reduction. In general, for a given problem, the reduction approach studies how to convert the original problem into subproblems. This paper proposes a formal modeling language to support the reduction approach so that a solver can be built quickly. We show three examples from the wide area of learning problems. The benefit is fast prototyping of algorithms for a given new problem. Note that our formal modeling language is not intended to provide an efficient notation for data mining applications, but to assist designers who develop solvers in machine learning.
Abstract: This paper addresses the problem of determining the current 3D location of a moving object and robustly tracking it from a sequence of camera images. The approach presented here uses a particle filter and does not perform any explicit triangulation. Only the color of the object to be tracked is required, not a precise motion model. The observation model we have developed avoids color filtering of the entire image. That, together with the Monte Carlo techniques inside the particle filter, provides real-time performance. Experiments with two real cameras are presented and the lessons learned are discussed. The approach scales easily to more than two cameras and new sensor cues.
Abstract: In this paper sensitivity analysis is performed for
reliability evaluation of power systems. When examining the
reliability of a system, it is useful to recognize how results
change as component parameters are varied. This knowledge
helps engineers to understand the impact of poor data, and
gives insight into how reliability can be improved. For these
reasons, a sensitivity analysis can be performed. Finally, a real
network was used for testing the presented method.
Abstract: Naïve Bayes classifiers are simple probabilistic
classifiers. Classification extracts patterns from a data file with a
set of labeled training examples and is currently one of the most
significant areas in data mining. However, Naïve Bayes assumes
independence among the features. Structural learning among the
features thus helps in the classification problem. In this study,
structural learning in Bayesian networks is proposed for cases
where there are relationships between the features when using
Naïve Bayes. The improvement in classification from structural
learning is shown when relationships exist between the features, that
is, when they are not independent.
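The independence assumption referred to above can be written out explicitly. The notation is an assumption introduced here for illustration: c denotes a class label and x_1, ..., x_n the observed feature values.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

In a Bayesian network with a learned structure, each factor would instead condition on the parents of the feature as well, for example P(x_i | pa_i, c), where pa_i (also an assumed symbol) stands for the parent features of x_i in the network.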
Abstract: Data mining has been used very frequently to extract
hidden information from large databases. This paper suggests the use
of decision trees for continuously extracting the clinical reasoning,
in the form of medical experts' actions, that is inherent in large
numbers of EMRs (Electronic Medical Records). In this way the
extracted data could be used to teach students of oral medicine a
number of orderly processes for dealing with patients who present with
different problems within the practice context over time.
Abstract: The objective of this paper is to review and assess the
methodological issues and problems in marketing research, data and
knowledge mining in Turkey. In summary, academic marketing
research publications in Turkey have significant problems. The most
vital problem seems to be related to modeling. Most of the
publications had major weaknesses in modeling. There were also
serious problems regarding measurement and scaling, sampling and
analyses. Analysis myopia seems to be the most important problem
for young academics in Turkey. Another very important finding is the
lack of publications on data and knowledge mining in the academic
world.
Abstract: The purpose of the article is to illustrate the main
characteristics of the corporate governance challenge facing the
countries of South-Eastern Europe (SEE) and to subsequently
determine and assess the extensiveness and effectiveness of corporate
governance regulations in these countries. Therefore, we start with an
overview of the key problems of corporate governance
in transition. We then address the issue of corporate governance
measurement for SEE countries. To this end, we include a review of
the methodological framework for determining both the
extensiveness and the effectiveness of corporate governance
legislation. We then focus on the actual analysis of the quality of
corporate governance codes, as well as of the effectiveness of legal
institutions, and provide a measure of corporate governance in
Romania and other SEE emerging markets. The paper concludes by
emphasizing the corporate governance enforcement gap and by
identifying research issues that require further study.
Abstract: Decision tree algorithms have a very important place among
the classification models of data mining. In the literature, algorithms
use the entropy concept or the Gini index to form the tree. The shape
of the classes and their closeness to each other are some of the
factors that affect the performance of the algorithm. In this paper we
introduce a new decision tree algorithm which employs a data
(attribute) folding method and the variation of the class variables
over the branches to be created. A comparative performance analysis
has been carried out between the proposed algorithm and C4.5.
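For reference, the two standard impurity measures mentioned above can be computed as in the following sketch. It illustrates only entropy and the Gini index as used by conventional decision tree algorithms, not the folding-based method the paper proposes; the function names are chosen here for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of the class distribution in labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())
```

A node holding two classes in equal proportion has entropy 1.0 and Gini index 0.5, while a pure node scores 0 on both measures.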
Abstract: Determining depth of anesthesia is a challenging problem
in the context of biomedical signal processing. Various methods
have been suggested to determine a quantitative index of depth of
anesthesia, but most of these methods suffer from high sensitivity
during surgery. A novel method based on the energy scattering of
samples in the wavelet domain is suggested to represent the basic
content of the electroencephalogram (EEG) signal. In this method, the
EEG signal is first decomposed into different sub-bands; then the
samples are squared and the energy of the sample sequence is
constructed through each scale and time and normalized; finally, the
entropy of the resulting sequences is suggested as a reliable index.
Empirical results showed that applying the proposed method to EEG
signals can classify the awake, moderate and deep anesthesia states
similarly to BIS.
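The pipeline described above (decompose, square, normalize, take the entropy) can be sketched as follows. The Haar wavelet, the number of levels, the pooling of all scales into one distribution and the function names are all assumptions made for illustration; the abstract does not specify them.

```python
from math import log2, sqrt


def haar_details(signal, levels):
    """Return the Haar detail coefficients of each scale, finest first.

    The signal length is assumed to be divisible by 2 ** levels;
    an unpaired trailing sample would be dropped by zip.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / sqrt(2) for a, b in pairs])
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return details


def energy_entropy(signal, levels=3):
    """Shannon entropy of the normalized squared detail coefficients."""
    energies = [c * c for band in haar_details(signal, levels) for c in band]
    total = sum(energies)
    if total == 0:
        return 0.0  # no detail energy at any scale
    probs = [e / total for e in energies if e > 0]
    return -sum(p * log2(p) for p in probs)
```

In this sketch a constant signal yields an entropy of 0, while energy spread evenly over many coefficients yields a higher value.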
Abstract: Entrepreneurs are important for national labour markets and economies in that they contribute significantly to economic growth, provide the majority of jobs and create new ones. According to the Global Entrepreneurship Monitor’s “Report on Women and Entrepreneurship”, investment in women’s entrepreneurship is an important way to exponentially increase the impact of new venture creation; finding ways to empower women’s participation and success in entrepreneurship is critical for more sustainable and successful economic development. Our results confirm that there are still differences between men and women entrepreneurs. The reasons seem to be the lack of specific business skills, the less extensive social network, and the lack of role models to identify with among women. These differences can be explained by the fact that women still have fewer opportunities to make a career. If this is correct, we can predict an increasing proportion of women among entrepreneurs in the coming years. Concerning the development of a favorable environment for developing and enhancing women’s entrepreneurship activities, our results show that integration into a network and the presence of a role model are undoubtedly determining factors in the choice to launch an entrepreneurial activity, as well as a valuable resource for the success of the company.
Abstract: With the surge of stream processing applications, novel
techniques are required for the generation and analysis of association
rules in streams. Traditional rule mining solutions cannot handle
streams because they generally require multiple passes over the data
and do not guarantee results within a small, predictable time. Though
researchers have been proposing algorithms for generation of rules
from streams, there has not been much focus on their analysis.
We propose association rule profiling, a user-centric process for
analyzing association rules and attaching suitable profiles to them
depending on their changing frequency behavior over a previous
snapshot of time in a data stream.
Association rule profiles provide insights into the changing nature
of associations and can be used to characterize the associations. We
discuss the importance of characteristics such as the predictability of
linkages present in the data and propose a metric to quantify it. We
also show how association rule profiles can aid in the generation of
user-specific, more understandable and actionable rules.
The framework is implemented as SUPAR: System for User-centric
Profiling of Association Rules in streaming data. The
proposed system offers the following capabilities:
i) Continuous monitoring of frequency of streaming item-sets
and detection of significant changes therein for association rule
profiling.
ii) Computation of metrics for quantifying predictability of
associations present in the data.
iii) User-centric control of the characterization process: the user
can control the framework through a) constraint specification and b)
non-interesting rule elimination.
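Capability i) can be illustrated with a minimal sliding-window sketch. This is not the SUPAR implementation: the class name, the window-based support estimate and the change threshold are hypothetical choices made only to show the idea of monitoring item-set frequency in a stream.

```python
from collections import deque


class SlidingSupport:
    """Track the support of one item-set over the last window_size transactions."""

    def __init__(self, itemset, window_size):
        self.itemset = frozenset(itemset)
        # deque with maxlen discards the oldest entry automatically
        self.hits = deque(maxlen=window_size)

    def observe(self, transaction):
        """Record a transaction and return the current windowed support."""
        self.hits.append(self.itemset <= set(transaction))
        return sum(self.hits) / len(self.hits)


def significant_change(previous, current, threshold=0.2):
    """Flag a change in support larger than an assumed threshold."""
    return abs(current - previous) > threshold
```

For example, tracking `{"a", "b"}` with a window of 2 over the transactions `"abc"`, `"ac"`, `"bc"` gives supports 1.0, 0.5 and 0.0 in turn.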
Abstract: One main drawback of intrusion detection systems is their
inability to detect new attacks which do not have known
signatures. In this paper we discuss an intrusion detection method
that uses independent component analysis (ICA) based feature
selection heuristics and rough fuzzy clustering of the data. ICA is
used to separate the independent components (ICs) from the monitored
variables. Rough sets decrease the amount of data and remove
redundancy, and fuzzy methods allow objects to belong to several
clusters simultaneously, with different degrees of membership. Our
approach allows us not only to recognize known attacks but also to
detect activity that may be the result of a new, unknown attack. The
experimental results are reported on the Knowledge Discovery and Data
Mining (KDD Cup 1999) dataset.