Abstract: The aim of this paper is to propose a general
framework for storing, analyzing, and extracting knowledge from
two-dimensional echocardiographic images, color Doppler images,
non-medical images, and general data sets. A number of high-performance data mining algorithms have been used to carry out this task. Our framework encompasses four layers, namely physical storage, object identification, knowledge discovery, and the user level. Techniques such as the active contour model for identifying cardiac chambers, pixel classification for segmenting color Doppler echo images, a universal model for image retrieval, a Bayesian method for classification, and parallel algorithms for image segmentation were employed. Using the feature vector database that has been efficiently constructed, one can perform various data mining tasks such as clustering and classification with efficient algorithms, along with image mining given a query image. All these facilities are included in the framework, which is supported by a state-of-the-art user interface (UI). The algorithms were tested with actual patient data and the Corel image database, and the results show that their performance is better than previously reported results.
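As an illustration of the pixel classification step used to segment the color Doppler echo image, here is a minimal sketch that clusters pixel colors with k-means (an assumed stand-in classifier; the framework's actual features and classifier are not reproduced):

import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for a color Doppler frame: an RGB image as an (H, W, 3) array.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))

# Treat each pixel's color as a feature vector and cluster into classes
# (e.g. background tissue vs. forward/reverse flow).
pixels = image.reshape(-1, 3).astype(float)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pixels)
segmentation = labels.reshape(64, 64)      # per-pixel class map
print(np.bincount(labels))                 # pixels assigned to each class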
Abstract: A seizure prediction method is proposed by extracting
global features using phase correlation between adjacent epochs for
detecting relative changes and local features using fluctuation/
deviation within an epoch for determining fine changes of different
EEG signals. A classifier and a regularization technique are applied
for the reduction of false alarms and improvement of the overall
prediction accuracy. The experiments show that the proposed method outperforms state-of-the-art methods and provides high prediction accuracy (97.70%) with a low false-alarm rate using EEG signals from different brain locations in a benchmark data set.
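The global-feature step can be illustrated with the phase correlation of two adjacent epochs; a minimal NumPy sketch on toy signals (the paper's epoch length, preprocessing, classifier, and regularization are not shown):

import numpy as np

def phase_correlation(epoch_a, epoch_b):
    # Normalized cross-power spectrum; the peak of its inverse FFT
    # reflects the relative change/shift between the two epochs.
    A, B = np.fft.fft(epoch_a), np.fft.fft(epoch_b)
    cross = np.conj(A) * B
    return np.real(np.fft.ifft(cross / (np.abs(cross) + 1e-12)))

rng = np.random.default_rng(0)
e1 = rng.normal(size=256)                        # adjacent EEG epochs (toy data)
e2 = np.roll(e1, 5) + 0.1 * rng.normal(size=256)
print(np.argmax(phase_correlation(e1, e2)))      # peak index near the shift (5)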
Abstract: In this paper, we propose a variational EM inference algorithm for the multi-class Gaussian process classification model that can be used in the field of human behavior recognition. This algorithm can simultaneously derive both the posterior distribution of a latent function and estimators of the hyper-parameters in a multi-class Gaussian process classification model. Our algorithm is based on the Laplace approximation (LA) technique and the variational EM framework, and is performed in two steps: the expectation step and the maximization step. First, in the expectation step, using the Bayesian formula and the LA technique, we approximately derive the posterior distribution of the latent function indicating the possibility that each observation belongs to a certain class in the Gaussian process classification model. Second, in the maximization step, using the derived posterior distribution of the latent function, we compute the maximum likelihood estimator for the hyper-parameters of the covariance matrix necessary to define the prior distribution of the latent function. These two steps are repeated iteratively until a convergence condition is satisfied. Moreover, we apply the proposed algorithm to the human action classification problem using a public database, namely, the KTH human action data set. Experimental results reveal that the proposed algorithm shows good performance on this data set.
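For reference, in the standard Laplace-approximation treatment of Gaussian process classification (the textbook formulation of Rasmussen and Williams; the paper's multi-class derivation is analogous but not reproduced here), the two steps can be summarized as

\[
\text{E-step:}\quad q(\mathbf{f}\mid X,\mathbf{y})=\mathcal{N}\big(\hat{\mathbf{f}},\,(K_\theta^{-1}+W)^{-1}\big),\qquad
\hat{\mathbf{f}}=\arg\max_{\mathbf{f}}\big[\log p(\mathbf{y}\mid\mathbf{f})+\log p(\mathbf{f}\mid X,\theta)\big],\qquad
W=-\nabla\nabla_{\mathbf{f}}\log p(\mathbf{y}\mid\hat{\mathbf{f}}),
\]
\[
\text{M-step:}\quad \hat{\theta}=\arg\max_{\theta}\Big[\log p(\mathbf{y}\mid\hat{\mathbf{f}})-\tfrac{1}{2}\hat{\mathbf{f}}^{\top}K_\theta^{-1}\hat{\mathbf{f}}-\tfrac{1}{2}\log\big|I+K_\theta W\big|\Big],
\]

where K_θ is the covariance matrix of the Gaussian process prior; the two steps alternate until convergence.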
Abstract: Underwater acoustic networks have attracted great attention in the last few years because of their numerous applications. A high data rate can be achieved by efficiently modeling the physical layer in the network protocol stack. In an acoustic medium, the propagation speed of acoustic waves depends on many parameters such as temperature, salinity, density, and depth. Acoustic propagation speed cannot be modeled using standard empirical formulas such as the Urick and Thorp descriptions. In this paper, we have modeled the acoustic channel using real-time data of temperature, salinity, and speed for the Bay of Bengal (Indian Coastal Region). We have modeled the acoustic channel by using the Mackenzie speed equation and real-time data obtained from the National Institute of Oceanography and Technology. It is found that the acoustic propagation speed varies between 1503 m/s and 1544 m/s as temperature and depth differ. The simulation results show that temperature, salinity, and depth play a major role in acoustic propagation, and that the data rate increases with appropriate data sets substituted in the simulated model.
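The Mackenzie (1981) nine-term equation named above gives the sound speed from temperature T (°C), salinity S (ppt), and depth D (m); a direct Python transcription (the input values below are illustrative, not the NIOT measurements):

def mackenzie_speed(T, S, D):
    # Mackenzie (1981); valid for roughly 2-30 degC, 25-40 ppt, 0-8000 m.
    return (1448.96 + 4.591 * T - 5.304e-2 * T**2 + 2.374e-4 * T**3
            + 1.340 * (S - 35.0) + 1.630e-2 * D + 1.675e-7 * D**2
            - 1.025e-2 * T * (S - 35.0) - 7.139e-13 * T * D**3)

print(mackenzie_speed(25.0, 33.0, 50.0))   # about 1533 m/s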
Abstract: Presently, various computational techniques are used in modeling and analyzing environmental engineering data. In the present study, an intra-comparison of polynomial and radial basis kernel function based Support Vector Regression and, in turn, an inter-comparison with Multi Linear Regression have been attempted in modeling the mass transfer capacity of vertical (θ = 90°) and inclined multiple plunging jets (varying in number from 1 to 16). The data set used in this study consists of four input parameters with a total of eighty-eight cases, forty-four each for vertical and inclined multiple plunging jets. For testing, ten-fold cross-validation was used. Correlation coefficient values of 0.971 and 0.981, along with corresponding root mean square error values of 0.0025 and 0.0020, were achieved by using polynomial and radial basis kernel function based Support Vector Regression, respectively. The intra-comparison suggests improved performance by the radial basis kernel in comparison to the polynomial kernel based Support Vector Regression. Further, an inter-comparison with Multi Linear Regression (correlation coefficient = 0.973 and root mean square error = 0.0024) reveals that radial basis kernel function based Support Vector Regression performs better in modeling and estimating mass transfer by multiple plunging jets.
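A minimal sketch of the reported comparison using scikit-learn's SVR with ten-fold cross-validation (the arrays are random stand-ins for the 88-case plunging jet data, and kernel hyper-parameters are left at their defaults):

import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((88, 4))     # stand-in for the four input parameters
y = rng.random(88)          # stand-in for the measured mass transfer capacity

for kernel in ("poly", "rbf"):
    scores = cross_val_score(SVR(kernel=kernel), X, y, cv=10, scoring="r2")
    print(kernel, scores.mean())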
Abstract: Throughout history, people have made estimates and inferences about the future by using their past experiences. Developing information technologies and improvements in database management systems make it possible to extract useful information from the knowledge at hand for strategic decisions. Therefore, different methods have been developed. Data mining by association rule learning is one such method. The Apriori algorithm, one of the well-known association rule learning algorithms, is not commonly used on spatio-temporal data sets. However, it is possible to embed time and space features into the data sets and make the Apriori algorithm a suitable data mining technique for learning spatio-temporal association rules. Lake Van, the largest lake of Turkey, is a closed basin. This feature causes the volume of the lake to increase or decrease as a result of changes in the amount of water it holds. In this study, evaporation, humidity, lake altitude, amount of rainfall, and temperature parameters recorded in the Lake Van region over the years are used by the Apriori algorithm, and a spatio-temporal data mining application is developed to identify overflows and newly formed soil regions (underflows) occurring in the coastal parts of Lake Van. Identifying possible reasons for overflows and underflows may be used to alert the experts to take precautions and make the necessary investments.
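The support/confidence metrics that the Apriori algorithm mines can be sketched directly; a minimal pure-Python illustration on hypothetical discretized spatio-temporal records (not the Lake Van measurements):

# Each transaction flags discretized conditions for one (location, period) pair.
transactions = [
    {"rain_high", "evap_low", "overflow"},
    {"rain_high", "overflow"},
    {"temp_low"},
    {"rain_high", "evap_low", "overflow"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"rain_high"}, {"overflow"}
confidence = support(antecedent | consequent) / support(antecedent)
print(support(antecedent | consequent), confidence)   # rule support, confidence

Apriori enumerates frequent itemsets level by level, pruning any candidate with an infrequent subset, and keeps rules whose support and confidence exceed user-defined thresholds.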
Abstract: Optic disk segmentation plays a key role in the mass
screening of individuals with diabetic retinopathy and glaucoma
ailments. An efficient hardware-based algorithm for optic disk localization and segmentation would aid in developing an automated retinal image analysis system for real-time applications. Herein, a pixel intensity based fractal analysis algorithm implemented on a TMS320C6416DSK DSP board for automatic localization and segmentation of the optic disk is reported. The experiment has been performed on color and
fluorescent angiography retinal fundus images. Initially, the images
were pre-processed to reduce the noise and enhance the quality. The
retinal vascular tree of the image was then extracted using the Canny edge detection technique. Finally, a pixel intensity based fractal
analysis is performed to segment the optic disk by tracing the origin
of the vascular tree. The proposed method is examined on three
publicly available data sets of the retinal image and also with the data
set obtained from an eye clinic. The average accuracy achieved is
96.2%. To the best of our knowledge, this is the first work reporting the use of a TMS320C6416DSK DSP board and a pixel intensity based
fractal analysis algorithm for an automatic localization and
segmentation of the optic disk. This will pave the way for developing
devices for detection of retinal diseases in the future.
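The box-counting estimate behind such fractal analysis can be sketched generically on a binary map (the DSP-board implementation and the vessel-tracing logic are not reproduced):

import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    # Count boxes of each size that touch the foreground set.
    counts = []
    for s in sizes:
        h = (mask.shape[0] // s) * s
        w = (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())
    # Dimension is minus the slope of log(count) versus log(box size).
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

rng = np.random.default_rng(1)
mask = rng.random((256, 256)) > 0.99       # toy stand-in for a vessel map
print(box_counting_dimension(mask))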
Abstract: Research Objectives: The roles and activities of
Human Resource Management (HRM) have changed a lot in the past
years. Driven by a changing environment and therefore new business
requirements, the scope of human resource (HR) activities has
widened. The extent to which these activities should focus on
strategic issues to support the long term success of a company has
been discussed in science for many years. As many economies of
Central and Eastern Europe (CEE) experienced a phase of transition
after the socialist era and are now recovering from the 2008 global crisis, it is necessary to examine the current state of HR positioning. Furthermore, a trend can be noticed in HR work developing from rather administrative units to strategic partners of management. This leads to the question of better understanding the
underlying competencies which are necessary to support
organisations. This topic was addressed by the international study
“HR Competencies in international comparison”. The quantitative
survey was conducted by the Institute for Human Resources &
Organisation of FHWien University of Applied Science of WKW (A)
in cooperation with partner universities in the countries Bosnia-
Herzegovina, Croatia, Serbia and Slovenia. Methodology: Using the
questionnaire developed by Dave Ulrich, we tested whether the HR
Competency model can be used for Austria, Bosnia and Herzegovina,
Croatia, Serbia and Slovenia. After performing confirmatory and
exploratory factor analysis for the whole data set containing all five
countries, we could clearly distinguish between four competencies. In
a further step our analysis focused on median and average
comparisons between the HR competency dimensions. Conclusion:
Our literature review, in alignment with other studies, shows a
relatively rapid pace of development of HR Roles and HR
Competencies in BCSS in the past decades. Comparing data from BCSS and Austria, we can still notice that, as regards strategic orientation, there is a lag in the BCSS countries; these competencies are not as developed as in Austria. This leads us to the tentative conclusion that HR has undergone a rapid change but is still in a state of transition from being a rather administrative unit to performing the role of a
strategic partner.
Abstract: STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induce if-then rules from the decision table, which is considered a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world data sets. The first requirement is to determine the size of the dataset needed for inducing true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity of rule induction from datasets with contaminated attribute values created by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical dataset size derived from the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to real-world data.
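The core step, judging whether a candidate if-then rule is statistically significant, can be illustrated with a one-proportion z-test (a sketch assuming a known chance rate p0; STRIM's actual test statistic follows the authors' formulation):

import math

def rule_z_score(n_match, n_correct, p0):
    # H0: among the n_match cases matching the rule's condition part,
    # the rule's decision value occurs at the chance rate p0.
    p_hat = n_correct / n_match
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n_match)

# e.g. 120 matching cases, 90 with the predicted class, 6 equiprobable classes
print(rule_z_score(120, 90, 1 / 6))   # a large z-score keeps the rule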
Abstract: The problems arising from unbalanced data sets
generally appear in real-world applications. Due to unequal class
distribution, many researchers have found that the performance of
existing classifiers tends to be biased towards the majority class. The
k-nearest neighbors’ nonparametric discriminant analysis is a method
that was proposed for classifying unbalanced classes with good
performance. In this study, the methods of discriminant analysis are
of interest in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. The purpose of this
study was to compare the classification performance between
parametric discriminant analysis and nonparametric discriminant
analysis in a three-class classification of class-imbalanced data of
diabetes risk groups. Data from a project maintaining healthy
conditions for 599 employees of a government hospital in Bangkok
were obtained for the classification problem. The employees were
divided into three diabetes risk groups: non-risk (90%), risk (5%),
and diabetic (5%). The original data including the variables of
diabetes risk group, age, gender, blood glucose, and BMI were
analyzed and bootstrapped for 50 and 100 samples, 599 observations
per sample, for additional estimation of the misclassification error
rate. Each data set was explored for the departure of multivariate
normality and the equality of covariance matrices of the three risk
groups. Both the original data and the bootstrap samples showed non-normality and unequal covariance matrices. The parametric linear
discriminant function, quadratic discriminant function, and the
nonparametric k-nearest neighbors’ discriminant function were
performed over 50 and 100 bootstrap samples and applied to the
original data. In searching for the optimal classification rule, the choices of prior probabilities were set as both equal proportions (0.33:0.33:0.33) and unequal proportions of (0.90:0.05:0.05), (0.80:0.10:0.10), and (0.70:0.15:0.15). The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach, with k=3 or k=4 and the prior probabilities of non-risk:risk:diabetic defined as 0.90:0.05:0.05 or 0.80:0.10:0.10, gave the smallest misclassification error rate. The k-nearest neighbors approach is therefore suggested for classifying three-class imbalanced data of diabetes risk groups.
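A minimal sketch of the k-nearest neighbors discriminant rule with user-specified priors, on toy data with the paper's 0.90/0.05/0.05 class mix (variable names and the toy features are illustrative):

import numpy as np
from collections import Counter

def knn_classify_with_priors(X_train, y_train, x, k, priors):
    # Nonparametric rule: weight each class's share of the k nearest
    # neighbors by prior probability over empirical class frequency.
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors = y_train[np.argsort(dists)[:k]]
    k_c, n_c = Counter(neighbors), Counter(y_train)
    scores = {c: priors[c] * k_c.get(c, 0) / n_c[c] for c in priors}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
X = rng.normal(size=(599, 4))
y = rng.choice([0, 1, 2], 599, p=[0.90, 0.05, 0.05])   # non-risk/risk/diabetic
print(knn_classify_with_priors(X, y, X[0], k=3,
                               priors={0: 0.80, 1: 0.10, 2: 0.10}))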
Abstract: Modeling and forecasting dynamics of rainfall
occurrences constitute one of the major topics, which have been
largely treated by statisticians, hydrologists, climatologists and many
other groups of scientists. In this context, we propose, in the present paper, a new hybrid method which combines Extreme Value and fractal theories. We illustrate the use of our methodology on a transformed Emberger Index series, constructed based on data recorded in Oujda (Morocco).
The index is first treated by the Peaks Over Threshold (POT) approach to identify excess observations over an optimal threshold u. In the second step, we consider the resulting excesses as a fractal object embedded in the one-dimensional space of time. We identify the fractal dimension by box counting. We discuss the prospective description of rainfall data sets under the Generalized Pareto Distribution, as assured by Extreme Value Theory (EVT). We show that, despite the appropriateness of the return periods given by the POT approach, the introduction of the fractal dimension provides accurate interpretation results, which can improve the understanding of rainfall occurrences.
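The POT step can be sketched with SciPy: pick a threshold, keep the excesses, and fit the Generalized Pareto Distribution (synthetic data; the threshold selection and return-level formula follow standard EVT practice, not necessarily the paper's exact procedure):

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
series = rng.gamma(2.0, 2.0, size=5000)    # stand-in for the index series
u = np.quantile(series, 0.95)              # threshold (chosen optimally in POT)
excesses = series[series > u] - u

shape, _, scale = genpareto.fit(excesses, floc=0)   # GPD fit to the excesses
rate = excesses.size / series.size                  # exceedance rate
m = 1000                                            # return period in observations
level = u + (scale / shape) * ((m * rate) ** shape - 1)   # assumes shape != 0
print(shape, scale, level)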
Abstract: Data mining is growing rapidly in popularity. The foremost aim of data mining methods is to extract information from a huge data set into forms that can be comprehended for further use. Data mining is a technology with rich potential that can support industries and businesses interested in collecting the information needed to understand their customers' behavior. Several methods are available for extracting knowledge, such as classification, clustering, association, discovery, and visualization, each with its own diverse algorithms for fitting an appropriate model to the data. STATISTICA mostly deals with very large groups of data, which impose rigorous computational constraints. These challenges have driven the emergence of powerful STATISTICA Data Mining technologies. In this survey, an overview of the STATISTICA software is presented along with its significant features.
Abstract: The effects of hypertension are often lethal; thus, its early detection and prevention are very important for everybody. In
this paper, a neural network (NN) model was developed and trained
based on a dataset of hypertension causative parameters in order to
forecast the likelihood of occurrence of hypertension in patients. Our
research goal was to analyze the potential of the presented NN to
predict, for a period of time, the risk of hypertension or the risk of
developing this disease for patients who are or are not currently hypertensive. The results of the analysis for a given patient can
support doctors in taking pro-active measures for averting the
occurrence of hypertension such as recommendations regarding the
the patient's behavior in order to lower their hypertension risk. Moreover, the paper envisages a set of three example scenarios: determining the age at which a patient becomes hypertensive, i.e., the threshold for hypertensive age; analyzing what happens if the threshold hypertensive age is set to a certain age while the weight of the patient is varied; and setting the ideal weight for the patient and analyzing what happens with the threshold of hypertensive age.
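A minimal sketch of such an NN risk model with scikit-learn (the features, labels, and network size are hypothetical placeholders for the causative-parameter dataset):

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 4))                    # e.g. age, weight, BMI, blood pressure
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # toy rule standing in for outcomes

nn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
nn.fit(X, y)
print(nn.predict_proba(X[:1]))              # predicted risk for one patient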
Abstract: The Cone Penetration Test (CPT) is a common in-situ
test which generally investigates a much greater volume of soil more
quickly than is possible with sampling and laboratory tests. Therefore,
it has the potential to realize both cost savings and assessment of soil
properties rapidly and continuously. The principal objective of this
paper is to demonstrate the feasibility and efficiency of using
artificial neural networks (ANNs) to predict the soil angle of internal
friction (Φ) and the soil modulus of elasticity (E) from CPT results
considering the uncertainties and non-linearities of the soil. In
addition, ANNs are used to study the influence of different
parameters and recommend which parameters should be included as
input parameters to improve the prediction. Neural networks discover
relationships in the input data sets through the iterative presentation
of the data and intrinsic mapping characteristics of neural topologies.
The General Regression Neural Network (GRNN), a powerful neural network architecture, is utilized in this study. A large
amount of field and experimental data including CPT results, plate
load tests, direct shear box, grain size distribution and calculated data
of overburden pressure was obtained from a large project in the
United Arab Emirates. This data was used for the training and the
validation of the neural network. A comparison was made between
the results obtained from the ANN approach and some common
traditional correlations that predict Φ and E from CPT results with
respect to the actual results of the collected data. The results show
that the ANN is a very powerful tool. Very good agreement was
obtained between the estimated results from the ANN and the actual measured results, in comparison to other correlations available in the literature. The study recommends some easily available parameters that should be included in the estimation of the soil properties to improve the prediction models. It is shown that the use of the friction ratio in the estimation of Φ and the use of fines content in the estimation of E considerably improve the prediction models.
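GRNN prediction reduces to a kernel-weighted average of the training targets (Specht's formulation); a minimal NumPy sketch with hypothetical CPT-like inputs:

import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    # Pattern layer: Gaussian activation for each training example;
    # summation/output layers: kernel-weighted average of the targets.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma**2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
X = rng.random((100, 3))             # e.g. tip resistance, sleeve friction, depth
y = 30.0 + 10.0 * X[:, 0]            # hypothetical friction angle values
print(grnn_predict(X, y, X[:2]))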
Abstract: Estimation of model parameters is necessary to predict
the behavior of a system. Model parameters are estimated using
optimization criteria. Most algorithms use historical data to estimate
model parameters. The known target values (actual) and the output
produced by the model are compared. The differences between the
two form the basis to estimate the parameters. In order to compare
different models developed using the same data, different criteria are used. The data obtained from short-scale projects are used here. We consider the software effort estimation problem using a radial basis function network. The accuracy comparison is made using various existing criteria for one and two predictors. Then, we propose a new criterion based on linear least squares for evaluation and compare the results for one and two predictors. We have considered another data set and evaluated prediction accuracy using the new criterion. The new criterion is easy to comprehend compared to a single statistic. Although software effort estimation is considered here, this method is applicable to any modeling and prediction problem.
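A minimal sketch of a radial basis function network whose output weights are solved by linear least squares, with fixed randomly chosen centers (placeholder data; the proposed evaluation criterion itself is not reproduced):

import numpy as np

def fit_rbf_network(X, y, centers, width):
    # Hidden layer: Gaussian basis functions around fixed centers;
    # output weights: ordinary linear least squares.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * width**2))
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

rng = np.random.default_rng(0)
X = rng.random((30, 1))                           # one predictor, e.g. size
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=30)     # hypothetical effort values
centers = X[rng.choice(30, 5, replace=False)]     # five centers from the data
print(fit_rbf_network(X, y, centers, width=0.3))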
Abstract: Analyzing DNA microarray data sets is a great challenge facing bioinformaticians due to the complexity of applying statistical and machine learning techniques. The challenge is doubled if the microarray data sets contain missing values, which happens regularly, since these techniques cannot deal with missing data. One of the most important data analysis processes on microarray data sets is feature selection. This process finds the most important genes that affect a certain disease. In this paper, we introduce a technique for imputing the missing data in microarray data sets while performing feature selection.
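A minimal sketch of the two ingredients applied in sequence with scikit-learn, k-nearest-neighbor imputation followed by univariate feature selection (synthetic data; the paper's joint imputation-while-selecting technique is not reproduced):

import numpy as np
from sklearn.impute import KNNImputer
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))               # toy microarray: samples x genes
X[rng.random(X.shape) < 0.05] = np.nan       # 5% missing expression values
y = rng.integers(0, 2, size=40)              # disease labels

X_filled = KNNImputer(n_neighbors=5).fit_transform(X)
selector = SelectKBest(f_classif, k=10).fit(X_filled, y)
print(np.flatnonzero(selector.get_support()))   # indices of selected genes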
Abstract: The development, operation, and maintenance of Integrated Waste Management Systems (IWMS) essentially affect the sustainability concerns of every region. The features of such systems have a great influence on all components of sustainability. In order to optimize these processes, a comprehensive mapping of the variables affecting the future efficiency of the system is needed, including analysis of the interconnections among the components and modeling of their interactions. The planning of an IWMS is based fundamentally on technical and economic opportunities and the legal framework. Modeling the sustainability
and operation effectiveness of a certain IWMS is not in the scope of
the present research. The complexity of the systems and the large
number of the variables require the utilization of a complex approach
to model the outcomes and future risks. This complex method should
be able to evaluate the logical framework of the factors composing
the system and the interconnections between them. The authors of
this paper studied the usability of the Fuzzy Cognitive Map (FCM)
approach for modeling the future operation of IWMSs. The approach requires two input data sets. One is the connection matrix containing
all the factors affecting the system in focus with all the
interconnections. The other input data set is the time series, a
retrospective reconstruction of the weights and roles of the factors.
This paper introduces a novel method to develop time series by
content analysis.
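An FCM run driven by a connection matrix can be sketched in a few lines (a common sigmoid update rule; the concepts and weights below are hypothetical, and FCM variants differ in details such as self-memory terms):

import numpy as np

def fcm_step(state, W):
    # One update: each concept's activation is the squashed weighted
    # sum of the others, A(t+1) = f(A(t) W) with a sigmoid f.
    return 1.0 / (1.0 + np.exp(-(state @ W)))

W = np.array([[0.0, 0.6, 0.4],      # e.g. waste volume -> recycling, cost
              [-0.5, 0.0, -0.3],    # recycling rate -> volume, cost
              [0.2, 0.1, 0.0]])     # system cost -> volume, recycling
state = np.array([0.5, 0.5, 0.5])
for _ in range(20):                 # iterate until the activations settle
    state = fcm_step(state, W)
print(state)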
Abstract: The use of the eXtensible Markup Language (XML) in web, business, and scientific databases has led to the development of methods, techniques, and systems to manage and analyze XML data. Semi-structured documents suffer from heterogeneity and high dimensionality. XML structure and content mining represent a convergence of research in semi-structured data mining and text mining. As the information available on the internet grows drastically, extracting knowledge from XML documents becomes a harder task. Certainly, documents are often so large that the data set returned as the answer to a query may be too big to convey the required information. To improve query answering, a Semantic Tree Based Association Rule (STAR) mining method is proposed. This method provides intensional information by considering the structure, content, and semantics of the content. The method is applied to the Reuters data set and the results show that the proposed method performs well.
Abstract: Safety is one of the most important considerations
when buying a new car. While active safety aims at avoiding
accidents, passive safety systems such as airbags and seat belts
protect the occupant in case of an accident. In addition to legal
regulations, organizations like Euro NCAP provide consumers with
an independent assessment of the safety performance of cars and
drive the development of safety systems in automobile industry.
Those ratings are mainly based on injury assessment reference values
derived from physical parameters measured in dummies during a car
crash test.
The components and sub-systems of a safety system are designed
to achieve the required restraint performance. Sled tests and other
types of tests are then carried out by car makers and their suppliers
to confirm the protection level of the safety system. A Knowledge
Discovery in Databases (KDD) process is proposed in order to
minimize the number of tests. The KDD process is based on the
data emerging from sled tests according to Euro NCAP specifications.
About 30 parameters of the passive safety systems from different data
sources (crash data, dummy protocol) are first analysed together with
experts' opinions. A procedure is proposed to manage missing data
and validated on real data sets. Finally, a procedure is developed to estimate a set of rough initial parameters of the passive system before testing, aiming at reducing the number of tests.
Abstract: Structured Query Language (SQL) is the de facto standard language to access and manipulate data in a relational database. Although SQL is a simple and powerful language, most novice users have trouble with SQL syntax. Thus, we present an SQL generator tool which is capable of translating user actions into SQL and displaying the SQL commands and result data sets simultaneously. The tool was developed based on the Model-View-Controller (MVC) pattern. The MVC pattern is a widely used software design pattern that enforces the separation between the input, processing, and output of an application. Developers take full advantage of it to reduce the complexity of the architectural design and to increase the flexibility and reuse of code. In addition, we use white-box testing for code verification in the Model module.
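A minimal sketch of the MVC separation such a tool might use (all class and method names are illustrative assumptions, not the paper's implementation):

import sqlite3

class Model:                          # processing: builds and runs the SQL
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    def select(self, table, columns):
        sql = f"SELECT {', '.join(columns)} FROM {table}"   # sketch only
        return sql, self.conn.execute(sql).fetchall()

class View:                           # output: show the command and the data
    def render(self, sql, rows):
        print("SQL:", sql)
        print("Rows:", rows)

class Controller:                     # input: map a user action to the model
    def __init__(self, model, view):
        self.model, self.view = model, view
    def on_select_clicked(self, table, columns):
        self.view.render(*self.model.select(table, columns))

Controller(Model(), View()).on_select_clicked("t", ["id", "name"])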