Abstract: Mammography has been one of the most reliable
methods for early detection of breast cancer. Several kinds of
lesions are characteristic of breast cancer, such as
microcalcifications, masses, architectural distortions and bilateral
asymmetry. One of the major challenges of analysing digital
mammograms is extracting efficient features for accurate
cancer classification. In this paper we propose a hybrid feature
extraction method to detect and classify all four signs of breast
cancer. The proposed method is based on the multiscale surrounding
region dependence method, Gabor filters, multifractal analysis,
and directional and morphological analysis. The extracted features
are input to a self-adaptive resource allocation network (SRAN)
classifier for classification. The validity of our approach is
extensively demonstrated using two benchmark data sets, the
Mammographic Image Analysis Society (MIAS) database and the Digital
Database for Screening Mammography (DDSM), and the results are
promising.
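The Gabor-filter stage of such a feature extractor can be sketched as follows. This is a generic illustration of oriented Gabor texture features, not the authors' implementation; the kernel size, smoothing width, wavelength and number of orientations are assumptions chosen for the example.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma=0.5):
    """Real part of a 2-D Gabor kernel: a Gaussian envelope
    modulated by a cosine wave oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

def gabor_features(patch, orientations=4, sigma=3.0, lam=8.0):
    """Respond a square (odd-sized) image patch to a small bank of
    oriented Gabor kernels; the response magnitudes form a simple
    texture feature vector."""
    size = patch.shape[0]
    feats = []
    for k in range(orientations):
        theta = k * np.pi / orientations
        kern = gabor_kernel(size, sigma, theta, lam)
        feats.append(abs(float(np.sum(patch * kern))))
    return np.array(feats)
```

Applied over patches of a mammogram, such responses would typically be pooled (e.g. mean and variance per orientation and scale) into the feature vector passed to the classifier.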
Abstract: Innovations not only contribute to the competitiveness of
a company but also have positive effects on revenues. On average,
product innovations account for 14 percent of companies' sales.
Innovation management has changed substantially during the last
decade because of a growing reliance on external partners. As a
consequence, a new task for purchasing arises, as firms need to
understand which suppliers actually have high potential to
contribute to the innovativeness of the firm and which do not.
Proper organization of the purchasing function is important, since
the majority of manufacturing companies deal with substantial
material costs which pass through the purchasing function. In the
past the purchasing function was largely seen as a
transaction-oriented, clerical function, but today purchasing is the
interface with supply chain partners contributing to innovations, be
they product or process innovations. Therefore, the purchasing
function has to be organized differently to enable the firm's
innovation potential.
However, innovations are inherently risky. There are behavioral
risks (that one partner will take advantage of the other party),
technological risks stemming from the complexity of products,
manufacturing processes and incoming materials, and finally market
risks, which ultimately determine the value of the innovation. These
risks are investigated in this work. Specifically, the technological
risks associated with the complexity of products and processes are
investigated more thoroughly. Buying components based on such
high-end technologies necessitates careful investigation of
technical features and is therefore usually conducted by a team of
experts. It is therefore hypothesized that the higher the
technological risk, the higher the centralization of the purchasing
function as an interface with other supply chain members.
The main contribution of this research lies in the fact that the
analysis was performed on a large data set of 1493 companies from 25
countries collected in the GMRG 4 survey. Most analyses of the
purchasing function are based on case studies of innovative firms.
This study therefore contributes empirical evaluations that can be
generalized.
Abstract: The paper presents combined automatic speech
recognition (ASR) of English and machine translation (MT) for the
English-Croatian and Croatian-English language pairs in the
domain of business correspondence. The first part presents the
results of training a commercial ASR system on English data sets,
enriched by error analysis. The second part presents the results of
machine translation performed by a free online tool for the
English-Croatian and Croatian-English language pairs. Human
evaluation of usability is conducted, and internal consistency is
calculated by Cronbach's alpha coefficient, enriched by error
analysis. Automatic evaluation is performed with the WER (Word Error
Rate) and PER (Position-independent word Error Rate) metrics,
followed by an investigation of the Pearson correlation with the
human evaluation.
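The WER and PER metrics used for the automatic evaluation can be computed as follows. This is a generic sketch of the standard definitions (the PER variant shown is one common formulation), not the evaluation code used in the paper.

```python
from collections import Counter

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance between the two
    sequences, normalised by the reference length."""
    r, h = reference.split(), hypothesis.split()
    # standard Levenshtein dynamic programme over words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(r)][len(h)] / len(r)

def per(reference, hypothesis):
    """Position-independent word Error Rate: compares bags of words,
    ignoring word order (one common formulation)."""
    r, h = Counter(reference.split()), Counter(hypothesis.split())
    matches = sum((r & h).values())
    n = sum(r.values())
    return (max(n, sum(h.values())) - matches) / n
```

Note that reordering the words raises WER but leaves PER at zero, which is why PER is useful for translation output whose word order legitimately differs from the reference.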
Abstract: Over the past decade, many efforts and studies have
been devoted to developing proficient tools for performing
various tasks on big data. Big data has recently received a great
deal of publicity, and for good reason. Because these collections of
datasets are large and complex, they are difficult to process with
traditional data processing applications, which makes the
development of dedicated big data tools all the more necessary. The
main aim of big data analytics is to apply advanced analytic
techniques to very large, heterogeneous datasets, ranging in size
from terabytes to zettabytes and in type from structured to
unstructured and from batch to streaming. Big data approaches are
useful for data sets whose size or type is beyond the capability of
traditional relational databases to capture, manage and process
with low latency. These challenges have driven the emergence of
powerful big data tools. In this survey, a varied collection of big
data tools is described and compared in terms of their salient
features.
Abstract: Estimation of a proportion has many applications in
economics and social studies. A common application is the estimation
of the low income proportion, which gives the proportion of people
classified as poor within a population. In this paper, we present
this poverty indicator and propose the logistic regression estimator
for the problem of estimating the low income proportion. Various
sampling designs are presented. Using a real data set obtained
from the European Survey on Income and Living Conditions, Monte
Carlo simulation studies are carried out to analyze the empirical
performance of the logistic regression estimator under the various
sampling designs considered in this paper. Results derived from the
Monte Carlo simulation studies indicate that the logistic regression
estimator can be more accurate than the customary estimator under
these sampling designs, and that the stratified sampling design can
provide additional gains in accuracy.
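A minimal sketch of the idea behind a logistic regression estimator of a proportion, on a purely synthetic population (the auxiliary variable, thresholds and sample size below are invented for illustration and are unrelated to the survey data): fit P(poor | x) on the sample, then average the predicted probabilities over the whole population, for which x is assumed known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: income depends on an auxiliary variable x
# known for every unit; "poor" means income below a fixed threshold.
N = 20_000
x = rng.normal(0.0, 1.0, N)
income = 2.0 * x + rng.normal(0.0, 1.0, N)
poor = (income < -1.0).astype(float)
true_p = poor.mean()                 # target: the poor proportion

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Plain gradient-descent fit of P(y=1|x) = sigmoid(a + b*x)."""
    a = b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a + b * xs)))
        a -= lr * np.mean(p - ys)
        b -= lr * np.mean((p - ys) * xs)
    return a, b

# Simple random sample of n units
n = 500
idx = rng.choice(N, n, replace=False)
customary = poor[idx].mean()                # plain sample proportion
a, b = fit_logistic(x[idx], poor[idx])
probs = 1.0 / (1.0 + np.exp(-(a + b * x)))  # predict for all N units
logistic_est = probs.mean()                 # model-assisted estimate
```

The model-assisted estimate borrows strength from the population-wide auxiliary variable, which is the mechanism by which such estimators can beat the customary sample proportion.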
Abstract: The European Union Survey on Income and Living
Conditions (EU-SILC) is a popular survey which provides
information on the income, poverty, social exclusion and living
conditions of households and individuals in the European Union.
The EU-SILC contains variables which may contain outliers, and
the presence of outliers can affect the measures and indicators
derived from the EU-SILC. In this paper, we use data sets from
various countries to analyze the presence of outliers. In addition,
we compute some indicators after removing these outliers and compare
the results for both situations. Finally, some conclusions are
drawn.
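Outlier screening of the kind described can be illustrated with Tukey's interquartile-range rule, a common device for skewed income variables. The rule and the toy numbers below are illustrative assumptions, not the procedure used by the authors.

```python
def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    xs = sorted(values)

    def quantile(q):
        # linear interpolation between order statistics
        pos = q * (len(xs) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]
```

An indicator (say, a mean income) would then be computed once on the full data and once with the flagged values removed, and the two results compared.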
Abstract: Frequent pattern mining is the process of finding a
pattern (a set of items, subsequences, substructures, etc.) that
occurs frequently in a data set. It was proposed in the context of
frequent itemsets and association rule mining. Frequent pattern
mining is used to find inherent regularities in data, for example
which products are often purchased together. Its applications
include basket data analysis, cross-marketing, catalog design, sales
campaign analysis, Web log (click stream) analysis, and DNA sequence
analysis. However, one of the bottlenecks of frequent itemset mining
is that as the data grow, the time and resources required to mine
them increase at an exponential rate. In this investigation a new
algorithm is proposed which can be used as a pre-processor for
frequent itemset mining. FASTER (FeAture SelecTion using Entropy and
Rough sets) is a hybrid pre-processing algorithm which utilizes
entropy and rough sets to carry out record reduction and feature
(attribute) selection, respectively. FASTER can produce a speed-up
of 3.1 times for frequent itemset mining compared to the original
algorithm while maintaining an accuracy of 71%.
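The entropy half of such a pre-processor can be sketched as information-gain scoring of attributes; the rough-set record reduction is omitted here, and the toy transactions are invented for illustration rather than taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def info_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a feature:
    features with low gain are candidates for removal."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    cond = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - cond

# Toy transactions: feature A predicts the class, feature B is noise
labels = ['buy', 'buy', 'skip', 'skip']
feat_a = [1, 1, 0, 0]   # perfectly informative
feat_b = [1, 0, 1, 0]   # uninformative
```

Dropping zero-gain attributes before itemset mining shrinks the search space, which is the source of the speed-up such a pre-processor aims at.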
Abstract: This paper presents a comparative study in which two
neural network models, namely the General Regression Neural Network
(GRNN) and the Back Propagation Neural Network (BPNN), are used
to estimate the radial overcut produced during Electrical Discharge
Machining (EDM). Four input parameters have been employed:
discharge current (Ip), pulse on time (Ton), duty fraction (Tau) and
discharge voltage (V). Artificial intelligence techniques have
recently emerged as effective tools for replacing time-consuming
procedures in various scientific and engineering applications,
particularly for the prediction and estimation of complex, nonlinear
processes. Both networks are trained, and their predictions are
tested against an unseen validation set from the experiment and
analysed. The performance of both networks is found to be in good
agreement with the experimental data, with an average percentage
error of less than 11%, and the correlation coefficient obtained on
the validation data set for both GRNN and BPNN is more than 91%.
However, a GRNN is much faster to train than a BPNN and is often
more accurate. A GRNN requires more memory to store the model, but
it features fast learning that does not require an iterative
procedure and a highly parallel structure; on the other hand, GRNN
networks are slower than multilayer perceptron networks at
classifying new cases.
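The GRNN's non-iterative learning can be sketched as kernel regression: the prediction is a Gaussian-weighted average of stored training targets, so "training" is a single pass that memorises the samples (hence the speed, and the memory cost, noted above). The smoothing width sigma and the toy inputs below are assumptions for illustration, not the paper's EDM data.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma=0.5):
    """General Regression Neural Network prediction: a Gaussian-
    kernel-weighted average of the training targets
    (Nadaraya-Watson form)."""
    d2 = np.sum((x_train - x_query) ** 2, axis=1)  # squared distances
    w = np.exp(-d2 / (2 * sigma ** 2))             # pattern-layer weights
    return float(np.dot(w, y_train) / np.sum(w))   # summation layer
```

With a small sigma the network reproduces the stored target at each training point; larger sigma values smooth across neighbours, which is the single hyperparameter usually tuned.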
Abstract: Technical analysis, comprising various technical indicators, is a holistic way of representing the price movement of stocks in the market. Various forms of indicators have evolved from the primitive ones over the past decades. There have been many attempts to introduce volume as a major determinant for identifying strong patterns in market forecasting. The law of demand defines the relationship between volume and price, and most traders are familiar with the volume game. Including the time dimension in the law of demand provides a different visualization of the theory. While attempting this, it was found that there are different thresholds in the market for different companies, and these thresholds have a significant influence on the price. This article is an attempt to determine the thresholds for companies using three-dimensional graphs in order to optimize portfolios. It also emphasizes the importance of volume as a key factor in determining and predicting strong price movements and bullish and bearish markets. It uses a comprehensive data set of major companies which form a major chunk of the Indian automotive sector and are thus used as an illustration.
Abstract: A brief review of the empirical studies on stock market decision support methodology indicates that they are at the threshold of validating the accuracy of traditional models against fuzzy, artificial neural network and decision tree models. Many researchers have attempted to compare these models using various data sets worldwide; however, the research community has yet to reach conclusive confidence in the emerging models. This paper uses automotive sector stock prices from the National Stock Exchange (NSE), India, and analyzes them for intra-sectorial support for stock market decisions. The study identifies the significant variables, and the lags thereof, which affect the price of the stocks, using OLS analysis and decision tree classifiers.
Abstract: Machine learning covers a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction based on models derived from existing data. The data can present identifiable patterns which are used to classify observations into groups, and the result of the analysis is a pattern which can be used to identify new data without the need for the input data used to create it. Important requirements in this process are careful data preparation, validation of the model used, and suitable interpretation of its results. For breeders, it is important to know the origin of animals from the standpoint of genetic diversity. When pedigree information is missing, other methods can be used to trace an animal's origin. The genetic diversity encoded in molecular genetic data holds relatively useful information for identifying animals originating from individual countries. We conclude that the application of data mining to molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and for identifying an individual.
Abstract: The problem of estimating a proportion has important
applications in the field of economics, and in general, in many areas
such as social sciences. A common application in economics is
the estimation of the headcount index. In this paper, we define the
general headcount index as a proportion. Furthermore, we introduce
a new quantitative method for estimating the headcount index. In
particular, we suggest using the logistic regression estimator for
the problem of estimating the headcount index. Using a real data
set,
results derived from Monte Carlo simulation studies indicate that the
logistic regression estimator can be more accurate than the traditional
estimator of the headcount index.
Abstract: In this study, we attempted to identify several heart rhythm disorders from electrocardiography (ECG) data taken from the MIT-BIH arrhythmia database by extracting the required features and presenting them to artificial neural network (ANN), artificial immune system (AIS), artificial-immune-system-based artificial neural network (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-ANN) classifier systems. The main purpose of this study is to evaluate the performance of the hybrid AIS-ANN and PSO-ANN classifiers relative to ANN and AIS. For this purpose, normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data were obtained for each of the RR intervals. These data were then combined into pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF), a discrete wavelet transform was applied to each of the two groups of data in a pair, and after data reduction two different data sets, with 9 and 27 features, were obtained from each of them. Afterwards, the data were first randomly shuffled, and then the 4-fold cross validation method was applied to create the training and testing data. The training and testing accuracy rates and the training times are compared with each other.
As a result, the performances of the hybrid classification systems AIS-ANN and PSO-ANN were seen to be close to the performance of the ANN system, and the results of the hybrid systems were much better than those of AIS. However, ANN had a much shorter training time than the other systems; in terms of training time, ANN was followed by PSO-ANN, AIS-ANN and AIS, respectively. The features extracted from the data also affected the classification results significantly.
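The shuffle-then-split step of 4-fold cross validation can be sketched generically as follows; this is the standard procedure (shuffle the indices, split into four folds, rotate the test fold), not the authors' code.

```python
import random

def k_fold_indices(n, k=4, seed=0):
    """Shuffle indices, split them into k near-equal folds, and yield
    (train, test) index lists with each fold serving once as test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # deterministic shuffle
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each classifier is trained and tested once per fold, and the four accuracy figures are averaged.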
Abstract: Software effort estimation is the process of estimating the effort required to develop software. From the estimated effort, the cost and schedule required to develop the software can be determined. An accurate estimate helps the developer allocate resources appropriately in order to avoid cost and schedule overruns. Several methods are available to estimate effort, among which soft computing based methods play a prominent role. Software cost estimation involves a great deal of uncertainty, and among soft computing methods neural networks are good at handling uncertainty. In this paper the Radial Basis Function Network (RBFN) is compared with the back propagation network; the results are validated using six data sets, and RBFN is found to be the better suited to estimating the effort. The results are validated using two tests: an error test and a statistical test.
Abstract: Rice is Sierra Leone's staple food, and the nation
imports over 120,000 metric tons annually due to a shortfall in its
cultivation. The insufficient cultivation of the crop in Sierra
Leone stems from many problems, and it has led to an ever-widening
gap between the supply of and demand for the crop within the
country. Consequently, the government has been prompted to spend
huge sums on the importation of grain that could otherwise have been
cultivated domestically at a lower cost. This research therefore
explores the response of rice supply to its demand in Sierra Leone
over the period 1980-2010.
The Nerlovian adjustment model was applied to the Sierra Leone rice
data set for the period 1980-2010. The estimated trend equations
revealed that time had a significant effect on the output,
productivity (yield) and area (acreage) of rice over this period,
generally at the 1% level of significance. The results showed that
almost all of the growth in output was attributable to increases in
the area cultivated to the crop. The time trend variable included to
capture government policy intervention showed an insignificant
effect on all the variables considered in this research. Both the
short-run and long-run price responses were inelastic, since all the
elasticity values were less than one.
Given these findings, immediate actions that will lead to
productivity growth in rice cultivation are required.
To achieve this, the responsible agencies should provide extension
service schemes to farmers and motivate them to adopt modern rice
varieties and technology in their rice cultivation ventures.
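The Nerlovian partial adjustment model can be sketched as an OLS regression of current area on lagged price and lagged area, with the long-run response recovered from the adjustment coefficient. The series below are synthetic, with invented coefficients; they are not the Sierra Leone data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic illustration of the Nerlove form:
#   A_t = alpha + beta * P_{t-1} + lam * A_{t-1} + u_t
T, alpha, beta, lam = 400, 1.0, 0.3, 0.5
price = rng.normal(0.0, 1.0, T)
area = np.zeros(T)
for t in range(1, T):
    area[t] = (alpha + beta * price[t - 1] + lam * area[t - 1]
               + rng.normal(0.0, 0.1))

# OLS of A_t on [1, P_{t-1}, A_{t-1}]
X = np.column_stack([np.ones(T - 1), price[:-1], area[:-1]])
coef, *_ = np.linalg.lstsq(X, area[1:], rcond=None)
a_hat, b_hat, l_hat = coef

short_run = b_hat                   # short-run price response
long_run = b_hat / (1.0 - l_hat)    # implied long-run response
```

With log-transformed series the two quantities become short-run and long-run elasticities; values below one would indicate the inelastic responses reported above.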
Abstract: This study presents an expert system specially designed to be used with Multiobjective Evolutionary Algorithms (MOEAs) for the solution of the portfolio selection problem. The validation of the proposed hybrid system is done using data sets from the Hang Seng 31 in Hong Kong, the DAX 100 in Germany and the FTSE 100 in the UK. The performance of the proposed system is assessed in comparison with the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The evaluation of the performance is based on different performance metrics that evaluate both the proximity of the solutions to the Pareto front and their dispersion on it. The results show that the proposed hybrid system is efficient for the solution of this kind of problem.
Abstract: To relieve the burden of reasoning on a point-to-point basis, in many domains there is a need to reduce large and noisy data sets into trends for qualitative reasoning. In this paper we propose and describe a new architectural design pattern called REDUCER for reducing large and noisy data sets, which can be tailored to particular situations. REDUCER consists of two consecutive processes: Filter, which takes the original data and removes outliers, inconsistencies or noise; and Compression, which takes the filtered data and derives trends in it. In this seminal article we also show how REDUCER has been successfully applied to three different case studies.
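The two stages of such a pattern can be sketched under assumed concrete choices: a standard-deviation filter and a moving-average compression. The pattern itself leaves both stages pluggable, so these choices are illustrative, not part of REDUCER's definition.

```python
def reducer(data, k=2.0, window=5):
    """Sketch of the two stages: Filter drops points far from the
    mean, Compression smooths the survivors into a trend."""
    # -- Filter: remove points more than k standard deviations out
    n = len(data)
    mean = sum(data) / n
    sd = (sum((v - mean) ** 2 for v in data) / n) ** 0.5 or 1.0
    kept = [v for v in data if abs(v - mean) <= k * sd]
    # -- Compression: trailing moving average derives the trend
    trend = [sum(kept[max(0, i - window + 1):i + 1]) /
             len(kept[max(0, i - window + 1):i + 1])
             for i in range(len(kept))]
    return trend
```

A deployment would swap in domain-appropriate filters (e.g. consistency checks) and compressors (e.g. segmented linear trends) behind the same two-stage interface.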
Abstract: Urban road dust comprises a range of potentially
toxic metal elements and plays a critical role in degrading the
quality of urban receiving waters. Hence, assessing the metal
composition and concentration of urban road dust is a high priority.
This study investigated the variability of metal composition and
concentrations in road dust across 4 different urban land uses in
Gold Coast, Australia. Samples from 16 road sites were collected and
tested for 12 selected metal species. The data set was analyzed
using both univariate and multivariate techniques. The analysis
revealed that metal concentrations in road dust differ considerably
within and between land uses. Iron, aluminum, magnesium and zinc are
the most abundant metals across the urban land uses, while metal
species such as titanium, nickel, copper and zinc have their highest
concentrations in industrial land use. The study outcomes identified
soil- and traffic-related sources as the key sources of metals
deposited on road surfaces.
Abstract: Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models for modeling overdispersed medical count data with extra variation caused by excess zeros and unobserved heterogeneity. These studies indicate that ZIP and ZINB always provide a better fit than the ordinary Poisson and negative binomial models for such data. In this study, we propose the use of the Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and Zero Inflated Strict Arcsine (ZISA) models for modeling overdispersed medical count data. These models are not widely used by researchers, especially in the medical field. The results show that the three suggested models can serve as alternatives for modeling overdispersed medical count data, which is supported by their application to a real-life medical data set. The inverse trinomial, Poisson inverse Gaussian and strict arcsine distributions are discrete distributions with a cubic variance function of the mean; therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tails. They are recommended for modeling overdispersed medical count data when ZIP and ZINB are inadequate.
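The zero-inflated construction shared by all of these models can be illustrated with the simplest case, the ZIP probability mass function: a point mass at zero mixed with a Poisson component. The same mixing applies with the heavier-tailed base distributions proposed above.

```python
import math

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: with probability pi the count is a
    structural zero; otherwise it is drawn from Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

def zip_mean(pi, lam):
    """Mean of the ZIP distribution: the Poisson mean shrunk by the
    zero-inflation weight."""
    return (1 - pi) * lam
```

Note that P(0) = pi + (1 - pi) * exp(-lam) exceeds the Poisson zero probability, which is exactly the excess-zeros behaviour the models are built to capture.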
Abstract: Estimation of water yield and water balance in a river catchment is critical to the sustainable management of water resources at the watershed level in any country. In the present study, the Soil and Water Assessment Tool (SWAT), interfaced with a Geographical Information System (GIS), was applied to predict the water balance and water yield of a catchment area in Nigeria. The catchment area, which covers 12,992 km2, is located upstream of the Jebba hydropower dam in the north-central part of Nigeria. Observed flow data were collected and compared with the flow simulated using SWAT. The agreement between the two data sets was evaluated using statistical measures such as the Nash-Sutcliffe Efficiency (NSE) and the coefficient of determination (R2). The model output shows good agreement between the observed and simulated flows, with both NSE and R2 greater than 0.7 for the calibration and validation periods. A total of 42,733 mm of water was predicted by the calibrated model as the water yield potential of the basin for the simulation period from 1985 to 2010. This performance suggests that the SWAT model could be a promising tool for predicting water balance and water yield in the sustainable management of water resources, and that SWAT could be applied to other basins in Nigeria as a decision support tool for sustainable water management.
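The NSE and R2 measures used to compare observed and simulated flow can be computed as follows; these are the standard definitions, and the flow values in the test are invented toy numbers rather than the study's data.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the model's
    squared error to the variance of the observations about their
    mean. 1 is a perfect fit; 0 means no better than the mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson
    correlation between observed and simulated series."""
    n = len(observed)
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    vo = sum((o - mo) ** 2 for o in observed)
    vs = sum((s - ms) ** 2 for s in simulated)
    return cov * cov / (vo * vs)
```

Unlike R2, NSE penalises systematic bias in the simulated magnitudes, which is why hydrological calibration typically reports both.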