Automatic Detection and Classification of Microcalcification, Mass, Architectural Distortion and Bilateral Asymmetry in Digital Mammogram

Mammography has been one of the most reliable methods for the early detection of breast cancer. Breast cancer presents through several characteristic lesions, such as microcalcifications, masses, architectural distortions and bilateral asymmetry. One of the major challenges in analysing digital mammograms is extracting features that support accurate cancer classification. In this paper we propose a hybrid feature extraction method to detect and classify all four signs of breast cancer. The proposed method is based on the multiscale surrounding region dependence method, Gabor filters, multifractal analysis, and directional and morphological analysis. The extracted features are fed to a self-adaptive resource allocation network (SRAN) classifier for classification. The validity of our approach is demonstrated extensively using the two benchmark data sets, the Mammographic Image Analysis Society (MIAS) database and the Digital Database for Screening Mammography (DDSM), and the results are promising.
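As a rough illustration of one listed component (not the authors' pipeline), the sketch below extracts Gabor-filter texture features with scikit-image from a synthetic patch standing in for a mammogram region of interest; the frequency and orientations are illustrative assumptions.

```python
# Illustrative Gabor-filter feature extraction; parameters are assumed, not the paper's.
import numpy as np
from skimage.filters import gabor

rng = np.random.default_rng(42)
patch = rng.random((64, 64))                     # stand-in for a mammogram ROI

features = []
for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):   # four orientations
    real, imag = gabor(patch, frequency=0.2, theta=theta)
    magnitude = np.hypot(real, imag)
    features.extend([magnitude.mean(), magnitude.std()])  # simple texture statistics

print(np.round(features, 4))                     # 8-dimensional feature vector
```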

Organization of the Purchasing Function for Innovation

Innovations not only contribute to a company's competitiveness but also have positive effects on revenues. On average, product innovations account for 14 percent of companies' sales. Innovation management has changed substantially during the last decade because of the growing reliance on external partners. As a consequence, a new task arises for purchasing, as firms need to understand which suppliers have high potential to contribute to the innovativeness of the firm and which do not. Proper organization of the purchasing function is important, since the majority of manufacturing companies deal with substantial material costs that pass through the purchasing function. In the past the purchasing function was largely seen as a transaction-oriented, clerical function, but today purchasing is the interface with supply chain partners contributing to innovations, be they product or process innovations. Therefore, the purchasing function has to be organized differently to enable the firm's innovation potential. However, innovations are inherently risky. There are behavioral risks (that one partner will take advantage of the other party), technological risks arising from the complexity of products and of the processes for manufacturing and incoming materials, and finally market risks, which ultimately judge the value of the innovation. These risks are investigated in this work. Specifically, technological risks, which concern the complexity of products and processes, are investigated more thoroughly. Buying components based on such leading-edge technologies necessitates careful investigation of technical features and is therefore usually conducted by a team of experts. It is therefore hypothesized that the higher the technological risk, the higher the centralization of the purchasing function as an interface with other supply chain members. The main contribution of this research lies in the fact that the analysis was performed on a large data set of 1,493 companies from 25 countries, collected in the GMRG 4 survey. Most analyses of the purchasing function are done through case studies of innovative firms; this study therefore contributes empirical evaluations that can be generalized.

Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

The paper presents combined automatic speech recognition (ASR) of English and machine translation (MT) for the English-Croatian and Croatian-English language pairs in the domain of business correspondence. The first part presents the results of training a commercial ASR system on English data sets, enriched by error analysis. The second part presents the results of machine translation performed by a free online tool for the English-Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted, and internal consistency is calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed with the WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by an investigation of Pearson's correlation with the human evaluation.
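For reference, WER is the word-level edit distance between hypothesis and reference, normalized by the reference length. The sketch below is a standard Levenshtein-based implementation; the paper's own tooling is not specified.

```python
# Standard WER via dynamic-programming edit distance over words.
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("please confirm the order by friday",
          "please confirm order by friday"))   # one deletion -> 1/6
```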

Comparative Analysis of Diverse Collection of Big Data Analytics Tools

In recent years, many efforts and studies have been carried out to develop proficient tools for performing various tasks in big data. Recently, big data has received a great deal of publicity, and for good reason. Because the data sets involved are large and complex, they are difficult to process with traditional data processing applications, which makes the development of specialized big data tools all the more necessary. The main aim of big data analytics is to apply advanced analytic techniques to very large, heterogeneous data sets, ranging in size from terabytes to zettabytes and in type from structured to unstructured and from batch to streaming. Big data techniques are useful for data sets whose size or type is beyond the capability of traditional relational databases to capture, manage and process with low latency. These challenges have led to the emergence of powerful big data tools. In this survey, a diverse collection of big data tools is described and compared with respect to their salient features.

A New Method to Estimate the Low Income Proportion: Monte Carlo Simulations

Estimation of a proportion has many applications in economics and social studies. A common application is the estimation of the low income proportion, which gives the proportion of people within a population who are classified as poor. In this paper, we present this poverty indicator and propose using the logistic regression estimator for the problem of estimating the low income proportion. Various sampling designs are presented. Using a real data set obtained from the European Survey on Income and Living Conditions, Monte Carlo simulation studies are carried out to analyze the empirical performance of the logistic regression estimator under the various sampling designs considered in this paper. Results derived from these studies indicate that the logistic regression estimator can be more accurate than the customary estimator under the sampling designs considered. The stratified sampling design can also provide more accurate results.
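As a minimal sketch of the idea (with a simulated population and auxiliary variable, not the EU-SILC data), the logistic regression estimator below predicts poverty probabilities for the whole population from a sample fit and averages them, compared against the customary sample proportion under simple random sampling.

```python
# Monte Carlo comparison of the customary estimator vs a logistic-regression
# estimator of a proportion; all data and sizes are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N, n, runs = 10_000, 500, 200

log_income = rng.normal(10, 0.5, N)                    # auxiliary variable, known for all units
prob_poor = 1 / (1 + np.exp(4 * (log_income - 9.6)))   # lower income -> higher poverty probability
y = rng.binomial(1, prob_poor)
true_p = y.mean()

mse_customary, mse_logistic = [], []
for _ in range(runs):
    s = rng.choice(N, size=n, replace=False)           # simple random sampling
    p_hat = y[s].mean()                                # customary estimator: sample proportion
    model = LogisticRegression().fit(log_income[s, None], y[s])
    p_reg = model.predict_proba(log_income[:, None])[:, 1].mean()  # mean predicted probability
    mse_customary.append((p_hat - true_p) ** 2)
    mse_logistic.append((p_reg - true_p) ** 2)

print("MSE customary:", np.mean(mse_customary))
print("MSE logistic :", np.mean(mse_logistic))
```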

The Effect of Outliers on the Economic and Social Survey on Income and Living Conditions

The European Union Survey on Income and Living Conditions (EU-SILC) is a popular survey which provides information on income, poverty, social exclusion and living conditions of households and individuals in the European Union. The EU-SILC includes variables which may contain outliers, and the presence of outliers can have an impact on the measures and indicators derived from the survey. In this paper, we use data sets from various countries to analyze the presence of outliers. In addition, we obtain selected indicators after removing these outliers and compare them with the indicators computed from the original data. Finally, some conclusions are drawn.
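A small assumed illustration of the before/after comparison described here (the paper's exact detection rule is not specified): flag outliers in a synthetic income variable with Tukey's IQR fences and recompute an indicator.

```python
# IQR-based outlier flagging on a toy income variable; data are simulated.
import numpy as np

rng = np.random.default_rng(1)
income = np.concatenate([rng.lognormal(10, 0.4, 5000),
                         rng.lognormal(13, 0.3, 20)])   # a few extreme values

q1, q3 = np.percentile(income, [25, 75])
iqr = q3 - q1
keep = (income >= q1 - 1.5 * iqr) & (income <= q3 + 1.5 * iqr)

print("median with outliers   :", np.median(income))
print("median without outliers:", np.median(income[keep]))
print("flagged as outliers    :", (~keep).sum())
```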

Frequent Itemset Mining Using Rough-Sets

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data, answering questions such as: what products were often purchased together? Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that as the data grow, the time and resources required to mine them increase at an exponential rate. In this investigation a new algorithm is proposed which can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm which utilizes entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. As a pre-processor for frequent itemset mining, FASTER can produce a speed-up of 3.1 times compared to the original algorithm while maintaining an accuracy of 71%.
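The abstract does not detail FASTER's internals, so the sketch below illustrates only the entropy side of the idea: ranking the attributes of a toy transactional matrix by Shannon entropy and keeping the most informative ones.

```python
# Entropy-based attribute ranking on toy 0/1 basket data; an illustration only,
# not the FASTER algorithm itself.
import numpy as np

def shannon_entropy(column):
    """Entropy of a discrete attribute, in bits."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(2)
data = (rng.random((1000, 8)) < rng.random(8)).astype(int)  # toy transactions x items

entropies = np.array([shannon_entropy(data[:, j]) for j in range(data.shape[1])])
keep = np.argsort(entropies)[::-1][:4]          # keep the 4 highest-entropy attributes
print("attribute entropies:", np.round(entropies, 3))
print("selected attributes:", sorted(keep.tolist()))
```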

General Regression Neural Network and Back Propagation Neural Network Modeling for Predicting Radial Overcut in EDM: A Comparative Study

This paper presents a comparative study of two neural network models, namely the General Regression Neural Network (GRNN) and the Back Propagation Neural Network (BPNN), used to estimate the radial overcut produced during Electrical Discharge Machining (EDM). Four input parameters have been employed: discharge current (Ip), pulse on time (Ton), duty fraction (Tau) and discharge voltage (V). Artificial intelligence techniques have recently emerged as effective tools for replacing time-consuming procedures in various scientific and engineering applications, particularly for the prediction and estimation of complex and nonlinear processes. Both networks are trained, and their predictions are tested and analysed on an unseen validation set from the experiment. The performance of both networks is found to be in good agreement with the experimental data, with an average percentage error of less than 11%, and the correlation coefficient obtained on the validation data set for both GRNN and BPNN is more than 91%. However, a GRNN is much faster to train than a BPNN and is often more accurate. A GRNN features fast learning that does not require an iterative procedure and a highly parallel structure, although it requires more memory to store the model and is slower than a multilayer perceptron network at evaluating new cases.
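A GRNN's non-iterative nature is easy to see in code: the prediction is a Gaussian-kernel-weighted average of the training targets. The sketch below uses toy stand-ins for the four inputs; sigma and the data are illustrative assumptions.

```python
# Minimal GRNN (Nadaraya-Watson form) on toy EDM-like data.
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    # squared Euclidean distances between queries and training points
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))              # kernel weights
    return (w @ y_train) / w.sum(axis=1)            # weighted average of targets

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (60, 4))      # stand-ins for Ip, Ton, Tau, V (scaled to [0, 1])
y = X @ np.array([0.8, 0.3, -0.2, 0.5]) + 0.05 * rng.normal(size=60)  # toy overcut

X_new = rng.uniform(0, 1, (5, 4))
print(grnn_predict(X, y, X_new))
```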

Visualization of Quantitative Thresholds in Stocks

Technical analysis, comprising various technical indicators, is a holistic way of representing the price movement of stocks in the market. Various forms of indicators have evolved from primitive ones over the past decades. There have been many attempts to introduce volume as a major determinant of strong patterns in market forecasting. The law of demand defines the relationship between volume and price, and most traders are familiar with the volume game. Including the time dimension in the law of demand provides a different visualization of the theory. In doing so, it was found that there are different thresholds in the market for different companies, and these thresholds have a significant influence on price. This article attempts to determine these thresholds for companies using three-dimensional graphs for optimizing portfolios. It also emphasizes the importance of volume as a key factor in predicting strong price movements and bullish and bearish markets. It uses a comprehensive data set of major companies which form a major chunk of the Indian automotive sector and are thus used as an illustration.
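As an assumed illustration (not the article's data or code), the sketch below renders the three-dimensional price-volume-time view described here with matplotlib.

```python
# Toy 3D price-volume-time scatter; the series are simulated, not market data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
days = np.arange(250)
price = 100 + np.cumsum(rng.normal(0, 1, 250))          # toy price path
volume = rng.lognormal(12, 0.4, 250)                     # toy traded volume

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(days, volume, price, s=8)
ax.set_xlabel("trading day")
ax.set_ylabel("volume")
ax.set_zlabel("price")
plt.show()
```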

Empirical and Indian Automotive Equity Portfolio Decision Support

A brief review of the empirical studies on stock market decision support methodology indicates that they are at the threshold of validating the accuracy of traditional models against fuzzy, artificial neural network and decision tree models. Many researchers have attempted to compare these models using various data sets worldwide. However, the research community has yet to reach conclusive confidence in the emerging models. This paper uses automotive sector stock prices from the National Stock Exchange (NSE), India and analyzes them for intra-sectorial support for stock market decisions. The study identifies the significant variables, and their lags, which affect the price of the stocks, using OLS analysis and decision tree classifiers.
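A hedged sketch of the lag-identification step, with simulated series rather than NSE data: regress a target stock's price on lagged peer prices with OLS and read significance off the p-values.

```python
# OLS on lagged sector variables; data and lag structure are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
peer = pd.Series(np.cumsum(rng.normal(0, 1, n)))        # a peer stock in the sector
target = 0.6 * peer.shift(1) + rng.normal(0, 1, n)      # target depends on lag-1 peer price

df = pd.DataFrame({"target": target,
                   "peer_lag1": peer.shift(1),
                   "peer_lag2": peer.shift(2)}).dropna()

model = sm.OLS(df["target"], sm.add_constant(df[["peer_lag1", "peer_lag2"]])).fit()
print(model.summary())   # p-values flag which lags are significant
```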

Methods for Distinction of Cattle Using Supervised Learning

Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction based on models derived from existing data. The data can present identification patterns which can be used to classify individuals into groups. The result of the analysis is a pattern which can be used to identify new data without needing the input data from which the pattern was created. An important requirement in this process is careful data preparation, validation of the model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of view of genetic diversity. Where pedigree information is missing, other methods can be used to trace an animal's origin. The genetic diversity recorded in genetic data holds relatively useful information for identifying animals originating from individual countries. We conclude that the application of data mining to molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and for identifying an individual.

On Estimating the Headcount Index by Using the Logistic Regression Estimator

The problem of estimating a proportion has important applications in economics and, more generally, in many areas of the social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion and introduce a new quantitative method for estimating it. In particular, we suggest using the logistic regression estimator for the problem of estimating the headcount index. Using a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.

Diagnosis of the Heart Rhythm Disorders by Using Hybrid Classifiers

In this study, heart rhythm disorders were identified from electrocardiography (ECG) data taken from the MIT-BIH arrhythmia database by extracting the required features and presenting them to artificial neural network (ANN), artificial immune system (AIS), artificial immune system based artificial neural network (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-ANN) classifier systems. The main purpose of this study is to evaluate the performance of the hybrid AIS-ANN and PSO-ANN classifiers with respect to ANN and AIS. For this purpose, normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data were obtained for each of the RR intervals. These data were then combined into pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF), a discrete wavelet transform was applied to each of the two groups of data in a pair, and after data reduction two different data sets, with 9 and 27 features, were obtained from each pair. Afterwards, the data were first randomly shuffled, and then 4-fold cross validation was applied to create the training and testing data. The training and testing accuracy rates and the training times were compared with each other. As a result, the performances of the hybrid classification systems, AIS-ANN and PSO-ANN, were seen to be close to the performance of the ANN system, and the results of the hybrid systems were much better than those of AIS. However, ANN had a much shorter training time than the other systems; in terms of training time, ANN was followed by the PSO-ANN, AIS-ANN and AIS systems, respectively. The features extracted from the data also affected the classification results significantly.
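As a rough illustration of the feature extraction step (with a synthetic RR series and an assumed wavelet, not the study's exact configuration), a discrete wavelet transform can be reduced to a small per-band feature vector like so:

```python
# DWT feature extraction from a simulated RR-interval series using PyWavelets.
import numpy as np
import pywt

rng = np.random.default_rng(6)
rr = 0.8 + 0.05 * np.sin(np.linspace(0, 6 * np.pi, 256)) + 0.02 * rng.normal(size=256)

coeffs = pywt.wavedec(rr, wavelet="db4", level=3)   # approximation + 3 detail bands
features = np.array([np.mean(np.abs(c)) for c in coeffs] +
                    [np.std(c) for c in coeffs])    # simple per-band statistics
print(features)   # 8 features: mean |coef| and std for each of the 4 bands
```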

Software Effort Estimation Models Using Radial Basis Function Network

Software effort estimation is the process of estimating the effort required to develop software. By estimating the effort, the cost and schedule required to develop the software can be determined. An accurate estimate helps the developer allocate resources appropriately in order to avoid cost and schedule overruns. Several methods are available for estimating the effort, among which soft computing based methods play a prominent role. Software cost estimation involves a great deal of uncertainty, and among soft computing methods, neural networks are good at handling uncertainty. In this paper, the Radial Basis Function Network (RBFN) is compared with the back propagation network, and the results are validated using six data sets; RBFN is found to be better suited for estimating the effort. The results are validated using two tests: an error test and a statistical test.
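A compact sketch of an RBFN on toy effort data (all names and values are illustrative assumptions): Gaussian basis functions are placed at randomly chosen training points and the output weights are solved by least squares.

```python
# Minimal RBFN: random centers, Gaussian design matrix, least-squares output layer.
import numpy as np

def rbf_design(X, centers, sigma):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, (80, 3))                       # scaled size/complexity drivers
y = 1 + 5 * X[:, 0] + 2 * X[:, 1] ** 2 + 0.1 * rng.normal(size=80)  # toy effort

centers = X[rng.choice(len(X), size=10, replace=False)]  # hidden-layer centers
Phi = rbf_design(X, centers, sigma=0.3)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # solve output weights

y_hat = Phi @ w
print("MMRE:", np.mean(np.abs(y - y_hat) / y))       # mean magnitude of relative error
```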

An Estimation of Rice Output Supply Response in Sierra Leone: A Nerlovian Model Approach

Rice is Sierra Leone's staple food, and the nation imports over 120,000 metric tons annually due to a shortfall in its cultivation. The insufficient level of the crop's cultivation in Sierra Leone is caused by many problems, and it has led to an ever-widening gap between supply and demand for the crop within the country. Consequently, the government spends heavily on importing grain that could otherwise have been cultivated domestically at a lower cost. Hence, this research explores the response of rice supply with respect to its demand in Sierra Leone over the period 1980-2010, using the Nerlovian adjustment model applied to the Sierra Leone rice data set for that period. The estimated trend equations revealed that time had a significant effect on the output, productivity (yield) and area (acreage) of rice within the period 1980-2010, generally at the 1% level of significance. The results showed that almost all of the growth in output was attributable to increases in the area cultivated to the crop. The time trend variable included for government policy intervention showed an insignificant effect on all the variables considered in this research. Both the short-run and long-run price responses were inelastic, since all their values were less than one. From these findings, immediate actions that will lead to productivity growth in rice cultivation are required. To achieve this, the responsible agencies should provide extension service schemes to farmers as well as motivate them to adopt modern rice varieties and technology in their rice cultivation ventures.
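For concreteness, the Nerlovian partial adjustment model regresses current area on lagged price and lagged area, A_t = a + b*P_{t-1} + c*A_{t-1} + e_t, with short-run response b and long-run response b/(1-c). The sketch below fits it to simulated data, not the Sierra Leone series.

```python
# Nerlovian partial adjustment on simulated data; coefficients are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
T = 31                                     # e.g. 1980-2010
price = pd.Series(1 + 0.3 * rng.random(T))
area = pd.Series(np.zeros(T))
for t in range(1, T):                      # simulate A_t = a + b*P_{t-1} + c*A_{t-1} + e
    area[t] = 0.5 + 0.4 * price[t - 1] + 0.6 * area[t - 1] + 0.05 * rng.normal()

df = pd.DataFrame({"area": area, "price_lag": price.shift(1),
                   "area_lag": area.shift(1)}).dropna()
fit = sm.OLS(df["area"], sm.add_constant(df[["price_lag", "area_lag"]])).fit()

b, c = fit.params["price_lag"], fit.params["area_lag"]
print("short-run response:", b)
print("long-run response :", b / (1 - c))   # Nerlove long-run coefficient
```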

An Expert System Designed to Be Used with MOEAs for Efficient Portfolio Selection

This study presents an Expert System specially designed to be used with Multiobjective Evolutionary Algorithms (MOEAs) for the solution of the portfolio selection problem. The proposed hybrid system is validated using data sets from the Hang Seng 31 in Hong Kong, the DAX 100 in Germany and the FTSE 100 in the UK. The performance of the proposed system is assessed in comparison with the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The evaluation is based on different performance metrics that measure both the proximity of the solutions to the Pareto front and their dispersion on it. The results show that the proposed hybrid system is efficient for the solution of this kind of problem.
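At the core of any MOEA comparison is non-dominated (Pareto) filtering of candidate portfolios. The sketch below is a minimal assumed illustration with (risk, return) pairs, minimizing risk and maximizing return.

```python
# Pareto filtering of toy (risk, return) portfolios; data are simulated.
import numpy as np

def dominates(a, b):
    """True if portfolio a is at least as good as b in both objectives
    (lower risk, higher return) and strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

rng = np.random.default_rng(9)
candidates = [(float(r), float(m)) for r, m in
              zip(rng.uniform(0.05, 0.3, 50), rng.uniform(0.02, 0.15, 50))]
front = sorted(pareto_front(candidates))
print(len(front), "non-dominated portfolios")
```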

REDUCER – An Architectural Design Pattern for Reducing Large and Noisy Data Sets

To relieve the burden of reasoning on a point-to-point basis, in many domains there is a need to reduce large and noisy data sets into trends for qualitative reasoning. In this paper we propose and describe a new architectural design pattern called REDUCER for reducing large and noisy data sets, which can be tailored to particular situations. REDUCER consists of two consecutive processes: Filter, which takes the original data and removes outliers, inconsistencies or noise; and Compression, which takes the filtered data and derives trends in it. In this article we also show how REDUCER has successfully been applied to three different case studies.
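The sketch below is our illustration of the two stages (not the authors' reference implementation): Filter drops points far from the median, and Compression reduces the filtered series to windowed means as a simple trend.

```python
# Toy REDUCER-style pipeline: Filter stage, then Compression stage.
import numpy as np

def filter_stage(series, z=3.0):
    """Drop points further than z standard deviations from the median."""
    s = np.asarray(series, dtype=float)
    dev = np.abs(s - np.median(s))
    return s[dev <= z * s.std()]

def compression_stage(series, window=10):
    """Summarize the filtered series as windowed means (a simple trend)."""
    n = len(series) // window * window
    return series[:n].reshape(-1, window).mean(axis=1)

rng = np.random.default_rng(10)
raw = np.linspace(0, 5, 200) + rng.normal(0, 0.3, 200)
raw[[20, 90, 150]] += 25                      # inject outliers

trend = compression_stage(filter_stage(raw))
print(trend)
```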

Variability of Metal Composition and Concentrations in Road Dust in the Urban Environment

Urban road dust comprises a range of potentially toxic metal elements and plays a critical role in degrading the quality of urban receiving waters. Hence, assessing the metal composition and concentrations in urban road dust is a high priority. This study investigated the variability of metal composition and concentrations in road dust across four different urban land uses in Gold Coast, Australia. Samples were collected from 16 road sites and tested for 12 selected metal species. The data set was analyzed using both univariate and multivariate techniques. The analysis revealed that metal concentrations in road dust differ considerably within and between land uses. Iron, aluminum, magnesium and zinc are the most abundant species across urban land uses. It was also noted that metal species such as titanium, nickel, copper and zinc have their highest concentrations in industrial land use. The study outcomes also revealed soil and traffic related sources as the key sources of metals deposited on road surfaces.

Statistical Analysis for Overdispersed Medical Count Data

Many researchers have suggested the use of zero inflated Poisson (ZIP) and zero inflated negative binomial (ZINB) models for modeling overdispersed medical count data with extra variation caused by excess zeros and unobserved heterogeneity. These studies indicate that ZIP and ZINB consistently provide a better fit than the standard Poisson and negative binomial models for such data. In this study, we propose the use of the Zero Inflated Inverse Trinomial (ZIIT), Zero Inflated Poisson Inverse Gaussian (ZIPIG) and Zero Inflated Strict Arcsine (ZISA) models for modeling overdispersed medical count data. These models are not widely used by researchers, especially in the medical field. The results show that the three suggested models can serve as alternatives for modeling overdispersed medical count data, as supported by their application to a real life medical data set. The inverse trinomial, Poisson inverse Gaussian and strict arcsine distributions are discrete distributions with a cubic variance function of the mean; therefore, ZIIT, ZIPIG and ZISA are able to accommodate data with excess zeros and very heavy tails. They are recommended for modeling overdispersed medical count data when ZIP and ZINB are inadequate.
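As a hedged sketch of the baseline models discussed above (the proposed ZIIT, ZIPIG and ZISA models are not available in standard libraries), statsmodels can fit a zero-inflated Poisson to toy overdispersed counts and compare it with a plain Poisson by AIC.

```python
# ZIP vs Poisson on simulated zero-inflated counts; data are illustrative.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(11)
n = 500
counts = rng.poisson(3, n)
counts[rng.random(n) < 0.3] = 0            # inflate zeros: ~30% structural zeros

exog = np.ones((n, 1))                     # intercept-only model
zip_fit = ZeroInflatedPoisson(counts, exog).fit(disp=False)
poisson_fit = sm.Poisson(counts, exog).fit(disp=False)

print("ZIP AIC    :", zip_fit.aic)         # lower AIC -> better handling of excess zeros
print("Poisson AIC:", poisson_fit.aic)
```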

Validation of SWAT Model for Prediction of Water Yield and Water Balance: Case Study of Upstream Catchment of Jebba Dam in Nigeria

Estimation of water yield and water balance in a river catchment is critical to the sustainable management of water resources at the watershed level in any country. Therefore, in the present study, the Soil and Water Assessment Tool (SWAT), interfaced with a Geographical Information System (GIS), was applied to predict the water balance and water yield of a catchment area in Nigeria. The catchment area, of 12,992 km2, is located upstream of the Jebba hydropower dam in the north central part of Nigeria. In this study, observed flow data were collected and compared with flow simulated using SWAT. The correlation between the two data sets was evaluated using statistical measures such as the Nash-Sutcliffe Efficiency (NSE) and the coefficient of determination (R2). The model output shows good agreement between the observed and simulated flow, as indicated by NSE and R2 values greater than 0.7 for both the calibration and validation periods. A total of 42,733 mm of water was predicted by the calibrated model as the water yield potential of the basin for the simulation period 1985 to 2010. This performance suggests that the SWAT model could be a promising tool for predicting water balance and water yield in the sustainable management of water resources, and that it could be applied to other basins in Nigeria as a decision support tool for sustainable water management.
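For reference, the two agreement measures used above can be computed directly from observed and simulated flow series; the sketch below uses toy data, not the study's records.

```python
# Nash-Sutcliffe Efficiency and R2 between observed and simulated flows.
import numpy as np

def nse(obs, sim):
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r_squared(obs, sim):
    r = np.corrcoef(obs, sim)[0, 1]
    return r ** 2

rng = np.random.default_rng(12)
observed = 50 + 20 * np.sin(np.linspace(0, 12 * np.pi, 300)) + rng.normal(0, 3, 300)
simulated = observed + rng.normal(0, 5, 300)   # a reasonably good model run

print("NSE:", nse(observed, simulated))        # > 0.7 counts as good agreement here
print("R2 :", r_squared(observed, simulated))
```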