Abstract: A critical problem in wireless sensor networks is the limited battery and memory of nodes. Each node in the network can therefore maintain only a subset of its neighbors to communicate with, which increases energy usage because each packet must take more hops to reach its destination. Spanner graphs were introduced to tackle these problems. Since each node has small degree in a spanner graph and the distance between two nodes in the graph is not much greater than their geographical distance, spanner graphs are suitable candidates for the topology of a wireless sensor network. In this paper, we study Yao graphs and their behavior on randomly selected point sets. We generate several random point sets and compare the properties of their Yao graphs with those of the complete graph. Based on our data sets, we obtain several charts demonstrating how Yao graphs behave for randomly chosen point sets. As the results show, the stretch factor of a Yao graph follows a normal distribution and is on average far smaller than the worst-case stretch factor proved for Yao graphs in previous work. Finally, we apply the Yao graph to a realistic point set and study its stretch factor in a real-world setting.
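As a point of reference for the construction studied above, here is a minimal Python sketch of the standard Yao graph algorithm (our own illustration, not the paper's code): around each point the plane is split into k equal cones, and only the nearest neighbor inside each cone is kept.

    import math

    def yao_graph(points, k=6):
        # points: list of (x, y) tuples; returns an undirected edge set.
        edges = set()
        sector = 2 * math.pi / k
        for i, (px, py) in enumerate(points):
            best = [math.inf] * k          # closest distance seen per cone
            nearest = [None] * k           # index of that neighbor per cone
            for j, (qx, qy) in enumerate(points):
                if i == j:
                    continue
                d = math.hypot(qx - px, qy - py)
                cone = int((math.atan2(qy - py, qx - px) % (2 * math.pi)) / sector) % k
                if d < best[cone]:
                    best[cone], nearest[cone] = d, j
            edges.update((min(i, j), max(i, j)) for j in nearest if j is not None)
        return edges

The stretch factor then measured in the paper is the maximum, over all point pairs, of the shortest-path length in this graph divided by their Euclidean distance.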
Abstract: Network security engineers work to keep services available at all times by handling intruder attacks. An Intrusion Detection System (IDS) is one of the available mechanisms used to detect and classify abnormal actions. The IDS must therefore always be up to date with the latest intruder attack signatures to preserve the confidentiality, integrity, and availability of services. The speed of the IDS is as important an issue as learning new attacks. This work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases, KDD) dataset is very useful for testing and evaluating different machine learning techniques. It focuses mainly on preprocessing the KDD data in order to prepare a sound and fair experimental data set. The J48, MLP, and Bayes Network classifiers were chosen for this study. The results show that the J48 classifier achieves the highest accuracy rate in detecting and classifying all KDD dataset attack types: DOS, R2L, U2R, and PROBE.
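As an illustration of the kind of experiment described, here is a hedged sketch using scikit-learn's decision tree with the entropy criterion as a C4.5/J48-like stand-in (the paper itself uses Weka classifiers); the file name and column names are assumptions:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("kdd_preprocessed.csv")          # hypothetical preprocessed KDD file
    X = pd.get_dummies(df.drop(columns=["label"]))    # one-hot encode nominal features
    y = df["label"]                                   # normal / DOS / R2L / U2R / PROBE
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = DecisionTreeClassifier(criterion="entropy") # C4.5/J48-like split criterion
    clf.fit(X_tr, y_tr)
    print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))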
Abstract: In this study, a predictive model for estimating the metabolism (MET) of the human body was developed for the optimal control of the indoor thermal environment. Images of human bodies performing indoor activities and the coordinate values of body joints were collected as the data set used by the predictive model. A deep learning algorithm was used for the initial model, and its numbers of hidden layers and hidden neurons were optimized. Lastly, the model's prediction performance was analyzed after training on the collected data. In conclusion, the feasibility of MET prediction was confirmed, and the direction of future work was proposed: collecting more varied data and further developing the predictive model.
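A hedged sketch of the hidden-layer search the abstract describes, using scikit-learn's MLPRegressor on placeholder data; the joint layout (17 joints with x/y coordinates) and the MET value range are illustrative assumptions, not the paper's setup:

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import GridSearchCV

    X = np.random.rand(500, 34)                  # placeholder: 17 joints x (x, y)
    y = np.random.uniform(0.8, 4.0, 500)         # placeholder MET values
    search = GridSearchCV(
        MLPRegressor(max_iter=2000),
        {"hidden_layer_sizes": [(32,), (64,), (64, 32), (64, 64, 32)]},
        cv=3)
    search.fit(X, y)
    print("best architecture:", search.best_params_)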
Abstract: Electricity prices have complex features such as high volatility, nonlinearity, and high frequency that make forecasting quite difficult. Yet electricity prices are volatile but not random, so it is possible to identify patterns in the historical data. Intelligent decision-making requires accurate price forecasts for market traders, retailers, and generation companies. Many shallow ANN (artificial neural network) models have been published in the literature and have shown adequate forecasting results. In recent years, neural networks with many hidden layers, referred to as DNNs (deep neural networks), have been used in the machine learning community. The goal of this study is to investigate the electricity price forecasting performance of shallow-ANN and DNN models for the Turkish day-ahead electricity market. The forecasting accuracy of the models is evaluated with publicly available data from that market. Both shallow-ANN and DNN approaches can give successful results in forecasting problems. Historical load, price, and weather temperature data are used as the input variables for the models. The data set includes power consumption measurements gathered between January 2016 and December 2017 at one-hour resolution. Forecasting studies are carried out comparatively with shallow-ANN and DNN models for the Turkish electricity market over this period. The main contribution of this study is the investigation of different shallow-ANN and DNN models in the field of electricity price forecasting. All models are compared by their MAE (mean absolute error) and MSE (mean squared error). DNN models give better forecasting performance than shallow ANNs; the five best MAE results for DNN models are 0.346, 0.372, 0.392, 0.402, and 0.409.
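A minimal sketch of such a shallow-vs-deep comparison with scikit-learn; the file name, feature columns, and layer sizes are illustrative assumptions, not the paper's configuration:

    import pandas as pd
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    df = pd.read_csv("tr_dayahead_2016_2017.csv")      # hypothetical hourly data
    X = df[["load", "temperature", "price_lag24"]]     # illustrative input features
    y = df["price"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
    for name, layers in [("shallow-ANN", (32,)), ("DNN", (64, 64, 32))]:
        model = make_pipeline(StandardScaler(),
                              MLPRegressor(hidden_layer_sizes=layers, max_iter=2000))
        p = model.fit(X_tr, y_tr).predict(X_te)
        print(name, "MAE:", mean_absolute_error(y_te, p),
              "MSE:", mean_squared_error(y_te, p))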
Abstract: Software technology is developing rapidly, which drives the growth of various industries. Nowadays, software-based applications are widely adopted for business purposes. For any software company, developing reliable software is a challenging task because a faulty software module can harm the growth of the business. Hence there is a need for techniques that can predict software defects early. Due to the complexity of manual prediction, automated software defect prediction techniques have been introduced. These techniques learn patterns from previous software versions and find the defects in the current version. They have attracted researchers because of their significant impact on industrial growth through early bug identification. Several studies have been carried out on this basis, but achieving the desired defect prediction performance is still challenging. To address this issue, we present a machine-learning-based hybrid technique for software defect prediction. First, a Genetic Algorithm (GA) with an improved fitness function is used to better optimize the feature subsets of the data sets. These features are then processed by a Decision Tree (DT) classification model. Finally, an experimental study is presented in which results from the proposed GA-DT hybrid approach are compared with those from the plain DT classifier. The results show that the proposed hybrid approach achieves better classification accuracy.
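A hedged sketch of a GA feature-selection loop feeding a decision tree; here the fitness is plain cross-validated accuracy, a stand-in for the paper's improved fitness function, and all parameter values are illustrative (X and y are NumPy arrays):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    def fitness(mask, X, y):
        # Stand-in fitness: cross-validated accuracy of a DT on the masked features.
        if not mask.any():
            return 0.0
        return cross_val_score(DecisionTreeClassifier(), X[:, mask], y, cv=3).mean()

    def ga_select(X, y, pop=20, gens=30, p_mut=0.05):
        population = rng.random((pop, X.shape[1])) < 0.5          # random bit masks
        for _ in range(gens):
            scores = np.array([fitness(ind, X, y) for ind in population])
            parents = population[np.argsort(scores)[-pop // 2:]]  # truncation selection
            children = parents.copy()
            cut = X.shape[1] // 2
            children[:, cut:] = parents[::-1, cut:]               # one-point crossover
            children ^= rng.random(children.shape) < p_mut        # bit-flip mutation
            population = np.vstack([parents, children])
        scores = np.array([fitness(ind, X, y) for ind in population])
        return population[scores.argmax()]

The returned boolean mask would then be applied as X[:, mask] before fitting the final decision tree.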
Abstract: Most movie recommendation systems have been developed for customers to find items of interest. This work introduces a predictive model usable by small and medium-sized enterprises (SMEs) that need a data-driven, analytical approach to stock the right movies for local audiences and retain more customers. We used classification models to extract features from thousands of customers' demographic, behavioral, and social information to predict their movie genre preferences. In the implementation, a Gaussian kernel support vector machine (SVM) classification model and a logistic regression model were built to extract features from the sample data, and their in-sample test errors were compared. Out-of-sample errors were also compared under different Vapnik-Chervonenkis (VC) dimensions of the learning algorithm to detect and prevent overfitting. The Gaussian kernel SVM model correctly predicts movie genre preferences in 85% of positive cases, and its accuracy rises to 93% with a smaller VC dimension and less overfitting. These findings advance our understanding of how to use machine learning to predict customer preferences from a small data set and how to design prediction tools for such enterprises.
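A minimal sketch of the two-model comparison with scikit-learn, assuming the customer features are already numeric (e.g., one-hot encoded); file and column names are hypothetical. Reducing the SVM's effective capacity (roughly, its VC dimension) corresponds to stronger regularization, e.g., a smaller C or gamma:

    import pandas as pd
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customers.csv")            # hypothetical, already numeric/one-hot
    X, y = df.drop(columns=["prefers_genre"]), df["prefers_genre"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
    for name, model in [("RBF SVM", SVC(kernel="rbf", gamma="scale")),
                        ("logistic", LogisticRegression(max_iter=1000))]:
        clf = make_pipeline(StandardScaler(), model).fit(X_tr, y_tr)
        print(name, "test accuracy:", clf.score(X_te, y_te))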
Abstract: With today's popularity and progress of information technology, big data sets and their analysis are no longer a major obstacle. We can now use relevant big data not only to analyze and simulate the likely course of urban development in the near future, but also to give government units and decision-makers a more comprehensive and reasonable basis for policy implementation based on those analysis and simulation results. In this research, we take Taipei City as the study area and use relevant big data variables (e.g., population, facility utilization, and related social policy ratings) together with the Analytic Network Process (ANP) to examine in depth the possible reduction of land use by primary and secondary schools in Taipei City. Besides fostering urban activity through better utilization of public facilities, the final results of this research can help improve the efficiency of urban land use in the future. Furthermore, the assessment model and research framework established here also provide a good reference for land use and adaptive reuse strategies of schools and other public facilities.
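For readers unfamiliar with the ANP machinery, here is a hedged sketch of its limit-supermatrix step: the weighted (column-stochastic) supermatrix is raised to increasing powers until its columns converge to the limiting priorities. This is textbook ANP, not the paper's specific model, and the matrix below is a toy example:

    import numpy as np

    def anp_limit(W, tol=1e-9, max_iter=10000):
        # W: column-stochastic weighted supermatrix; returns the limit matrix
        # whose (identical) columns hold the limiting priorities.
        M = W.copy()
        for _ in range(max_iter):
            M_next = M @ W
            if np.max(np.abs(M_next - M)) < tol:
                return M_next
            M = M_next
        return M

    W = np.array([[0.0, 0.6, 0.4],
                  [0.5, 0.0, 0.6],
                  [0.5, 0.4, 0.0]])   # toy 3-element supermatrix
    print(anp_limit(W))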
Abstract: Intelligent transportation systems are essential for building smarter cities, and machine-learning-based transportation prediction is a promising approach because it makes invisible aspects of the network visible. In this context, this research builds a prototype model that predicts conditions in a transportation network using big data and machine learning technology; among urban transportation systems, it focuses on the bus system. The research problem is that existing headway models cannot respond to dynamic traffic conditions, so bus delays occur frequently. To overcome this, a prediction model is presented that finds patterns of bus delay through machine learning applied to the following data sets: traffic, weather, and bus status. This research presents a flexible headway model to predict bus delay and analyzes the result. The prototype model is built from real-time bus data gathered through public data portals and real-time Application Programming Interfaces (APIs) provided by the government. These data are the fundamental resources for modeling interval patterns of bus operations from traffic environment factors (road speeds, station conditions, weather, and real-time operating information of buses). The prototype was designed in a machine learning tool (RapidMiner Studio), and tests for bus delay prediction were conducted. The experiments aim to increase the prediction accuracy of bus headway by analyzing urban big data; such analysis is important for predicting the future and finding correlations in huge amounts of data. Based on this analysis method, the research demonstrates an effective use of machine learning and urban big data to understand urban dynamics.
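A hedged sketch of such a delay-prediction prototype in scikit-learn rather than RapidMiner Studio; the merged file and feature names are assumptions about how the portal and API data might be organized:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    df = pd.read_csv("bus_operations.csv")       # hypothetical merged portal/API data
    X = df[["road_speed", "rainfall", "temperature", "stop_dwell", "sched_headway"]]
    y = df["delay_seconds"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = RandomForestRegressor(n_estimators=200).fit(X_tr, y_tr)
    print("MAE (s):", mean_absolute_error(y_te, model.predict(X_te)))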
Abstract: In seismic data processing, attenuation of random noise is the basic step for improving data quality for further use in exploration and development in the gas and oil industry. The signal-to-noise ratio also strongly determines the quality of seismic data: it affects the reliability and accuracy of the seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ratio while attenuating random noise effectively. To improve the signal-to-noise ratio and attenuate seismic random noise while preserving important features and information in the seismic signal, we introduce an anisotropic total fractional-order denoising algorithm. The anisotropic total fractional-order variation model, defined in the space of fractional-order bounded variation, is proposed as a regularization for seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional-order variation model, and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of the proposed method on synthetic and real seismic data sets and compare the denoised results with F-X deconvolution and the non-local means denoising algorithm.
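As a simplified stand-in for the fractional-order model (which requires fractional difference operators), a hedged NumPy sketch of split Bregman for ordinary anisotropic TV denoising shows the structure of the algorithm: an approximate quadratic u-subproblem, componentwise shrinkage for the d-subproblem, and Bregman variable updates. This is not the authors' algorithm; step sizes and weights are illustrative:

    import numpy as np

    def grad(u):
        gx = np.diff(u, axis=0, append=u[-1:, :])   # forward differences
        gy = np.diff(u, axis=1, append=u[:, -1:])
        return gx, gy

    def div(px, py):
        return (np.diff(px, axis=0, prepend=px[:1, :])
                + np.diff(py, axis=1, prepend=py[:, :1]))

    def shrink(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def split_bregman_tv(f, mu=10.0, lam=2.0, tau=0.2, n_iter=60):
        # f: noisy 2-D section (float array); returns the denoised section.
        u = f.astype(float).copy()
        dx = np.zeros_like(u); dy = np.zeros_like(u)
        bx = np.zeros_like(u); by = np.zeros_like(u)
        for _ in range(n_iter):
            gx, gy = grad(u)
            # one gradient step on the quadratic u-subproblem
            u -= tau * (mu * (u - f) - lam * div(gx + bx - dx, gy + by - dy))
            gx, gy = grad(u)
            dx = shrink(gx + bx, 1.0 / lam)    # componentwise (anisotropic) shrinkage
            dy = shrink(gy + by, 1.0 / lam)
            bx += gx - dx                      # Bregman variable updates
            by += gy - dy
        return u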
Abstract: This paper investigates which sub-bands of the wave atom transform are most successful for classifying mammograms when the sub-band coefficients are used as features. A computer-aided diagnosis system is constructed using the wave atom transform with support vector machine and k-nearest neighbor classifiers. Two-class classification is studied in detail on two data sets separately. The successful sub-bands are determined according to accuracy rates, coefficient counts, and sensitivity rates.
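The wave atom transform has no widely available Python implementation, so this hedged sketch substitutes a wavelet decomposition (PyWavelets) to illustrate the sub-band-coefficients-as-features pipeline with the same two classifiers; the images and labels are placeholders:

    import numpy as np
    import pywt
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def subband_features(image, level=2):
        # Flatten all detail sub-band coefficients into one feature vector.
        coeffs = pywt.wavedec2(image, "db4", level=level)
        return np.concatenate([np.ravel(c) for band in coeffs[1:] for c in band])

    images = np.random.rand(40, 64, 64)        # placeholder mammogram patches
    labels = np.random.randint(0, 2, 40)       # toy benign/malignant labels
    X = np.array([subband_features(im) for im in images])
    for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=3)):
        score = cross_val_score(clf, X, labels, cv=4).mean()
        print(type(clf).__name__, "accuracy:", score)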
Abstract: In recent years, object detection has gained much attention and become a very active research area in computer vision. Robust detection of object boundaries in an image is demanded by numerous applications in human-computer interaction and automated surveillance. Many methods have been developed for automatic object detection in various fields, such as automotive, quality control management, and environmental services. Unfortunately, to the best of our knowledge, object detection under varying illumination with shadows has not been well solved yet, and this problem is one of the major hurdles keeping object detection methods from practical application. This paper presents an approach to automatic object detection in images under non-standardized environmental conditions. A key challenge is how to detect the object under uneven illumination. Because capture conditions vary, the algorithms need to account for a variety of environmental factors, as colour information, lighting, and shadows vary from image to image. Existing methods mostly fail to produce appropriate results due to variation in colour information, lighting effects, threshold specifications, histogram dependencies, and colour ranges. To overcome these limitations we propose an object detection algorithm, with pre-processing steps, that reduces the interference caused by shadow and illumination effects without fixed parameters. We use the YCrCb colour model without any specific colour ranges or predefined threshold values. The segmented object regions are further refined using morphological operations (erosion and dilation) and contours. The proposed approach is applied to a large image data set acquired under various environmental conditions for wood stack detection. Experiments show promising results in comparison with existing methods.
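A hedged OpenCV sketch of a pipeline in this spirit: YCrCb conversion, a data-driven (Otsu) threshold on a chroma channel instead of fixed values, erosion/dilation, and contour extraction. File names and parameter values are illustrative, not the paper's:

    import cv2
    import numpy as np

    img = cv2.imread("woodstack.jpg")                    # hypothetical input image
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)
    # Otsu picks the threshold from the data instead of a fixed value.
    _, mask = cv2.threshold(cr, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=1)         # remove speckle/shadow noise
    mask = cv2.dilate(mask, kernel, iterations=2)        # restore the object body
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:                     # drop tiny residual blobs
            cv2.drawContours(img, [c], -1, (0, 255, 0), 2)
    cv2.imwrite("detected.png", img)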
Abstract: The volume of overburden and coal excavations in an open-pit mine is generally determined by a conventional survey instrument such as a total station. This study aimed to evaluate the accuracy of a terrestrial laser scanner (TLS) in measuring overburden and coal excavations, comparing TLS survey data sets with total station data. The reference points measured with the total station showed 0.2 mm precision for both horizontal and vertical coordinates; on the same points, TLS achieved standard deviations of 4.93 cm and 0.53 cm for horizontal and vertical coordinates, respectively. For volume measurements covering mining areas of 79,844 m2, TLS yielded a mean difference of about 1% and a surface error margin of 6 cm at the 95% confidence level compared with the volume obtained by the total station.
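For context, a toy NumPy sketch of how an excavation volume is typically computed from two gridded surfaces (pre- and post-excavation DEMs) derived from either survey; all values are synthetic:

    import numpy as np

    cell = 1.0                                   # grid cell size (m)
    before = 100 + np.random.rand(100, 100) * 2  # placeholder pre-excavation DEM (m)
    after = before - np.random.rand(100, 100)    # placeholder post-excavation DEM (m)
    volume = np.sum(before - after) * cell ** 2  # prism summation over the grid, m^3
    print(f"excavated volume: {volume:.1f} m^3")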
Abstract: Tracking of moving people has become a matter of great importance due to rapid technological advances in computer vision. The objective of this study is to design a motion-based method for detecting and tracking multiple pedestrians walking in random directions. In the proposed method, a Gaussian mixture model (GMM) is used to detect moving persons in image sequences; it adapts to changes in the scene such as varying illumination and objects that frequently start and stop. Background noise is eliminated by applying morphological operations, and the motion of tracked people is estimated with a Kalman filter, which predicts the tracked location in each frame and determines the likelihood of each detection. We used a benchmark data set recorded by a stationary side-view camera for the evaluation. The scenes are taken on a street and include up to eight people in front of the camera, in two sequences lasting 53 and 35 seconds, respectively. For pedestrians walking in close proximity, the proposed method achieves a detection ratio of 87% and a tracking ratio of 77%. When the pedestrians are farther apart, the detection ratio increases to 90% and the tracking ratio to 79%.
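A hedged OpenCV sketch of the detection side: MOG2 background subtraction (a Gaussian mixture model), morphological cleanup, and a constant-velocity Kalman filter tracking one blob's centroid. A real multi-pedestrian tracker would add data association across targets; file names and thresholds are illustrative:

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("street.mp4")                 # hypothetical benchmark clip
    mog = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    kf = cv2.KalmanFilter(4, 2)                          # state: x, y, vx, vy
    kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                    [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = mog.apply(frame)                          # GMM foreground mask
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        predicted = kf.predict()                         # predicted (x, y, vx, vy)
        for c in contours:
            if cv2.contourArea(c) > 400:                 # plausible pedestrian blob
                x, y, w, h = cv2.boundingRect(c)
                kf.correct(np.array([[x + w / 2], [y + h / 2]], np.float32))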
Abstract: The software maintenance phase starts once a software project has been developed and delivered; after that, any modification to it corresponds to maintenance. Software maintenance involves modifications that keep a software product usable in a changed or changing environment, correct discovered faults, and improve performance or maintainability. Software maintenance, and the management of software maintenance, are recognized as two of the most important and most expensive processes in the life of a software product. This research bases the prediction of maintenance effort on risk and time evaluations, using them as data sets for neural networks. The aim of this paper is to support project maintenance managers: they can pass the issues planned for the next software service patch to experts for risk and working-time evaluation, and afterwards feed all the data to neural networks to obtain a maintenance prediction. This process leads to a more accurate prediction of the working hours needed for the software service patch, which eventually leads to better budget planning for software maintenance projects.
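A minimal sketch of the final prediction step with scikit-learn's MLPRegressor, mapping expert risk and time evaluations to actual working hours; the file and feature names are hypothetical:

    import pandas as pd
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    df = pd.read_csv("patch_issues.csv")       # hypothetical expert-rated issue list
    X = df[["risk_score", "estimated_hours", "module_age", "priority"]]
    y = df["actual_hours"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=3000))
    model.fit(X_tr, y_tr)
    print("MAE (hours):", mean_absolute_error(y_te, model.predict(X_te)))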
Abstract: The Global Competitiveness Index (GCI), prepared by the World Economic Forum, has become a benchmark for studying the competitiveness of countries and for understanding the factors that enable competitiveness. Innovation is a key pillar of competitiveness and has the unique property of enabling exponential economic growth. This paper analyzes how the pillars comprising the GCI affect innovation and whether GDP growth can directly affect innovation outcomes for a country. The key objective of the study is to identify areas on which governments of developing countries can focus policies and programs to improve their country's innovativeness. We compiled a panel data set covering the top innovating countries and the large emerging economies known as BRICS, from 2007-08 to 2014-15, in order to find the significant factors that affect innovation. The results of the regression analysis suggest that governments should make policies to improve labor market efficiency, establish sophisticated business networks, provide basic health and primary education to their people, and strengthen the quality of higher education and training services in the economy. The achievements of smaller economies in innovation suggest that concerted government efforts can counter any size-related disadvantage and can in fact provide greater flexibility and speed in encouraging innovation.
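A hedged sketch of a pooled panel regression of innovation scores on GCI pillar scores with statsmodels; the file name and variable names are illustrative stand-ins for the pillars discussed, not the paper's exact specification:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("gci_panel_2007_2015.csv")  # hypothetical country-year panel
    model = smf.ols("innovation ~ labor_efficiency + business_sophistication"
                    " + health_primary_edu + higher_edu_training + gdp_growth"
                    " + C(year)", data=df).fit()
    print(model.summary())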
Abstract: Instance selection (IS) techniques are used to reduce data size and thereby improve the performance of data mining methods. Recently, to process very large data sets, several proposed methods divide the training set into disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint on how to divide and conquer in the IS procedure. Then, based on the fast condensed nearest neighbor (FCNN) rule, we propose an instance selection method for large data sets built on the MapReduce framework. Besides preserving prediction accuracy and reduction rate, it has two desirable properties: first, it reduces the workload of the aggregation node; second, and most important, it produces the same result as the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.
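For reference, a hedged sketch of the sequential FCNN rule that FCNN-MR parallelizes: seed the condensed set with one centroid-nearest point per class, then repeatedly add, for each prototype's Voronoi cell, the nearest misclassified point, until the training set is classified consistently. Details are our own simplification:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def fcnn(X, y):
        # Seed: for each class, the point of that class nearest to its centroid.
        S = []
        for c in np.unique(y):
            idx = np.where(y == c)[0]
            centroid = X[idx].mean(axis=0)
            S.append(idx[((X[idx] - centroid) ** 2).sum(1).argmin()])
        while True:
            knn = KNeighborsClassifier(n_neighbors=1).fit(X[S], y[S])
            wrong = np.where(knn.predict(X) != y)[0]
            if wrong.size == 0:
                return np.asarray(S)             # indices of the condensed set
            # For each prototype's cell, add its nearest misclassified point.
            owner = knn.kneighbors(X[wrong], return_distance=False).ravel()
            for cell in np.unique(owner):
                cand = wrong[owner == cell]
                d = ((X[cand] - X[S[cell]]) ** 2).sum(1)
                S.append(int(cand[d.argmin()]))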
Abstract: This study investigates the impact of public healthcare facilities and socio-economic circumstances on the status of child health in Pakistan. The analysis is carried out with reference to the fourth and sixth Millennium Development Goals (MDGs), and the health variables chosen are inherited from the targeted indicators of those goals. Trends in the Human Opportunity Index (HOI), for both health inequalities and coverage, are analyzed using the Pakistan Social and Living Standards Measurement (PSLM) data set for 2001-02 to 2012-13 at the national and provincial levels. To reveal the relative importance of each circumstance in achieving the targeted values for child health, a Shorrocks decomposition is applied to the HOI. The average annual growth rate of the HOI is used to project the time needed to achieve the MDG targets and universal access. The results indicate an improvement in the HOI for the reduction of child mortality, from 52.1% in 2001-02 to 67.3% in 2012-13, confirming that healthcare opportunities are available to a larger segment of society. Similarly, immunization against measles and other diseases such as diphtheria, polio, Bacillus Calmette-Guerin (BCG), and hepatitis also improved from 51.6% to 69.9% at the national level during the study period. On a positive note, no gender disparity is found in the child health indicators; health outcomes are mostly affected by parental and geographical features and the availability of health infrastructure. However, the study finds that this achievement has been uneven across provinces: Pakistan is lagging behind in achieving its health goals, and at the current rate of healthcare provision it will disappointingly take many additional years to reach its targets.
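The formula behind the index may help the reader: the HOI is coverage discounted by a dissimilarity penalty, HOI = C(1 - D) with D = (1/(2C)) * sum_i w_i |C_i - C|, where w_i and C_i are group population shares and group coverages. A toy computation (the weights and coverages are invented):

    import numpy as np

    w = np.array([0.3, 0.5, 0.2])        # population shares of circumstance groups
    c = np.array([0.80, 0.65, 0.50])     # group-specific coverage (e.g. immunization)
    C = np.dot(w, c)                      # overall coverage
    D = np.sum(w * np.abs(c - C)) / (2 * C)
    print("HOI =", C * (1 - D))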
Abstract: The increasing amount of collected data has limited the performance of current analysis algorithms; thus, developing new algorithms that are cost-effective in terms of complexity, scalability, and accuracy has raised significant interest. In this paper, a modified, effective k-means-based algorithm is developed and evaluated. The new algorithm aims to reduce the computational load without significantly affecting clustering quality. It uses the City Block distance and a new stop criterion to guarantee convergence. Experiments conducted on a real data set show its high performance compared with the original k-means version.
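A hedged sketch of a k-means variant using the City Block (L1) distance, under which the per-coordinate median is the cost-minimizing center; the stop criterion here is simply "assignments unchanged", a stand-in for the paper's new criterion:

    import numpy as np

    def kmeans_l1(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            d = np.abs(X[:, None, :] - centers[None, :, :]).sum(-1)  # City Block
            new_labels = d.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break                                 # stand-in stop criterion
            labels = new_labels
            # under the L1 distance, the coordinatewise median minimizes the cost
            centers = np.array([np.median(X[labels == j], axis=0)
                                if (labels == j).any() else centers[j]
                                for j in range(k)])
        return centers, labels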
Abstract: Coastal regions are among the most heavily used places, shaped by both the natural balance and a growing population. In coastal engineering, the most valuable data concern wave behavior, and the amount of such data becomes very large because observations span hours, days, and months. In this study, statistical methods such as wave spectrum analysis and standard descriptive statistics are used. The goal is to discover profiles of different coastal areas with these statistical methods and thus to obtain an instance-based data set from the big data for analysis with data mining algorithms. In the experimental studies, six sample data sets on wave behavior, obtained from 20-minute observations in Mersin Bay, Turkey, were converted to an instance-based form, and different clustering techniques from data mining were used to discover similar coastal places. Moreover, this study discusses how this summarization approach can be used in other fields that collect big data, such as medicine.
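An illustrative version of the summarization step: each 20-minute surface-elevation record is reduced to a few spectral statistics (e.g., significant wave height Hm0 = 4*sqrt(m0) from the zeroth spectral moment) and the resulting instances are clustered; the series, sampling rate, and cluster count are synthetic assumptions:

    import numpy as np
    from scipy.signal import welch
    from sklearn.cluster import KMeans

    fs = 2.0                                            # assumed sampling rate (Hz)
    records = np.random.randn(60, int(20 * 60 * fs))    # sixty synthetic 20-min records
    feats = []
    for eta in records:
        f, S = welch(eta, fs=fs)                        # wave spectrum estimate
        m0 = np.trapz(S, f)                             # zeroth spectral moment
        m1 = np.trapz(f * S, f)                         # first spectral moment
        feats.append([4 * np.sqrt(m0), m0 / m1])        # Hm0 and a mean-period proxy
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(np.array(feats))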
Abstract: Hydroclimatic observations are used in planning water resource projects, and climate variables are foremost among the values used in such planning. The climate system is a complex and interactive system involving the atmosphere, land surfaces, snow and ice, the oceans, and other bodies of water. The amount and distribution of precipitation, an important climate parameter, is a limiting environmental factor for the distribution of living things. Trend analysis is applied to detect the presence of a pattern or trend in a data set, and many trend studies in different parts of the world are carried out to detect climate change. The detection and attribution of past trends and variability in climatic variables is essential for explaining potential future changes resulting from anthropogenic activities. Both parametric and non-parametric tests are used to determine trends in climatic variables. In this study, trend tests were applied to annual total precipitation data from the period 1972-2012 in the Konya Basin. Non-parametric trend tests (Sen's T, Spearman's rho, Mann-Kendall, Sen's T trend, Wald-Wolfowitz) and a parametric test (mean square) were applied to the annual total precipitation of 15 stations. The linear slopes (change per unit time) of the trends were calculated using the non-parametric estimator developed by Sen, the beginning of the trends was determined using the Mann-Kendall rank correlation test, and the homogeneity of the precipitation trends was tested using the method developed by Van Belle and Hughes. As a result of the tests, negative linear slopes were found in annual total precipitation in the Konya Basin.
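A worked sketch of the two core nonparametric tools named in the abstract, the Mann-Kendall S statistic and Sen's slope estimator, applied to one station's annual precipitation series (the values are toy numbers, not Konya data):

    import numpy as np
    from itertools import combinations

    def mann_kendall_s(x):
        # S > 0 suggests an increasing trend, S < 0 a decreasing one.
        return sum(np.sign(xj - xi) for xi, xj in combinations(x, 2))

    def sens_slope(x):
        # Median of all pairwise slopes: a robust estimate of change per year.
        slopes = [(xj - xi) / (j - i)
                  for (i, xi), (j, xj) in combinations(enumerate(x), 2)]
        return np.median(slopes)

    precip = np.array([412, 389, 455, 367, 398, 350, 371, 340, 365, 330])  # toy mm/yr
    print("S =", mann_kendall_s(precip), " slope =", sens_slope(precip), "mm/yr")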