Urban Big Data: An Experimental Approach to Building-Value Estimation Using Web-Based Data

Current real-estate value estimation, difficult for laymen, usually is performed by specialists. This paper presents an automated estimation process based on big data and machine-learning technology that calculates influences of building conditions on real-estate price measurement. The present study analyzed actual building sales sample data for Nonhyeon-dong, Gangnam-gu, Seoul, Korea, measuring the major influencing factors among the various building conditions. Further to that analysis, a prediction model was established and applied using RapidMiner Studio, a graphical user interface (GUI)-based tool for derivation of machine-learning prototypes. The prediction model is formulated by reference to previous examples. When new examples are applied, it analyses and predicts accordingly. The analysis process discerns the crucial factors effecting price increases by calculation of weighted values. The model was verified, and its accuracy determined, by comparing its predicted values with actual price increases.

Effect of Atmospheric Pressure on the Flow at the Outlet of a Propellant Nozzle

The purpose of this work is to simulate the flow at the exit of Vulcan 1 engine of European launcher Ariane 5. The geometry of the propellant nozzle is already determined using the characteristics method. The pressure in the outlet section of the nozzle is less than atmospheric pressure on the ground, causing the existence of oblique and normal shock waves at the exit. During the rise of the launcher, the atmospheric pressure decreases and the shock wave disappears. The code allows the capture of shock wave at exit of nozzle. The numerical technique uses the Flux Vector Splitting method of Van Leer to ensure convergence and avoid the calculation instabilities. The Courant, Friedrichs and Lewy coefficient (CFL) and mesh size level are selected to ensure the numerical convergence. The nonlinear partial derivative equations system which governs this flow is solved by an explicit unsteady numerical scheme by the finite volume method. The accuracy of the solution depends on the size of the mesh and also the step of time used in the discretized equations. We have chosen in this study the mesh that gives us a stationary solution with good accuracy.

Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).

An Alternative Approach for Assessing the Impact of Cutting Conditions on Surface Roughness Using Single Decision Tree

In this study, an approach to identify factors affecting on surface roughness in a machining process is presented. This study is based on 81 data about surface roughness over a wide range of cutting tools (conventional, cutting tool with holes, cutting tool with composite material), workpiece materials (AISI 1045 Steel, AA2024 aluminum alloy, A48-class30 gray cast iron), spindle speed (630-1000 rpm), feed rate (0.05-0.075 mm/rev), depth of cut (0.05-0.15 mm) and tool overhang (41-65 mm). A single decision tree (SDT) analysis was done to identify factors for predicting a model of surface roughness, and the CART algorithm was employed for building and evaluating regression tree. Results show that a single decision tree is better than traditional regression models with higher rate and forecast accuracy and strong value.

Performance Analysis of Artificial Neural Network Based Land Cover Classification

Landcover classification using automated classification techniques, while employing remotely sensed multi-spectral imagery, is one of the promising areas of research. Different land conditions at different time are captured through satellite and monitored by applying different classification algorithms in specific environment. In this paper, a SPOT-5 image provided by SUPARCO has been studied and classified in Environment for Visual Interpretation (ENVI), a tool widely used in remote sensing. Then, Artificial Neural Network (ANN) classification technique is used to detect the land cover changes in Abbottabad district. Obtained results are compared with a pixel based Distance classifier. The results show that ANN gives the better overall accuracy of 99.20% and Kappa coefficient value of 0.98 over the Mahalanobis Distance Classifier.

CFD Study of Subcooled Boiling Flow at Elevated Pressure Using a Mechanistic Wall Heat Partitioning Model

The wide range of industrial applications involved with boiling flows promotes the necessity of establishing fundamental knowledge in boiling flow phenomena. For this purpose, a number of experimental and numerical researches have been performed to elucidate the underlying physics of this flow. In this paper, the improved wall boiling models, implemented on ANSYS CFX 14.5, were introduced to study subcooled boiling flow at elevated pressure. At the heated wall boundary, the Fractal model, Force balance approach and Mechanistic frequency model are given for predicting the nucleation site density, bubble departure diameter, and bubble departure frequency. The presented wall heat flux partitioning closures were modified to consider the influence of bubble sliding along the wall before the lift-off, which usually happens in the flow boiling. The simulation was performed based on the Two-fluid model, where the standard k-ω SST model was selected for turbulence modelling. Existing experimental data at around 5 bars were chosen to evaluate the accuracy of the presented mechanistic approach. The void fraction and Interfacial Area Concentration (IAC) are in good agreement with the experimental data. However, the predicted bubble velocity and Sauter Mean Diameter (SMD) are over-predicted. This over-prediction may be caused by consideration of only dispersed and spherical bubbles in the simulations. In the future work, the important physical mechanisms of bubbles, such as merging and shrinking during sliding on the heated wall will be incorporated into this mechanistic model to enhance its capability for a wider range of flow prediction.

Electromyography Pattern Classification with Laplacian Eigenmaps in Human Running

Electromyography (EMG) is one of the most important interfaces between humans and robots for rehabilitation. Decoding this signal helps to recognize muscle activation and converts it into smooth motion for the robots. Detecting each muscle’s pattern during walking and running is vital for improving the quality of a patient’s life. In this study, EMG data from 10 muscles in 10 subjects at 4 different speeds were analyzed. EMG signals are nonlinear with high dimensionality. To deal with this challenge, we extracted some features in time-frequency domain and used manifold learning and Laplacian Eigenmaps algorithm to find the intrinsic features that represent data in low-dimensional space. We then used the Bayesian classifier to identify various patterns of EMG signals for different muscles across a range of running speeds. The best result for vastus medialis muscle corresponds to 97.87±0.69 for sensitivity and 88.37±0.79 for specificity with 97.07±0.29 accuracy using Bayesian classifier. The results of this study provide important insight into human movement and its application for robotics research.

Morphological Analysis of English L1-Persian L2 Adult Learners’ Interlanguage: From the Perspective of SLA Variation

Studies on interlanguage have long been engaged in describing the phenomenon of variation in SLA. Pursuing the same goal and particularly addressing the role of linguistic features, this study describes the use of Persian morphology in the interlanguage of two adult English-speaking learners of Persian L2. Taking the general approach of a combination of contrastive analysis, error analysis and interlanguage analysis, this study focuses on the identification and prediction of some possible instances of transfer from English L1 to Persian L2 across six elicitation tasks aiming to investigate whether any of contextual features may variably influence the learners’ order of morpheme accuracy in the areas of copula, possessives, articles, demonstratives, plural form, personal pronouns, and genitive cases.  Results describe the existence of task variation in the interlanguage system of Persian L2 learners.

Experimental Study of CO2 Absorption in Different Blend Solutions as Solvent for CO2 Capture

Nowadays, removal of CO2 as one of the major contributors to global warming using alternative solvents with high CO2 absorption efficiency, is an important industrial operation. In this study, three amines, including 2-methylpiperazine, potassium sarcosinate and potassium lysinate as potential additives, were added to the potassium carbonate solution as a base solvent for CO2 capture. In order to study the absorption performance of CO2 in terms of loading capacity of CO2 and absorption rate, the absorption experiments in a blend of additives with potassium carbonate were carried out using the vapor-liquid equilibrium apparatus at a temperature of 313.15 K, CO2 partial pressures ranging from 0 to 50 kPa and at mole fractions 0.2, 0.3, and 0.4. Furthermore, the performance of CO2 absorption in these blend solutions was compared with pure monoethanolamine and with pure potassium carbonate. Finally, a correlation with good accuracy was developed using the nonlinear regression analysis in order to predict CO2 loading capacity.

Automatic Staging and Subtype Determination for Non-Small Cell Lung Carcinoma Using PET Image Texture Analysis

In this study, our goal was to perform tumor staging and subtype determination automatically using different texture analysis approaches for a very common cancer type, i.e., non-small cell lung carcinoma (NSCLC). Especially, we introduced a texture analysis approach, called Law’s texture filter, to be used in this context for the first time. The 18F-FDG PET images of 42 patients with NSCLC were evaluated. The number of patients for each tumor stage, i.e., I-II, III or IV, was 14. The patients had ~45% adenocarcinoma (ADC) and ~55% squamous cell carcinoma (SqCCs). MATLAB technical computing language was employed in the extraction of 51 features by using first order statistics (FOS), gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), and Laws’ texture filters. The feature selection method employed was the sequential forward selection (SFS). Selected textural features were used in the automatic classification by k-nearest neighbors (k-NN) and support vector machines (SVM). In the automatic classification of tumor stage, the accuracy was approximately 59.5% with k-NN classifier (k=3) and 69% with SVM (with one versus one paradigm), using 5 features. In the automatic classification of tumor subtype, the accuracy was around 92.7% with SVM one vs. one. Texture analysis of FDG-PET images might be used, in addition to metabolic parameters as an objective tool to assess tumor histopathological characteristics and in automatic classification of tumor stage and subtype.

Comparison of Different k-NN Models for Speed Prediction in an Urban Traffic Network

A database that records average traffic speeds measured at five-minute intervals for all the links in the traffic network of a metropolitan city. While learning from this data the models that can predict future traffic speed would be beneficial for the applications such as the car navigation system, building predictive models for every link becomes a nontrivial job if the number of links in a given network is huge. An advantage of adopting k-nearest neighbor (k-NN) as predictive models is that it does not require any explicit model building. Instead, k-NN takes a long time to make a prediction because it needs to search for the k-nearest neighbors in the database at prediction time. In this paper, we investigate how much we can speed up k-NN in making traffic speed predictions by reducing the amount of data to be searched for without a significant sacrifice of prediction accuracy. The rationale behind this is that we had a better look at only the recent data because the traffic patterns not only repeat daily or weekly but also change over time. In our experiments, we build several different k-NN models employing different sets of features which are the current and past traffic speeds of the target link and the neighbor links in its up/down-stream. The performances of these models are compared by measuring the average prediction accuracy and the average time taken to make a prediction using various amounts of data.

Diagnosis of Diabetes Using Computer Methods: Soft Computing Methods for Diabetes Detection Using Iris

Complementary and Alternative Medicine (CAM) techniques are quite popular and effective for chronic diseases. Iridology is more than 150 years old CAM technique which analyzes the patterns, tissue weakness, color, shape, structure, etc. for disease diagnosis. The objective of this paper is to validate the use of iridology for the diagnosis of the diabetes. The suggested model was applied in a systemic disease with ocular effects. 200 subject data of 100 each diabetic and non-diabetic were evaluated. Complete procedure was kept very simple and free from the involvement of any iridologist. From the normalized iris, the region of interest was cropped. All 63 features were extracted using statistical, texture analysis, and two-dimensional discrete wavelet transformation. A comparison of accuracies of six different classifiers has been presented. The result shows 89.66% accuracy by the random forest classifier.

A Simple and Empirical Refraction Correction Method for UAV-Based Shallow-Water Photogrammetry

The aerial photogrammetry of shallow water bottoms has the potential to be an efficient high-resolution survey technique for shallow water topography, thanks to the advent of convenient UAV and automatic image processing techniques Structure-from-Motion (SfM) and Multi-View Stereo (MVS)). However, it suffers from the systematic overestimation of the bottom elevation, due to the light refraction at the air-water interface. In this study, we present an empirical method to correct for the effect of refraction after the usual SfM-MVS processing, using common software. The presented method utilizes the empirical relation between the measured true depth and the estimated apparent depth to generate an empirical correction factor. Furthermore, this correction factor was utilized to convert the apparent water depth into a refraction-corrected (real-scale) water depth. To examine its effectiveness, we applied the method to two river sites, and compared the RMS errors in the corrected bottom elevations with those obtained by three existing methods. The result shows that the presented method is more effective than the two existing methods: The method without applying correction factor and the method utilizes the refractive index of water (1.34) as correction factor. In comparison with the remaining existing method, which used the additive terms (offset) after calculating correction factor, the presented method performs well in Site 2 and worse in Site 1. However, we found this linear regression method to be unstable when the training data used for calibration are limited. It also suffers from a large negative bias in the correction factor when the apparent water depth estimated is affected by noise, according to our numerical experiment. Overall, the good accuracy of refraction correction method depends on various factors such as the locations, image acquisition, and GPS measurement conditions. The most effective method can be selected by using statistical selection (e.g. leave-one-out cross validation).

A Self Organized Map Method to Classify Auditory-Color Synesthesia from Frontal Lobe Brain Blood Volume

Absolute pitch is the ability to identify a musical note without a reference tone. Training for absolute pitch often occurs in preschool education. It is necessary to clarify how well the trainee can make use of synesthesia in order to evaluate the effect of the training. To the best of our knowledge, there are no existing methods for objectively confirming whether the subject is using synesthesia. Therefore, in this study, we present a method to distinguish the use of color-auditory synesthesia from the separate use of color and audition during absolute pitch training. This method measures blood volume in the prefrontal cortex using functional Near-infrared spectroscopy (fNIRS) and assumes that the cognitive step has two parts, a non-linear step and a linear step. For the linear step, we assume a second order ordinary differential equation. For the non-linear part, it is extremely difficult, if not impossible, to create an inverse filter of such a complex system as the brain. Therefore, we apply a method based on a self-organizing map (SOM) and are guided by the available data. The presented method was tested using 15 subjects, and the estimation accuracy is reported.

Comparison of the Distillation Curve Obtained Experimentally with the Curve Extrapolated by a Commercial Simulator

True Boiling Point distillation (TBP) is one of the most common experimental techniques for the determination of petroleum properties. This curve provides information about the performance of petroleum in terms of its cuts. The experiment is performed in a few days. Techniques are used to determine the properties faster with a software that calculates the distillation curve when a little information about crude oil is known. In order to evaluate the accuracy of distillation curve prediction, eight points of the TBP curve and specific gravity curve (348 K and 523 K) were inserted into the HYSYS Oil Manager, and the extended curve was evaluated up to 748 K. The methods were able to predict the curve with the accuracy of 0.6%-9.2% error (Software X ASTM), 0.2%-5.1% error (Software X Spaltrohr).

High-Speed Particle Image Velocimetry of the Flow around a Moving Train Model with Boundary Layer Control Elements

Trackside induced airflow velocities, also known as slipstream velocities, are an important criterion for the design of high-speed trains. The maximum permitted values are given by the Technical Specifications for Interoperability (TSI) and have to be checked in the approval process. For train manufactures it is of great interest to know in advance, how new train geometries would perform in TSI tests. The Reynolds number in moving model experiments is lower compared to full-scale. Especially the limited model length leads to a thinner boundary layer at the rear end. The hypothesis is that the boundary layer rolls up to characteristic flow structures in the train wake, in which the maximum flow velocities can be observed. The idea is to enlarge the boundary layer using roughness elements at the train model head so that the ratio between the boundary layer thickness and the car width at the rear end is comparable to a full-scale train. This may lead to similar flow structures in the wake and better prediction accuracy for TSI tests. In this case, the design of the roughness elements is limited by the moving model rig. Small rectangular roughness shapes are used to get a sufficient effect on the boundary layer, while the elements are robust enough to withstand the high accelerating and decelerating forces during the test runs. For this investigation, High-Speed Particle Image Velocimetry (HS-PIV) measurements on an ICE3 train model have been realized in the moving model rig of the DLR in Göttingen, the so called tunnel simulation facility Göttingen (TSG). The flow velocities within the boundary layer are analysed in a plain parallel to the ground. The height of the plane corresponds to a test position in the EN standard (TSI). Three different shapes of roughness elements are tested. The boundary layer thickness and displacement thickness as well as the momentum thickness and the form factor are calculated along the train model. Conditional sampling is used to analyse the size and dynamics of the flow structures at the time of maximum velocity in the train wake behind the train. As expected, larger roughness elements increase the boundary layer thickness and lead to larger flow velocities in the boundary layer and in the wake flow structures. The boundary layer thickness, displacement thickness and momentum thickness are increased by using larger roughness especially when applied in the height close to the measuring plane. The roughness elements also cause high fluctuations in the form factors of the boundary layer. Behind the roughness elements, the form factors rapidly are approaching toward constant values. This indicates that the boundary layer, while growing slowly along the second half of the train model, has reached a state of equilibrium.

Design and Application of NFC-Based Identity and Access Management in Cloud Services

In response to a changing world and the fast growth of the Internet, more and more enterprises are replacing web-based services with cloud-based ones. Multi-tenancy technology is becoming more important especially with Software as a Service (SaaS). This in turn leads to a greater focus on the application of Identity and Access Management (IAM). Conventional Near-Field Communication (NFC) based verification relies on a computer browser and a card reader to access an NFC tag. This type of verification does not support mobile device login and user-based access management functions. This study designs an NFC-based third-party cloud identity and access management scheme (NFC-IAM) addressing this shortcoming. Data from simulation tests analyzed with Key Performance Indicators (KPIs) suggest that the NFC-IAM not only takes less time in identity identification but also cuts time by 80% in terms of two-factor authentication and improves verification accuracy to 99.9% or better. In functional performance analyses, NFC-IAM performed better in salability and portability. The NFC-IAM App (Application Software) and back-end system to be developed and deployed in mobile device are to support IAM features and also offers users a more user-friendly experience and stronger security protection. In the future, our NFC-IAM can be employed to different environments including identification for mobile payment systems, permission management for remote equipment monitoring, among other applications.

Online Battery Equivalent Circuit Model Estimation on Continuous-Time Domain Using Linear Integral Filter Method

Equivalent circuit models (ECMs) are widely used in battery management systems in electric vehicles and other battery energy storage systems. The battery dynamics and the model parameters vary under different working conditions, such as different temperature and state of charge (SOC) levels, and therefore online parameter identification can improve the modelling accuracy. This paper presents a way of online ECM parameter identification using a continuous time (CT) estimation method. The CT estimation method has several advantages over discrete time (DT) estimation methods for ECM parameter identification due to the widely separated battery dynamic modes and fast sampling. The presented method can be used for online SOC estimation. Test data are collected using a lithium ion cell, and the experimental results show that the presented CT method achieves better modelling accuracy compared with the conventional DT recursive least square method. The effectiveness of the presented method for online SOC estimation is also verified on test data.

Application of Data Mining Techniques for Tourism Knowledge Discovery

Application of five implementations of three data mining classification techniques was experimented for extracting important insights from tourism data. The aim was to find out the best performing algorithm among the compared ones for tourism knowledge discovery. Knowledge discovery process from data was used as a process model. 10-fold cross validation method is used for testing purpose. Various data preprocessing activities were performed to get the final dataset for model building. Classification models of the selected algorithms were built with different scenarios on the preprocessed dataset. The outperformed algorithm tourism dataset was Random Forest (76%) before applying information gain based attribute selection and J48 (C4.5) (75%) after selection of top relevant attributes to the class (target) attribute. In terms of time for model building, attribute selection improves the efficiency of all algorithms. Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented, which showed intricate, non-trivial knowledge/insight that would otherwise not be discovered by simple statistical analysis with mediocre accuracy of the machine using classification algorithms.

A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work the efficiency of completely nonparametric regression estimators such as the Loess is compared to the estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean square error of the estimators with regards to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criteria is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of time it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.