Machine Learning Techniques in Bank Credit Analysis

The aim of this paper is to compare and discuss better classifier algorithm options for credit risk assessment by applying different Machine Learning techniques. Using records from a Brazilian financial institution, this study uses a database of 5,432 companies that are clients of the bank, where 2,600 clients are classified as non-defaulters, 1,551 are classified as defaulters and 1,281 are temporarily defaulters, meaning that the clients are overdue on their payments for up 180 days. For each case, a total of 15 attributes was considered for a one-against-all assessment using four different techniques: Artificial Neural Networks Multilayer Perceptron (ANN-MLP), Artificial Neural Networks Radial Basis Functions (ANN-RBF), Logistic Regression (LR) and finally Support Vector Machines (SVM). For each method, different parameters were analyzed in order to obtain different results when the best of each technique was compared. Initially the data were coded in thermometer code (numerical attributes) or dummy coding (for nominal attributes). The methods were then evaluated for each parameter and the best result of each technique was compared in terms of accuracy, false positives, false negatives, true positives and true negatives. This comparison showed that the best method, in terms of accuracy, was ANN-RBF (79.20% for non-defaulter classification, 97.74% for defaulters and 75.37% for the temporarily defaulter classification). However, the best accuracy does not always represent the best technique. For instance, on the classification of temporarily defaulters, this technique, in terms of false positives, was surpassed by SVM, which had the lowest rate (0.07%) of false positive classifications. All these intrinsic details are discussed considering the results found, and an overview of what was presented is shown in the conclusion of this study.

Vision-Based Daily Routine Recognition for Healthcare with Transfer Learning

We propose to record Activities of Daily Living (ADLs) of elderly people using a vision-based system so as to provide better assistive and personalization technologies. Current ADL-related research is based on data collected with help from non-elderly subjects in laboratory environments and the activities performed are predetermined for the sole purpose of data collection. To obtain more realistic datasets for the application, we recorded ADLs for the elderly with data collected from real-world environment involving real elderly subjects. Motivated by the need to collect data for more effective research related to elderly care, we chose to collect data in the room of an elderly person. Specifically, we installed Kinect, a vision-based sensor on the ceiling, to capture the activities that the elderly subject performs in the morning every day. Based on the data, we identified 12 morning activities that the elderly person performs daily. To recognize these activities, we created a HARELCARE framework to investigate into the effectiveness of existing Human Activity Recognition (HAR) algorithms and propose the use of a transfer learning algorithm for HAR. We compared the performance, in terms of accuracy, and training progress. Although the collected dataset is relatively small, the proposed algorithm has a good potential to be applied to all daily routine activities for healthcare purposes such as evidence-based diagnosis and treatment.

Determination of Optimal Stress Locations in 2D–9 Noded Element in Finite Element Technique

In Finite Element Technique nodal stresses are calculated through displacement as nodes. In this process, the displacement calculated at nodes is sufficiently good enough but stresses calculated at nodes are not sufficiently accurate. Therefore, the accuracy in the stress computation in FEM models based on the displacement technique is obviously matter of concern for computational time in shape optimization of engineering problems. In the present work same is focused to find out unique points within the element as well as the boundary of the element so, that good accuracy in stress computation can be achieved. Generally, major optimal stress points are located in domain of the element some points have been also located at boundary of the element where stresses are fairly accurate as compared to nodal values. Then, it is subsequently concluded that there is an existence of unique points within the element, where stresses have higher accuracy than other points in the elements. Therefore, it is main aim is to evolve a generalized procedure for the determination of the optimal stress location inside the element as well as at the boundaries of the element and verify the same with results from numerical experimentation. The results of quadratic 9 noded serendipity elements are presented and the location of distinct optimal stress points is determined inside the element, as well as at the boundaries. The theoretical results indicate various optimal stress locations are in local coordinates at origin and at a distance of 0.577 in both directions from origin. Also, at the boundaries optimal stress locations are at the midpoints of the element boundary and the locations are at a distance of 0.577 from the origin in both directions. The above findings were verified through experimentation and findings were authenticated. For numerical experimentation five engineering problems were identified and the numerical results of 9-noded element were compared to those obtained by using the same order of 25-noded quadratic Lagrangian elements, which are considered as standard. Then root mean square errors are plotted with respect to various locations within the elements as well as the boundaries and conclusions were drawn. After numerical verification it is noted that in a 9-noded element, origin and locations at a distance of 0.577 from origin in both directions are the best sampling points for the stresses. It was also noted that stresses calculated within line at boundary enclosed by 0.577 midpoints are also very good and the error found is very less. When sampling points move away from these points, then it causes line zone error to increase rapidly. Thus, it is established that there are unique points at boundary of element where stresses are accurate, which can be utilized in solving various engineering problems and are also useful in shape optimizations.

GRCNN: Graph Recognition Convolutional Neural Network for Synthesizing Programs from Flow Charts

Program synthesis is the task to automatically generate programs based on user specification. In this paper, we present a framework that synthesizes programs from flow charts that serve as accurate and intuitive specification. In order doing so, we propose a deep neural network called GRCNN that recognizes graph structure from its image. GRCNN is trained end-to-end, which can predict edge and node information of the flow chart simultaneously. Experiments show that the accuracy rate to synthesize a program is 66.4%, and the accuracy rates to recognize edge and node are 94.1% and 67.9%, respectively. On average, it takes about 60 milliseconds to synthesize a program.

Modeling Engagement with Multimodal Multisensor Data: The Continuous Performance Test as an Objective Tool to Track Flow

Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detect student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. In order to achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best classification results. Using random forest, 93.3% classification for engagement and 42.9% accuracy for disengagement were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. We found that using high-level handpicked features can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of engagement and distraction was shown to be eye gaze. It has been shown that we can accurately predict the level of engagement of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, human observation or reliant on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.

Normal and Peaberry Coffee Beans Classification from Green Coffee Bean Images Using Convolutional Neural Networks and Support Vector Machine

The aim of this study is to develop a system which can identify and sort peaberries automatically at low cost for coffee producers in developing countries. In this paper, the focus is on the classification of peaberries and normal coffee beans using image processing and machine learning techniques. The peaberry is not bad and not a normal bean. The peaberry is born in an only single seed, relatively round seed from a coffee cherry instead of the usual flat-sided pair of beans. It has another value and flavor. To make the taste of the coffee better, it is necessary to separate the peaberry and normal bean before green coffee beans roasting. Otherwise, the taste of total beans will be mixed, and it will be bad. In roaster procedure time, all the beans shape, size, and weight must be unique; otherwise, the larger bean will take more time for roasting inside. The peaberry has a different size and different shape even though they have the same weight as normal beans. The peaberry roasts slower than other normal beans. Therefore, neither technique provides a good option to select the peaberries. Defect beans, e.g., sour, broken, black, and fade bean, are easy to check and pick up manually by hand. On the other hand, the peaberry pick up is very difficult even for trained specialists because the shape and color of the peaberry are similar to normal beans. In this study, we use image processing and machine learning techniques to discriminate the normal and peaberry bean as a part of the sorting system. As the first step, we applied Deep Convolutional Neural Networks (CNN) and Support Vector Machine (SVM) as machine learning techniques to discriminate the peaberry and normal bean. As a result, better performance was obtained with CNN than with SVM for the discrimination of the peaberry. The trained artificial neural network with high performance CPU and GPU in this work will be simply installed into the inexpensive and low in calculation Raspberry Pi system. We assume that this system will be used in under developed countries. The study evaluates and compares the feasibility of the methods in terms of accuracy of classification and processing speed.

Controlling of Multi-Level Inverter under Shading Conditions Using Artificial Neural Network

This paper describes the effects of photovoltaic voltage changes on Multi-level inverter (MLI) due to solar irradiation variations, and methods to overcome these changes. The irradiation variation affects the generated voltage, which in turn varies the switching angles required to turn-on the inverter power switches in order to obtain minimum harmonic content in the output voltage profile. Genetic Algorithm (GA) is used to solve harmonics elimination equations of eleven level inverters with equal and non-equal dc sources. After that artificial neural network (ANN) algorithm is proposed to generate appropriate set of switching angles for MLI at any level of input dc sources voltage causing minimization of the total harmonic distortion (THD) to an acceptable limit. MATLAB/Simulink platform is used as a simulation tool and Fast Fourier Transform (FFT) analyses are carried out for output voltage profile to verify the reliability and accuracy of the applied technique for controlling the MLI harmonic distortion. According to the simulation results, the obtained THD for equal dc source is 9.38%, while for variable or unequal dc sources it varies between 10.26% and 12.93% as the input dc voltage varies between 4.47V nd 11.43V respectively. The proposed ANN algorithm provides satisfied simulation results that match with results obtained by alternative algorithms.

Uplink Throughput Prediction in Cellular Mobile Networks

The current and future cellular mobile communication networks generate enormous amounts of data. Networks have become extremely complex with extensive space of parameters, features and counters. These networks are unmanageable with legacy methods and an enhanced design and optimization approach is necessary that is increasingly reliant on machine learning. This paper proposes that machine learning as a viable approach for uplink throughput prediction. LTE radio metric, such as Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), and Signal to Noise Ratio (SNR) are used to train models to estimate expected uplink throughput. The prediction accuracy with high determination coefficient of 91.2% is obtained from measurements collected with a simple smartphone application.

Research on Reservoir Lithology Prediction Based on Residual Neural Network and Squeeze-and- Excitation Neural Network

Conventional reservoir prediction methods ar not sufficient to explore the implicit relation between seismic attributes, and thus data utilization is low. In order to improve the predictive classification accuracy of reservoir lithology, this paper proposes a deep learning lithology prediction method based on ResNet (Residual Neural Network) and SENet (Squeeze-and-Excitation Neural Network). The neural network model is built and trained by using seismic attribute data and lithology data of Shengli oilfield, and the nonlinear mapping relationship between seismic attribute and lithology marker is established. The experimental results show that this method can significantly improve the classification effect of reservoir lithology, and the classification accuracy is close to 70%. This study can effectively predict the lithology of undrilled area and provide support for exploration and development.

Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

The Arabic language is one of the most important languages. Learning it is so important for many people around the world because of its religious and economic importance and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting the syntactic mistakes of Arabic syntax according to their position in the sentence and focused on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using Stanford CoreNLP morphological analyzer and machine-learning approach in order to detect the syntactical mistakes and then correct it. A prototype of the proposed system was implemented and evaluated. It uses support vector machine (SVM) algorithm to detect Arabic grammatical errors and correct them using the rule-based approach. The prototype system has a far accuracy 81%. In general, it shows a set of useful grammatical suggestions that the user may forget about while writing due to lack of familiarity with grammar or as a result of the speed of writing such as alerting the user when using a plural term to indicate one person.

Validation and Projections for Solar Radiation up to 2100: HadGEM2-AO Global Circulation Model

The objective of this work is to evaluate the results of solar radiation projections between 2006 and 2013 for the state of Rio Grande do Sul, Brazil. The projections are provided by the General Circulation Models (MCGs) belonging to the Coupled Model Intercomparison Phase 5 (CMIP5). In all, the results of the simulation of six models are evaluated, compared to monthly data, measured by a network of thirteen meteorological stations of the National Meteorological Institute (INMET). The performance of the models is evaluated by the Nash coefficient and the Bias. The results are presented in the form of tables, graphs and spatialization maps. The ACCESS1-0 RCP 4.5 model presented the best results for the solar radiation simulations, for the most optimistic scenario, in much of the state. The efficiency coefficients (CEF) were between 0.95 and 0.98. In the most pessimistic scenario, HADGen2-AO RCP 8.5 had the best accuracy among the analyzed models, presenting coefficients of efficiency between 0.94 and 0.98. From this validation, solar radiation projection maps were elaborated, indicating a seasonal increase of this climatic variable in some regions of the Brazilian territory, mainly in the spring.

Lexicon-Based Sentiment Analysis for Stock Movement Prediction

Sentiment analysis is a broad and expanding field that aims to extract and classify opinions from textual data. Lexicon-based approaches are based on the use of a sentiment lexicon, i.e., a list of words each mapped to a sentiment score, to rate the sentiment of a text chunk. Our work focuses on predicting stock price change using a sentiment lexicon built from financial conference call logs. We introduce a method to generate a sentiment lexicon based upon an existing probabilistic approach. By using a domain-specific lexicon, we outperform traditional techniques and demonstrate that domain-specific sentiment lexicons provide higher accuracy than generic sentiment lexicons when predicting stock price change.

Rank-Based Chain-Mode Ensemble for Binary Classification

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Finite Element Analysis of Thermally-Induced Bistable Plate Using Four Plate Elements

The present study deals with the finite element (FE) analysis of thermally-induced bistable plate using various plate elements. The quadrilateral plate elements include the 4-node conforming plate element based on the classical laminate plate theory (CLPT), the 4-node and 9-node Mindlin plate element based on the first-order shear deformation laminated plate theory (FSDT), and a displacement-based 4-node quadrilateral element (RDKQ-NL20). Using the von-Karman’s large deflection theory and the total Lagrangian (TL) approach, the nonlinear FE governing equations for plate under thermal load are derived. Convergence analysis for four elements is first conducted. These elements are then used to predict the stable shapes of thermally-induced bistable plate. Numerical test shows that the plate element based on FSDT, namely the 4-node and 9-node Mindlin, and the RDKQ-NL20 plate element can predict two stable cylindrical shapes while the 4-node conforming plate predicts a saddles shape. Comparing the simulation results with ABAQUS, the RDKQ-NL20 element shows the best accuracy among all the elements.

Numerical Study of Steel Structures Responses to External Explosions

Due to the constant increase in terrorist attacks, the research and engineering communities have given significant attention to building performance under explosions. This paper presents a methodology for studying and simulating the dynamic responses of steel structures during external detonations, particularly for accurately investigating the impact of incrementing charge weight on the members total behavior, resistance and failure. Prediction damage method was introduced to evaluate the damage level of the steel members based on five scenarios of explosions. Johnson–Cook strength and failure model have been used as well as ABAQUS finite element code to simulate the explicit dynamic analysis, and antecedent field tests were used to verify the acceptance and accuracy of the proposed material strength and failure model. Based on the structural response, evaluation criteria such as deflection, vertical displacement, drift index, and damage level; the obtained results show the vulnerability of steel columns and un-braced steel frames which are designed and optimized to carry dead and live load to resist and endure blast loading.

Design of Reconfigurable Supernumerary Robotic Limb Based on Differential Actuated Joints

This paper presents a wearable reconfigurable supernumerary robotic limb with differential actuated joints, which is lightweight, compact and comfortable for the wearers. Compared to the existing supernumerary robotic limbs which mostly adopted series structure with large movement space but poor carrying capacity, a prototype with the series-parallel configuration to better adapt to different task requirements has been developed in this design. To achieve a compact structure, two kinds of cable-driven mechanical structures based on guide pulleys and differential actuated joints were designed. Moreover, two different tension devices were also designed to ensure the reliability and accuracy of the cable-driven transmission. The proposed device also employed self-designed bearings which greatly simplified the structure and reduced the cost.

Automatic Number Plate Recognition System Based on Deep Learning

In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used in the safety, the security, and the commercial aspects. Forethought, several methods and techniques are computing to achieve the better levels in terms of accuracy and real time execution. This paper proposed a computer vision algorithm of Number Plate Localization (NPL) and Characters Segmentation (CS). In addition, it proposed an improved method in Optical Character Recognition (OCR) based on Deep Learning (DL) techniques. In order to identify the number of detected plate after NPL and CS steps, the Convolutional Neural Network (CNN) algorithm is proposed. A DL model is developed using four convolution layers, two layers of Maxpooling, and six layers of fully connected. The model was trained by number image database on the Jetson TX2 NVIDIA target. The accuracy result has achieved 95.84%.

A Character Detection Method for Ancient Yi Books Based on Connected Components and Regressive Character Segmentation

Character detection is an important issue for character recognition of ancient Yi books. The accuracy of detection directly affects the recognition effect of ancient Yi books. Considering the complex layout, the lack of standard typesetting and the mixed arrangement between images and texts, we propose a character detection method for ancient Yi books based on connected components and regressive character segmentation. First, the scanned images of ancient Yi books are preprocessed with nonlocal mean filtering, and then a modified local adaptive threshold binarization algorithm is used to obtain the binary images to segment the foreground and background for the images. Second, the non-text areas are removed by the method based on connected components. Finally, the single character in the ancient Yi books is segmented by our method. The experimental results show that the method can effectively separate the text areas and non-text areas for ancient Yi books and achieve higher accuracy and recall rate in the experiment of character detection, and effectively solve the problem of character detection and segmentation in character recognition of ancient books.

Improving Fake News Detection Using K-means and Support Vector Machine Approaches

Fake news and false information are big challenges of all types of media, especially social media. There is a lot of false information, fake likes, views and duplicated accounts as big social networks such as Facebook and Twitter admitted. Most information appearing on social media is doubtful and in some cases misleading. They need to be detected as soon as possible to avoid a negative impact on society. The dimensions of the fake news datasets are growing rapidly, so to obtain a better result of detecting false information with less computation time and complexity, the dimensions need to be reduced. One of the best techniques of reducing data size is using feature selection method. The aim of this technique is to choose a feature subset from the original set to improve the classification performance. In this paper, a feature selection method is proposed with the integration of K-means clustering and Support Vector Machine (SVM) approaches which work in four steps. First, the similarities between all features are calculated. Then, features are divided into several clusters. Next, the final feature set is selected from all clusters, and finally, fake news is classified based on the final feature subset using the SVM method. The proposed method was evaluated by comparing its performance with other state-of-the-art methods on several specific benchmark datasets and the outcome showed a better classification of false information for our work. The detection performance was improved in two aspects. On the one hand, the detection runtime process decreased, and on the other hand, the classification accuracy increased because of the elimination of redundant features and the reduction of datasets dimensions.

Logistic Model Tree and Expectation-Maximization for Pollen Recognition and Grouping

Palynology is a field of interest for many disciplines. It has multiple applications such as chronological dating, climatology, allergy treatment, and even honey characterization. Unfortunately, the analysis of a pollen slide is a complicated and time-consuming task that requires the intervention of experts in the field, which is becoming increasingly rare due to economic and social conditions. So, the automation of this task is a necessity. Pollen slides analysis is mainly a visual process as it is carried out with the naked eye. That is the reason why a primary method to automate palynology is the use of digital image processing. This method presents the lowest cost and has relatively good accuracy in pollen retrieval. In this work, we propose a system combining recognition and grouping of pollen. It consists of using a Logistic Model Tree to classify pollen already known by the proposed system while detecting any unknown species. Then, the unknown pollen species are divided using a cluster-based approach. Success rates for the recognition of known species have been achieved, and automated clustering seems to be a promising approach.