Building an Integrated Relational Database from Swiss Nutrition National Survey and Swiss Health Datasets for Data Mining Purposes

Objective: The objective of the study was to integrate two big databases from Swiss nutrition national survey (menuCH) and Swiss health national survey 2012 for data mining purposes. Each database has a demographic base data. An integrated Swiss database is built to later discover critical food consumption patterns linked with lifestyle diseases known to be strongly tied with food consumption. Design: Swiss nutrition national survey (menuCH) with approx. 2000 respondents from two different surveys, one by Phone and the other by questionnaire along with Swiss health national survey 2012 with 21500 respondents were pre-processed, cleaned and finally integrated to a unique relational database. Results: The result of this study is an integrated relational database from the Swiss nutritional and health databases.

Parametric Approach for Reserve Liability Estimate in Mortgage Insurance

Chain Ladder (CL) method, Expected Loss Ratio (ELR) method and Bornhuetter-Ferguson (BF) method, in addition to more complex transition-rate modeling, are commonly used actuarial reserving methods in general insurance. There is limited published research about their relative performance in the context of Mortgage Insurance (MI). In our experience, these traditional techniques pose unique challenges and do not provide stable claim estimates for medium to longer term liabilities. The relative strengths and weaknesses among various alternative approaches revolve around: stability in the recent loss development pattern, sufficiency and reliability of loss development data, and agreement/disagreement between reported losses to date and ultimate loss estimate. CL method results in volatile reserve estimates, especially for accident periods with little development experience. The ELR method breaks down especially when ultimate loss ratios are not stable and predictable. While the BF method provides a good tradeoff between the loss development approach (CL) and ELR, the approach generates claim development and ultimate reserves that are disconnected from the ever-to-date (ETD) development experience for some accident years that have more development experience. Further, BF is based on subjective a priori assumption. The fundamental shortcoming of these methods is their inability to model exogenous factors, like the economy, which impact various cohorts at the same chronological time but at staggered points along their life-time development. This paper proposes an alternative approach of parametrizing the loss development curve and using logistic regression to generate the ultimate loss estimate for each homogeneous group (accident year or delinquency period). The methodology was tested on an actual MI claim development dataset where various cohorts followed a sigmoidal trend, but levels varied substantially depending upon the economic and operational conditions during the development period spanning over many years. The proposed approach provides the ability to indirectly incorporate such exogenous factors and produce more stable loss forecasts for reserving purposes as compared to the traditional CL and BF methods.

Towards Real-Time Classification of Finger Movement Direction Using Encephalography Independent Components

This study explores the practicality of using electroencephalographic (EEG) independent components to predict eight-direction finger movements in pseudo-real-time. Six healthy participants with individual-head MRI images performed finger movements in eight directions with two different arm configurations. The analysis was performed in two stages. The first stage consisted of using independent component analysis (ICA) to separate the signals representing brain activity from non-brain activity signals and to obtain the unmixing matrix. The resulting independent components (ICs) were checked, and those reflecting brain-activity were selected. Finally, the time series of the selected ICs were used to predict eight finger-movement directions using Sparse Logistic Regression (SLR). The second stage consisted of using the previously obtained unmixing matrix, the selected ICs, and the model obtained by applying SLR to classify a different EEG dataset. This method was applied to two different settings, namely the single-participant level and the group-level. For the single-participant level, the EEG dataset used in the first stage and the EEG dataset used in the second stage originated from the same participant. For the group-level, the EEG datasets used in the first stage were constructed by temporally concatenating each combination without repetition of the EEG datasets of five participants out of six, whereas the EEG dataset used in the second stage originated from the remaining participants. The average test classification results across datasets (mean ± S.D.) were 38.62 ± 8.36% for the single-participant, which was significantly higher than the chance level (12.50 ± 0.01%), and 27.26 ± 4.39% for the group-level which was also significantly higher than the chance level (12.49% ± 0.01%). The classification accuracy within [–45°, 45°] of the true direction is 70.03 ± 8.14% for single-participant and 62.63 ± 6.07% for group-level which may be promising for some real-life applications. Clustering and contribution analyses further revealed the brain regions involved in finger movement and the temporal aspect of their contribution to the classification. These results showed the possibility of using the ICA-based method in combination with other methods to build a real-time system to control prostheses.

Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.

Laser Data Based Automatic Generation of Lane-Level Road Map for Intelligent Vehicles

With the development of intelligent vehicle systems, a high-precision road map is increasingly needed in many aspects. The automatic lane lines extraction and modeling are the most essential steps for the generation of a precise lane-level road map. In this paper, an automatic lane-level road map generation system is proposed. To extract the road markings on the ground, the multi-region Otsu thresholding method is applied, which calculates the intensity value of laser data that maximizes the variance between background and road markings. The extracted road marking points are then projected to the raster image and clustered using a two-stage clustering algorithm. Lane lines are subsequently recognized from these clusters by the shape features of their minimum bounding rectangle. To ensure the storage efficiency of the map, the lane lines are approximated to cubic polynomial curves using a Bayesian estimation approach. The proposed lane-level road map generation system has been tested on urban and expressway conditions in Hefei, China. The experimental results on the datasets show that our method can achieve excellent extraction and clustering effect, and the fitted lines can reach a high position accuracy with an error of less than 10 cm.

Development of Fake News Model Using Machine Learning through Natural Language Processing

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Efficient HAAR Wavelet Transform with Embedded Zerotrees of Wavelet Compression for Color Images

This study is expected to compress true color image with compression algorithms in color spaces to provide high compression rates. The need of high compression ratio is to improve storage space. Alternative aim is to rank compression algorithms in a suitable color space. The dataset is sequence of true color images with size 128 x 128. HAAR Wavelet is one of the famous wavelet transforms, has great potential and maintains image quality of color images. HAAR wavelet Transform using Set Partitioning in Hierarchical Trees (SPIHT) algorithm with different color spaces framework is applied to compress sequence of images with angles. Embedded Zerotrees of Wavelet (EZW) is a powerful standard method to sequence data. Hence the proposed compression frame work of HAAR wavelet, xyz color space, morphological gradient and applied image with EZW compression, obtained improvement to other methods, in terms of Compression Ratio, Mean Square Error, Peak Signal Noise Ratio and Bits Per Pixel quality measures.

Fast Approximate Bayesian Contextual Cold Start Learning (FAB-COST)

Cold-start is a notoriously difficult problem which can occur in recommendation systems, and arises when there is insufficient information to draw inferences for users or items. To address this challenge, a contextual bandit algorithm – the Fast Approximate Bayesian Contextual Cold Start Learning algorithm (FAB-COST) – is proposed, which is designed to provide improved accuracy compared to the traditionally used Laplace approximation in the logistic contextual bandit, while controlling both algorithmic complexity and computational cost. To this end, FAB-COST uses a combination of two moment projection variational methods: Expectation Propagation (EP), which performs well at the cold start, but becomes slow as the amount of data increases; and Assumed Density Filtering (ADF), which has slower growth of computational cost with data size but requires more data to obtain an acceptable level of accuracy. By switching from EP to ADF when the dataset becomes large, it is able to exploit their complementary strengths. The empirical justification for FAB-COST is presented, and systematically compared to other approaches on simulated data. In a benchmark against the Laplace approximation on real data consisting of over 670, 000 impressions from autotrader.co.uk, FAB-COST demonstrates at one point increase of over 16% in user clicks. On the basis of these results, it is argued that FAB-COST is likely to be an attractive approach to cold-start recommendation systems in a variety of contexts.

Artificial Intelligence-Based Chest X-Ray Test of COVID-19 Patients

The management of COVID-19 patients based on chest imaging is emerging as an essential tool for evaluating the spread of the pandemic which has gripped the global community. It has already been used to monitor the situation of COVID-19 patients who have issues in respiratory status. There has been increase to use chest imaging for medical triage of patients who are showing moderate-severe clinical COVID-19 features, this is due to the fast dispersal of the pandemic to all continents and communities. This article demonstrates the development of machine learning techniques for the test of COVID-19 patients using Chest X-Ray (CXR) images in nearly real-time, to distinguish the COVID-19 infection with a significantly high level of accuracy. The testing performance has covered a combination of different datasets of CXR images of positive COVID-19 patients, patients with viral and bacterial infections, also, people with a clear chest. The proposed AI scheme successfully distinguishes CXR scans of COVID-19 infected patients from CXR scans of viral and bacterial based pneumonia as well as normal cases with an average accuracy of 94.43%, sensitivity 95%, and specificity 93.86%. Predicted decisions would be supported by visual evidence to help clinicians speed up the initial assessment process of new suspected cases, especially in a resource-constrained environment.

Intelligent Transport System: Classification of Traffic Signs Using Deep Neural Networks in Real Time

Traffic control has been one of the most common and irritating problems since the time automobiles have hit the roads. Problems like traffic congestion have led to a significant time burden around the world and one significant solution to these problems can be the proper implementation of the Intelligent Transport System (ITS). It involves the integration of various tools like smart sensors, artificial intelligence, position technologies and mobile data services to manage traffic flow, reduce congestion and enhance driver's ability to avoid accidents during adverse weather. Road and traffic signs’ recognition is an emerging field of research in ITS. Classification problem of traffic signs needs to be solved as it is a major step in our journey towards building semi-autonomous/autonomous driving systems. The purpose of this work focuses on implementing an approach to solve the problem of traffic sign classification by developing a Convolutional Neural Network (CNN) classifier using the GTSRB (German Traffic Sign Recognition Benchmark) dataset. Rather than using hand-crafted features, our model addresses the concern of exploding huge parameters and data method augmentations. Our model achieved an accuracy of around 97.6% which is comparable to various state-of-the-art architectures.

Application of Advanced Remote Sensing Data in Mineral Exploration in the Vicinity of Heavy Dense Forest Cover Area of Jharkhand and Odisha State Mining Area

The study has been carried out on the Saranda in Jharkhand and a part of Odisha state. Geospatial data of Hyperion, a remote sensing satellite, have been used. This study has used a wide variety of patterns related to image processing to enhance and extract the mining class of Fe and Mn ores.Landsat-8, OLI sensor data have also been used to correctly explore related minerals. In this way, various processes have been applied to increase the mineralogy class and comparative evaluation with related frequency done. The Hyperion dataset for hyperspectral remote sensing has been specifically verified as an effective tool for mineral or rock information extraction within the band range of shortwave infrared used. The abundant spatial and spectral information contained in hyperspectral images enables the differentiation of different objects of any object into targeted applications for exploration such as exploration detection, mining.

NANCY: Combining Adversarial Networks with Cycle-Consistency for Robust Multi-Modal Image Registration

Multimodal image registration is a profoundly complex task which is why deep learning has been used widely to address it in recent years. However, two main challenges remain: Firstly, the lack of ground truth data calls for an unsupervised learning approach, which leads to the second challenge of defining a feasible loss function that can compare two images of different modalities to judge their level of alignment. To avoid this issue altogether we implement a generative adversarial network consisting of two registration networks GAB, GBA and two discrimination networks DA, DB connected by spatial transformation layers. GAB learns to generate a deformation field which registers an image of the modality B to an image of the modality A. To do that, it uses the feedback of the discriminator DB which is learning to judge the quality of alignment of the registered image B. GBA and DA learn a mapping from modality A to modality B. Additionally, a cycle-consistency loss is implemented. For this, both registration networks are employed twice, therefore resulting in images ˆA, ˆB which were registered to ˜B, ˜A which were registered to the initial image pair A, B. Thus the resulting and initial images of the same modality can be easily compared. A dataset of liver CT and MRI was used to evaluate the quality of our approach and to compare it against learning and non-learning based registration algorithms. Our approach leads to dice scores of up to 0.80 ± 0.01 and is therefore comparable to and slightly more successful than algorithms like SimpleElastix and VoxelMorph.

Vision-Based Daily Routine Recognition for Healthcare with Transfer Learning

We propose to record Activities of Daily Living (ADLs) of elderly people using a vision-based system so as to provide better assistive and personalization technologies. Current ADL-related research is based on data collected with help from non-elderly subjects in laboratory environments and the activities performed are predetermined for the sole purpose of data collection. To obtain more realistic datasets for the application, we recorded ADLs for the elderly with data collected from real-world environment involving real elderly subjects. Motivated by the need to collect data for more effective research related to elderly care, we chose to collect data in the room of an elderly person. Specifically, we installed Kinect, a vision-based sensor on the ceiling, to capture the activities that the elderly subject performs in the morning every day. Based on the data, we identified 12 morning activities that the elderly person performs daily. To recognize these activities, we created a HARELCARE framework to investigate into the effectiveness of existing Human Activity Recognition (HAR) algorithms and propose the use of a transfer learning algorithm for HAR. We compared the performance, in terms of accuracy, and training progress. Although the collected dataset is relatively small, the proposed algorithm has a good potential to be applied to all daily routine activities for healthcare purposes such as evidence-based diagnosis and treatment.

Power Production Performance of Different Wave Energy Converters in the Southwestern Black Sea

This study aims to investigate the amount of energy (economic wave energy potential) that can be obtained from the existing wave energy converters in the high wave energy potential region of the Black Sea in terms of wave energy potential and their performance at different depths in the region. The data needed for this purpose were obtained using the calibrated nested layered SWAN wave modeling program version 41.01AB, which was forced with Climate Forecast System Reanalysis (CFSR) winds from 1979 to 2009. The wave dataset at a time interval of 2 hours was accumulated for a sub-grid domain for around Karaburun beach in Arnavutkoy, a district of Istanbul city. The annual sea state characteristic matrices for the five different depths along with a vertical line to the coastline were calculated for 31 years. According to the power matrices of different wave energy converter systems and characteristic matrices for each possible installation depth, the probability distribution tables of the specified mean wave period or wave energy period and significant wave height were calculated. Then, by using the relationship between these distribution tables, according to the present wave climate, the energy that the wave energy converter systems at each depth can produce was determined. Thus, the economically feasible potential of the relevant coastal zone was revealed, and the effect of different depths on energy converter systems is presented. The Oceantic at 50, 75 and 100 m depths and Oyster at 5 and 25 m depths presents the best performance. In the 31-year long period 1998 the most and 1989 is the least dynamic year.

Two Concurrent Convolution Neural Networks TC*CNN Model for Face Recognition Using Edge

In this paper we develop a model that couples Two Concurrent Convolution Neural Network with different filters (TC*CNN) for face recognition and compare its performance to an existing sequential CNN (base model). We also test and compare the quality and performance of the models on three datasets with various levels of complexity (easy, moderate, and difficult) and show that for the most complex datasets, edges will produce the most accurate and efficient results. We further show that in such cases while Support Vector Machine (SVM) models are fast, they do not produce accurate results.

Convergence Analysis of Training Two-Hidden-Layer Partially Over-Parameterized ReLU Networks via Gradient Descent

Over-parameterized neural networks have attracted a great deal of attention in recent deep learning theory research, as they challenge the classic perspective of over-fitting when the model has excessive parameters and have gained empirical success in various settings. While a number of theoretical works have been presented to demystify properties of such models, the convergence properties of such models are still far from being thoroughly understood. In this work, we study the convergence properties of training two-hidden-layer partially over-parameterized fully connected networks with the Rectified Linear Unit activation via gradient descent. To our knowledge, this is the first theoretical work to understand convergence properties of deep over-parameterized networks without the equally-wide-hidden-layer assumption and other unrealistic assumptions. We provide a probabilistic lower bound of the widths of hidden layers and proved linear convergence rate of gradient descent. We also conducted experiments on synthetic and real-world datasets to validate our theory.

Infrastructure Change Monitoring Using Multitemporal Multispectral Satellite Images

The main objective of this study is to find a suitable approach to monitor the land infrastructure growth over a period of time using multispectral satellite images. Bi-temporal change detection method is unable to indicate the continuous change occurring over a long period of time. To achieve this objective, the approach used here estimates a statistical model from series of multispectral image data over a long period of time, assuming there is no considerable change during that time period and then compare it with the multispectral image data obtained at a later time. The change is estimated pixel-wise. Statistical composite hypothesis technique is used for estimating pixel based change detection in a defined region. The generalized likelihood ratio test (GLRT) is used to detect the changed pixel from probabilistic estimated model of the corresponding pixel. The changed pixel is detected assuming that the images have been co-registered prior to estimation. To minimize error due to co-registration, 8-neighborhood pixels around the pixel under test are also considered. The multispectral images from Sentinel-2 and Landsat-8 from 2015 to 2018 are used for this purpose. There are different challenges in this method. First and foremost challenge is to get quite a large number of datasets for multivariate distribution modelling. A large number of images are always discarded due to cloud coverage. Due to imperfect modelling there will be high probability of false alarm. Overall conclusion that can be drawn from this work is that the probabilistic method described in this paper has given some promising results, which need to be pursued further.

Rank-Based Chain-Mode Ensemble for Binary Classification

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Building and Tree Detection Using Multiscale Matched Filtering

In this study, an automated building and tree detection method is proposed using DSM data and true orthophoto image. A multiscale matched filtering is used on DSM data. Therefore, first watershed transform is applied. Then, Otsu’s thresholding method is used as an adaptive threshold to segment each watershed region. Detected objects are masked with NDVI to separate buildings and trees. The proposed method is able to detect buildings and trees without entering any elevation threshold. We tested our method on ISPRS semantic labeling dataset and obtained promising results.

Cultural Effects on the Performance of Non- Profit and For-Profit Microfinance Institutions

Using a large dataset of more than 2,400 individual microfinance institutions (MFIs) from 120 countries from 1999 to 2016, this study finds that nearly half of the international MFIs operate as for-profit institutions. Formal institutions (business regulatory environment, property rights, social protection, and a developed financial sector) impact the likelihood of MFIs being for-profit across countries. Cultural differences across countries (power distance, individualism, masculinity, and indulgence) seem to be a factor in the legal status of the MFI (non-profit or for-profit). MFIs in countries with stronger formal institutions, a greater degree of power distance, and a higher degree of collectivism experience better financial and social performance.