Comparative Analysis of Classical and Parallel Inpainting Algorithms Based on Affine Combinations of Projections on Convex Sets

The paper is a comparative study of two classical variants of parallel projection methods for solving the convex feasibility problem and their equivalents that use variable weights in the construction of the solutions. We used a graphical representation of these methods, inpainting a convex area of an image, to investigate their effectiveness in image reconstruction applications. We also presented a numerical analysis of the convergence of these four algorithms in terms of the average number of steps and execution time, both in a classical CPU implementation and, alternatively, in a parallel GPU implementation.
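To make the idea concrete, here is a minimal Python sketch of a simultaneous (Cimmino-type) parallel projection method, in which each new iterate is a convex combination of the projections onto all sets. The sets (two half-planes and a disk), the helper names, and the variable-weight rule (each set weighted by its squared distance from the current point) are illustrative assumptions. They are not the paper's exact variants, and the inpainting setup itself is not reproduced.

```python
import math

def proj_halfspace(a, b):
    """Return the projection onto the half-space {x : dot(a, x) <= b}."""
    norm2 = sum(ai * ai for ai in a)
    def project(x):
        excess = sum(ai * xi for ai, xi in zip(a, x)) - b
        if excess <= 0:
            return list(x)  # already feasible
        return [xi - excess * ai / norm2 for ai, xi in zip(a, x)]
    return project

def proj_ball(center, radius):
    """Return the projection onto the closed ball with the given center and radius."""
    def project(x):
        d = math.dist(x, center)
        if d <= radius:
            return list(x)
        return [ci + radius * (xi - ci) / d for ci, xi in zip(center, x)]
    return project

def parallel_projection(x0, projections, variable_weights=False, tol=1e-8, max_iter=10_000):
    """Iterate x <- sum_i w_i * P_i(x) until x is within tol of every set."""
    x = list(x0)
    for k in range(max_iter):
        # The projections are independent of each other: this is the step a GPU parallelizes.
        ps = [project(x) for project in projections]
        d2 = [math.dist(p, x) ** 2 for p in ps]
        if max(d2) <= tol * tol:
            return x, k
        if variable_weights:
            # Assumed variable-weight rule: weight each set by its squared distance.
            total = sum(d2)
            w = [di / total for di in d2]
        else:
            w = [1.0 / len(ps)] * len(ps)  # classical equal weights
        x = [sum(wi * p[j] for wi, p in zip(w, ps)) for j in range(len(x))]
    return x, max_iter

sets = [proj_halfspace((1, 0), 1), proj_halfspace((0, 1), 1), proj_ball((0, 0), 2)]
x_fixed, steps_fixed = parallel_projection((5, 5), sets)
x_var, steps_var = parallel_projection((5, 5), sets, variable_weights=True)
print(x_fixed, steps_fixed)
print(x_var, steps_var)
```

Because any convex combination of projections moves the iterate no farther from every point of the intersection, both weightings converge here. Comparing their step counts on real inpainting constraints is the kind of measurement the abstract describes.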

A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in many practical applications. Owing to their high accuracy and good performance, Convolutional Neural Networks (CNNs) in particular have become a research hot spot in the past few years. However, networks keep growing in size to meet the demands of practical applications, which makes it challenging to build high-performance implementations of deep neural networks. Meanwhile, many of these application scenarios also impose strict requirements on the performance and power consumption of hardware devices. It is therefore critical to choose an appropriate computing platform for the hardware acceleration of CNNs. This article surveys recent advances in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various FPGA-based accelerator designs and implementations for different devices and network models are reviewed and compared with Graphic Processing Unit (GPU), Application Specific Integrated Circuit (ASIC) and Digital Signal Processor (DSP) versions, and we present our own critical analysis and comments. Finally, we discuss these acceleration and optimization methods on FPGA platforms from different perspectives to explore the opportunities and challenges for future research, and we outline prospects for the future development of FPGA-based accelerators.

Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of many vehicle sensors, it is difficult to always provide high perception performance in driving environments that vary with the time of day and the season. Image segmentation methods based on deep learning, which have recently evolved rapidly, provide high and stable recognition performance in various road environments. However, because the system controls a vehicle in real time, highly complex deep learning networks cannot be used due to time and memory constraints. Moreover, existing efficient networks are optimized for GPU environments and lose performance on embedded processors equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time, so as to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA. The information lost by fixing the number of channels is compensated for by increasing the number of parallel branches. Second, an efficient convolution type is selected depending on the size of the activation. Since the MMA has a fixed structure, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is decided according to the output stride, which allows the network to be deeper. In addition, memory access time is minimized by processing operations only in the L3 cache. Lastly, reliable contexts are extracted using an extended atrous spatial pyramid pooling (ASPP).
The proposed method obtains stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it uses two ASPPs to obtain high-quality contexts from the restored shape without global average pooling paths, since the layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network processes a 640 x 480 image in 6.67 ms, so images from six cameras (about 40 ms in total) can be processed within the 50 ms frame period of 20 frames per second (FPS) to identify the surroundings of the vehicle. In addition, it achieves 73.1% mean intersection over union (mIoU), the highest recognition rate among embedded networks on the Cityscapes validation set.
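The trade-off between normal and depthwise separable convolution can be made concrete by counting multiply-accumulate (MAC) operations. The Python sketch below uses the standard textbook cost formulas; the layer size is hypothetical, and the sketch deliberately leaves out the memory-access and MMA-utilization effects that, according to the abstract, can make normal convolution the better choice on the TI DSP.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k

def dw_separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise

def extra_intermediate_elements(h, w, c_in):
    """The separable form writes, then re-reads, an extra h x w x c_in activation map."""
    return h * w * c_in

# Hypothetical layer: 60 x 80 output, 64 -> 64 channels, 3 x 3 kernel.
h, w, c_in, c_out, k = 60, 80, 64, 64, 3
print(conv_macs(h, w, c_in, c_out, k))
print(dw_separable_macs(h, w, c_in, c_out, k))
print(extra_intermediate_elements(h, w, c_in))
```

For this layer, the separable form needs about 7.9 times fewer MACs (176,947,200 versus 22,425,600) but adds a 307,200-element intermediate map. Whether it is actually faster depends on how each stage maps onto a fixed-size accelerator and on the extra memory traffic.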

Deep Learning Application for Object Image Recognition and Robot Automatic Grasping

Since vision systems are increasingly required for autonomous operation in industrial environments, image recognition has become an important research topic. Here, a deep learning algorithm is employed in an image system to recognize industrial objects, and the system is integrated with a 7A6 Series Manipulator for automatic object gripping. A PC and a Graphic Processing Unit (GPU) are chosen to construct the 3D Vision Recognition System. A depth camera (Intel RealSense SR300) is employed to capture images for object recognition and coordinate derivation. The YOLOv2 scheme is adopted as the convolutional neural network (CNN) structure for object classification and center point prediction. Additionally, an image processing strategy is used to find the object contour and calculate the object orientation angle. The location and orientation of the specified object are then sent to the robot controller. Finally, the six-axis manipulator grasps the specified object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 successfully detects the object location and category with confidence near 0.9 and a 3D position error of less than 0.4 mm. The system is useful for future intelligent robotic applications in Industry 4.0 environments.
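The abstract does not say how the orientation angle is computed from the contour. One common option, assumed here, is the principal-axis angle from second-order central image moments, theta = 0.5 * atan2(2*mu11, mu20 - mu02). The Python sketch below applies this to a binary object mask. The masks are hypothetical inputs; the real system works on segmented depth-camera images.

```python
import math

def orientation_from_mask(mask):
    """Centroid and principal-axis angle (degrees) of the foreground pixels of a binary mask.

    Image coordinates: x grows to the right and y grows downwards, so a positive
    angle means the major axis tilts down to the right.
    """
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments of the object region
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return (cx, cy), math.degrees(theta)

bar = [[1] * 6, [1] * 6]                                               # horizontal object
diagonal = [[1 if x == y else 0 for x in range(5)] for y in range(5)]  # 45-degree object
print(orientation_from_mask(bar))
print(orientation_from_mask(diagonal))
```

The centroid and angle together give the in-plane grasp pose. Combining them with the depth value at the centroid would give the 3D target sent to the controller.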

Normal and Peaberry Coffee Beans Classification from Green Coffee Bean Images Using Convolutional Neural Networks and Support Vector Machine

The aim of this study is to develop a system that can identify and sort peaberries automatically and at low cost for coffee producers in developing countries. In this paper, the focus is on the classification of peaberries and normal coffee beans using image processing and machine learning techniques. A peaberry is neither a defective bean nor a normal bean: it is a single, relatively round seed that develops in a coffee cherry instead of the usual flat-sided pair of beans, and it has a different value and flavor. To improve the taste of the coffee, peaberries and normal beans must be separated before the green coffee beans are roasted; otherwise, the flavors of the beans mix and the overall taste suffers. During roasting, all beans should have a uniform shape, size, and weight; otherwise, the larger beans take longer to roast through. Peaberries differ from normal beans in size and shape even when they have the same weight, and they roast more slowly than normal beans. Therefore, neither size-based nor weight-based sorting is a good option for selecting peaberries. Defective beans, e.g., sour, broken, black, and faded beans, are easy to check and pick out by hand. Picking out peaberries, on the other hand, is very difficult even for trained specialists because their shape and color are similar to those of normal beans. In this study, we use image processing and machine learning techniques to discriminate between normal beans and peaberries as part of a sorting system. As a first step, we applied a deep Convolutional Neural Network (CNN) and a Support Vector Machine (SVM) to discriminate between peaberries and normal beans. The CNN performed better than the SVM at discriminating peaberries. The network, trained in this work on a high-performance CPU and GPU, will then simply be installed on an inexpensive, low-computation Raspberry Pi system.
This system is intended for use in developing countries. The study evaluates and compares the feasibility of the methods in terms of classification accuracy and processing speed.

A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Learning from very large datasets is a significant problem for most current data mining and machine learning algorithms. MicroRNA (miRNA) data, derived from non-coding genome sequences, are among the important large genomic datasets. In this paper, a hybrid method for classifying miRNA data is proposed. Owing to the variety of cancers and the high number of genes, analyzing miRNA datasets has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data are imbalanced. A feature selection method is used to select the features best able to distinguish the classes and to eliminate uninformative ones. Afterward, a Convolutional Neural Network (CNN) classifier is used to classify cancer types, with a Genetic Algorithm optimizing the CNN's hyper-parameters. To speed up classification by the CNN, a Graphics Processing Unit (GPU) is recommended for performing the computations in parallel. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
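The abstract states that a Genetic Algorithm tunes the CNN's hyper-parameters but gives no details. The Python sketch below shows a generic GA with truncation selection, uniform crossover, mutation, and elitism over a hypothetical discrete search space. `toy_fitness` is a stand-in for training the CNN and returning its validation score, and none of the settings come from the paper.

```python
import math
import random

# Hypothetical search space; the paper's actual hyper-parameters are not given in the abstract.
SPACE = {
    "filters": [16, 32, 64, 128],
    "kernel": [3, 5, 7],
    "lr": [1e-2, 1e-3, 1e-4],
}

def random_individual(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one parent at random.
    return {name: a[name] if rng.random() < 0.5 else b[name] for name in SPACE}

def mutate(ind, rng, rate=0.2):
    # Each gene is resampled from its value list with probability `rate`.
    return {name: rng.choice(SPACE[name]) if rng.random() < rate else value
            for name, value in ind.items()}

def genetic_search(fitness, pop_size=10, generations=15, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]                 # truncation selection
        children = [dict(ind) for ind in ranked[:elite]]  # elitism: the best survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness), best_history

def toy_fitness(cfg):
    # Hypothetical stand-in for "train the CNN with cfg and return its validation accuracy".
    return -(abs(cfg["filters"] - 64) / 64 + abs(cfg["kernel"] - 5) + abs(math.log10(cfg["lr"]) + 3))

best, history = genetic_search(toy_fitness)
print(best, history[-1])
```

Elitism guarantees that the best score never decreases from one generation to the next. This matters when each fitness evaluation is an expensive CNN training run, which is also why a GPU is recommended for the training itself.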

Performance Evaluation of Distributed Deep Learning Frameworks in Cloud Environment

2016 became the year of the artificial intelligence (AI) explosion. AI technologies have matured to the point that most of the world's best-known technology companies are investing heavily to increase their AI capabilities. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural networks to train a machine to learn features directly from data. Deep learning enables many machine learning applications that expand the field of AI. Deep learning frameworks are now widely deployed on servers for deep learning applications in both academia and industry. Although training deep neural networks follows many standard processes and algorithms, different frameworks may perform differently. In this paper, we evaluate the performance of two state-of-the-art distributed deep learning frameworks that run training computations in parallel across multiple GPUs and multiple nodes in our cloud environment. We evaluate the training performance of the frameworks with the ResNet-50 convolutional neural network, and we analyze the factors behind the performance differences between the two frameworks. Through the experimental analysis, we identify overheads that could be further optimized. The main contribution is that the evaluation results suggest further optimization directions in both performance tuning and algorithmic design.
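Synchronous data-parallel training across GPUs typically averages gradients with an all-reduce, and the cost of this communication is a common source of the overheads that such evaluations measure. The Python sketch below simulates the ring all-reduce algorithm on plain lists. It illustrates the general technique under that assumption and does not describe either framework evaluated in the paper.

```python
def ring_allreduce(buffers):
    """Simulate ring all-reduce: every worker ends with the element-wise sum of all buffers."""
    n, length = len(buffers), len(buffers[0])
    data = [list(b) for b in buffers]
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]  # n chunks

    # Phase 1, reduce-scatter: in step s, worker i passes chunk (i - s) mod n to worker i + 1,
    # which adds it to its own copy. Afterwards worker i holds the full sum of chunk (i + 1) mod n.
    for s in range(n - 1):
        sent = [((i - s) % n, data[i][slice(*bounds[(i - s) % n])]) for i in range(n)]
        for i in range(n):
            c, payload = sent[(i - 1) % n]  # message from the left neighbour
            lo, _ = bounds[c]
            for k, v in enumerate(payload):
                data[i][lo + k] += v

    # Phase 2, all-gather: each finished chunk travels once more around the ring.
    for s in range(n - 1):
        sent = [((i + 1 - s) % n, data[i][slice(*bounds[(i + 1 - s) % n])]) for i in range(n)]
        for i in range(n):
            c, payload = sent[(i - 1) % n]
            lo, hi = bounds[c]
            data[i][lo:hi] = payload
    return data

workers = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
summed = ring_allreduce(workers)
averaged = [[v / len(workers) for v in row] for row in summed]  # data-parallel gradient average
print(summed[0])
```

Each worker sends 2(n - 1) messages of about 1/n of the buffer, so per-worker traffic stays close to twice the gradient size as n grows. Latency and bandwidth between nodes then largely decide how much of that cost can be hidden behind computation.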

The Folksongs of Jharkhand: An Intangible Cultural Heritage of Tribal India

Jharkhand, the newly constituted 28th state in the eastern part of India, is known for the oldest settlements of indigenous people. Three broad language families are found in the state: Austric, Dravidian, and Indo-European. For example, Mundari, Kharia, Ho, and Santali belong to the Austric family; Kurukh and Malto to the Dravidian family; and Nagpuri, Khorta, etc. to the Indo-European family. There are 32 indigenous communities identified as Scheduled Tribes in Jharkhand, among which the Santhal, Munda, Kharia, Ho, and Oraon are some of the major tribes. Jharkhand has a rich cultural heritage, including folk art, folklore, folk dance, folk music, and folk songs, whose diversity can be seen from place to place, from season to season, and across all traditional cultures and practices. The languages as well as the songs are vulnerable to the dominant culture and therefore need to be protected. Collecting and documenting these songs in their natural setting contributes significantly to the conservation and propagation of these cultural elements. This paper aims to bring out the originality of songs collected from remote areas of the plateau of southern Jharkhand as a rich intangible cultural heritage of the country. The research was carried out through participatory observation. In this research project, more than 100 songs that had never been documented before were collected.

Simulation of Utility Accrual Scheduling and Recovery Algorithm in Multiprocessor Environment

This paper presents the development of an event-based Discrete Event Simulation (DES) for a recovery algorithm known as Backward Recovery Global Preemptive Utility Accrual Scheduling (BR_GPUAS). This algorithm implements the Backward Recovery (BR) mechanism as a fault recovery solution within the existing Time/Utility Function/Utility Accrual (TUF/UA) scheduling domain for multiprocessor environments. The BR mechanism takes faulty tasks back to their initial safe state and then re-executes the affected sections of the faulty tasks to enable recovery. Because faults may occur in the components of any system, a fault-tolerant system that can nullify their erroneous effects needs to be developed. Current TUF/UA scheduling algorithms use the abortion recovery mechanism, which simply aborts the erroneous task as their fault recovery solution. None of the existing algorithms in the TUF/UA scheduling domain for multiprocessor environments has considered transient faults or implemented the BR mechanism as a fault recovery mechanism to nullify the erroneous effects and solve the recovery problem in this domain. The developed BR_GPUAS simulator derives its set of parameters, events, and performance metrics from a detailed analysis of the base model. Simulation results revealed that the BR_GPUAS algorithm can save roughly 20-30% of the accumulated utilities, making it reliable and efficient for real-time applications in multiprocessor scheduling environments.
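A toy event-driven simulator can illustrate how abort-based recovery and backward recovery differ in accrued utility. The Python sketch below rests on simplifying assumptions: step time/utility functions, non-preemptive global dispatch by highest utility density, at most one transient fault per task, and hypothetical names throughout. It is not BR_GPUAS itself, which is preemptive.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    arrival: float
    exec_time: float
    deadline: float                   # step TUF: full utility if finished by the deadline, else 0
    utility: float
    fault_at: Optional[float] = None  # fraction of execution at which one transient fault strikes

def simulate(tasks, processors, policy):
    """Return the accrued utility; policy is "abort" or "backward" (hypothetical names)."""
    events, seq = [], 0

    def schedule(time, kind, task):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, task))  # seq breaks ties in FIFO order
        seq += 1

    for t in tasks:
        schedule(t.arrival, "arrive", t)
    ready, idle, accrued, faulted = [], processors, 0.0, set()
    while events:
        now, _, kind, task = heapq.heappop(events)
        if kind == "arrive":
            ready.append(task)
        elif kind == "finish":
            idle += 1
            if now <= task.deadline:
                accrued += task.utility
        elif kind == "fault":
            idle += 1
            faulted.add(task.name)
            if policy == "backward":
                ready.append(task)  # roll back to the initial safe state and re-execute
            # under "abort" the task is discarded and accrues nothing
        # Global dispatch: highest utility density first, one task per idle processor.
        ready.sort(key=lambda job: job.utility / job.exec_time, reverse=True)
        while idle and ready:
            job = ready.pop(0)
            idle -= 1
            if job.fault_at is not None and job.name not in faulted:
                schedule(now + job.fault_at * job.exec_time, "fault", job)
            else:
                schedule(now + job.exec_time, "finish", job)
    return accrued

tasks = [Task("A", 0, 2, 10, 5, fault_at=0.5), Task("B", 0, 3, 10, 3)]
print(simulate(tasks, 1, "abort"), simulate(tasks, 1, "backward"))
```

In this example, aborting task A loses its utility entirely, while rolling it back and re-executing it still meets its deadline. Metrics such as accrued utility are collected in exactly this way from the event trace of a discrete event simulator.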

Assessment of Urban Heat Island through Remote Sensing in Nagpur Urban Area Using Landsat 7 ETM+ Satellite Images

The Urban Heat Island (UHI) effect is increasingly pronounced and has become a prominent urban environmental concern in developing cities. To study the UHI effect in the Indian context, this paper explores the Nagpur urban area using Landsat 7 ETM+ satellite images with Remote Sensing and GIS techniques. The paper studies the effect of the land use/land cover (LU/LC) pattern on daytime Land Surface Temperature (LST) variation, which contributes to UHI formation within the Nagpur urban area. Supervised LU/LC classification was carried out in ENVI 5 to study urban change detection. Change detection was studied by computing the Normalized Difference Vegetation Index (NDVI) to understand the proportion of vegetative cover relative to the built-up ratio. Spectral radiance from the thermal band of the satellite images was processed to calibrate LST. Specific representative areas, selected on the basis of the built-up and vegetation classification, were used to observe point LST. Across the entire Nagpur urban area, LST increases as building density increases and vegetation cover decreases, thereby causing the UHI effect. UHI intensity increased gradually, by 0.7°C, from 2000 to 2006, and then drastically, by 1.8°C, from 2006 to 2013. Within the Nagpur urban area, the UHI effect was thus formed by increasing building density and decreasing vegetative cover.
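The processing chain described above can be sketched with standard formulas: NDVI = (NIR - Red)/(NIR + Red), a linear rescaling from digital numbers (DN) to radiance, and the inverse Planck relation with the published Landsat 7 ETM+ band 6 constants K1 = 666.09 W/(m²·sr·µm) and K2 = 1282.71 K. The emissivity correction is one commonly used form and is an assumption; the abstract does not state which correction, if any, the paper applied.

```python
import math

# Landsat 7 ETM+ band 6 thermal calibration constants
K1 = 666.09   # W / (m^2 sr um)
K2 = 1282.71  # K

def ndvi(nir, red):
    """NDVI from near-infrared (ETM+ band 4) and red (band 3) reflectance."""
    return (nir - red) / (nir + red)

def dn_to_radiance(dn, gain, bias):
    """Linear DN-to-radiance rescaling; gain and bias come from the scene metadata."""
    return gain * dn + bias

def brightness_temperature(radiance):
    """At-sensor brightness temperature in kelvin (inverse Planck with the ETM+ constants)."""
    return K2 / math.log(K1 / radiance + 1.0)

def land_surface_temperature(bt, emissivity, wavelength=11.45e-6, rho=1.438e-2):
    """Assumed emissivity correction: T / (1 + (lambda * T / rho) * ln(emissivity)).

    wavelength is the band-6 effective wavelength in metres; rho = h*c/k_B in m*K.
    """
    return bt / (1.0 + (wavelength * bt / rho) * math.log(emissivity))

# Round trip: the radiance a 300 K blackbody would produce, converted back to temperature.
radiance_300k = K1 / (math.exp(K2 / 300.0) - 1.0)
print(ndvi(0.5, 0.1), brightness_temperature(radiance_300k))
print(land_surface_temperature(300.0, 0.95) - 273.15)  # LST in degrees Celsius
```

Computing NDVI and LST for the same pixels makes it easy to relate vegetation cover to surface temperature at the selected representative sites.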

Parallel 2-Opt Local Search on GPU

To accelerate the solution of large-scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with a simple implementation based on the Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on data decomposition: K processors are dynamically assigned along the entire tour to perform 2-opt local optimization of K edges simultaneously on independent sub-tours, where K can be user-defined or a function of the input size N. We implement this algorithm with a doubly linked list on the GPU. The implementation requires only O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along the entire tour on TSP instances from TSPLIB with more than 10000 cities.
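The abstract describes the scheme only at a high level. The Python sketch below illustrates one way such data decomposition can work, under stated assumptions. The tour is stored as an array (not the paper's doubly linked list) and cut into K segments whose end positions stay fixed, so 2-opt moves inside different segments cannot interfere. Between rounds the tour is rotated so that edges near segment boundaries also get optimized. The segments run one after another here; on a GPU each would be handled by its own thread.

```python
import math
import random

def tour_length(tour, pts):
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))

def two_opt_segment(tour, pts, lo, hi):
    """Improve tour[lo..hi] in place with 2-opt moves whose edges lie inside the segment.

    Positions lo and hi never move, so disjoint segments can be processed independently.
    """
    improved = True
    while improved:
        improved = False
        for i in range(lo, hi - 2):
            for j in range(i + 2, hi):  # replace edges (i, i+1) and (j, j+1), with j + 1 <= hi
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[j + 1]]
                delta = math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)
                if delta < -1e-12:
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]  # reverse the section between
                    improved = True

def segmented_2opt(tour, pts, k, rounds=4):
    tour = list(tour)
    n = len(tour)
    shift = max(1, n // (2 * k))
    for _ in range(rounds):
        bounds = [(s * (n - 1) // k, (s + 1) * (n - 1) // k) for s in range(k)]
        for lo, hi in bounds:  # independent segments: one GPU thread each in a parallel version
            two_opt_segment(tour, pts, lo, hi)
        tour = tour[shift:] + tour[:shift]  # move the segment boundaries for the next round
    return tour

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(60)]
start = list(range(60))
result = segmented_2opt(start, pts, k=4)
print(tour_length(start, pts), tour_length(result, pts))
```

Like the scheme in the abstract, this version needs only O(N) memory. A doubly linked list avoids the array slice reversals, but the independence argument between segments is the same.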

Production of Pig Iron by Smelting of Blended Pre-Reduced Titaniferous Magnetite Ore and Hematite Ore Using Lean Grade Coal

The rapid depletion of high-grade iron ore (Fe2O3) has drawn attention to the use of other sources of iron ore. Titaniferous magnetite ore (TMO) is a special type of magnetite ore with a high titania content (23.23% TiO2 in this case). Owing to its high TiO2 content and high density, TMO cannot be treated by conventional smelting reduction. In the present work, the TMO has been collected from the high-grade metamorphic terrain of the Precambrian Chotanagpur gneissic complex in the eastern part of India (Shaltora area, Bankura district, West Bengal), and the hematite ore has been collected from Visakhapatnam Steel Plant (VSP), Visakhapatnam. At VSP, iron ore is received from the Bailadila mines, Chattisgarh, of M/s. National Mineral Development Corporation. The TMO and hematite ore (HMO) have been characterized by WDXRF, XRD and FESEM analyses. Similarly, good-quality coal (mainly coking coal) is also being depleted fast. The basic purpose of this work is to find how lean grade coal can be utilised along with TMO for smelting to produce pig iron. The lean grade coal has been characterised by TG/DTA, proximate and ultimate analyses. The boiler grade coal has been found to contain 28.08% fixed carbon and 28.31% volatile matter. TMO fines (below 75 μm) and HMO fines (below 75 μm) have been separately agglomerated with lean grade coal fines (below 75 μm) into briquettes using binders such as bentonite and molasses. These green briquettes are first dried in an oven at 423 K for 30 min and then reduced isothermally in a tube furnace at 1323 K, 1373 K and 1423 K for 30 min and 60 min. After reduction, the reduced briquettes are characterized by XRD and FESEM analyses. The best reduced TMO and HMO samples are taken and blended in three different TMO:HMO weight ratios of 1:4, 1:8 and 1:12.
Chemical analysis of the three blended samples is carried out, and the degree of metallisation of iron is found to be 89.38%, 92.12% and 93.12%, respectively. These three blended samples are briquetted using binders such as bentonite and lime. Thereafter, these blended briquettes are separately smelted in a raising hearth furnace at 1773 K for 30 min. The pig iron formed is characterized using XRD and microscopic analysis. It can be concluded that a 90% yield of pig iron can be achieved when the TMO:HMO blend ratio is 1:4.5. This means that, for a 90% yield, the maximum TMO that can be used in the blend is about 18%.
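The final figure follows from simple blend arithmetic: in a TMO:HMO ratio of 1:r, TMO makes up 1/(1 + r) of the blend, so 1:4.5 gives 1/5.5, or about 18%. The short Python sketch below reproduces this for the tested ratios. It also includes the usual definition of the degree of metallisation (metallic iron as a percentage of total iron); the helper names are hypothetical.

```python
def tmo_fraction(hmo_parts, tmo_parts=1.0):
    """Weight fraction of TMO in a blend with TMO:HMO = tmo_parts:hmo_parts."""
    return tmo_parts / (tmo_parts + hmo_parts)

def degree_of_metallisation(metallic_fe, total_fe):
    """Metallic iron as a percentage of total iron (same mass units for both)."""
    return 100.0 * metallic_fe / total_fe

for r in (4, 4.5, 8, 12):
    print(f"TMO:HMO = 1:{r} -> {100 * tmo_fraction(r):.1f}% TMO")
```

This prints 20.0%, 18.2%, 11.1% and 7.7% TMO for the four ratios, consistent with the "about 18%" quoted for 1:4.5.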

General Purpose Graphic Processing Units Based Real Time Video Tracking System

Real-time video tracking is a challenging task for computing professionals. The performance of video tracking techniques is greatly affected by the background detection and elimination process. Local regions of the image frame contain vital information about the background and foreground. However, with traditional approaches, pixel-level processing of local regions consumes considerable computational time and memory. In our approach, we exploit the concurrent computational ability of General Purpose Graphic Processing Units (GPGPU) to address this problem. A Gaussian Mixture Model (GMM) with adaptive weighted kernels is used to detect the background. The kernel weights are influenced by local regions and are updated according to the inter-frame variations of the corresponding regions. The proposed system has been tested on GPU devices such as the GeForce GTX 280 and the Quadro K2000. The results are encouraging, with a maximum speedup of 10X over the sequential approach.
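The per-pixel background model that such a system parallelizes can be sketched as the classic Stauffer-Grimson mixture update for one grayscale pixel; on a GPU, one thread would typically own one pixel. The paper's adaptive, region-weighted kernels are not reproduced here. The parameter values and the simplified learning rate rho = alpha / w are assumptions.

```python
import math
from dataclasses import dataclass, field

@dataclass
class PixelGMM:
    """Adaptive Gaussian mixture for one grayscale pixel (Stauffer-Grimson style)."""
    k: int = 3                 # maximum number of Gaussian components
    alpha: float = 0.05        # learning rate
    match_sigmas: float = 2.5  # a value matches a component within this many std devs
    bg_fraction: float = 0.7   # weight mass treated as background
    init_var: float = 225.0    # variance given to a newly created component
    weights: list = field(default_factory=list)
    means: list = field(default_factory=list)
    variances: list = field(default_factory=list)

    def update(self, x):
        """Learn from intensity x and return True if x is classified as background."""
        matched = next((i for i, (m, v) in enumerate(zip(self.means, self.variances))
                        if abs(x - m) <= self.match_sigmas * math.sqrt(v)), None)
        # Decay all weights; the matched component gains alpha.
        self.weights = [(1 - self.alpha) * w + (self.alpha if i == matched else 0.0)
                        for i, w in enumerate(self.weights)]
        if matched is None:
            # No match: add a component, or replace the least reliable one.
            if len(self.means) < self.k:
                self.means.append(x)
                self.variances.append(self.init_var)
                self.weights.append(self.alpha)
            else:
                j = min(range(self.k), key=lambda i: self.weights[i] / math.sqrt(self.variances[i]))
                self.means[j], self.variances[j], self.weights[j] = x, self.init_var, self.alpha
        else:
            rho = min(1.0, self.alpha / self.weights[matched])  # simplified learning rate
            d = x - self.means[matched]                         # pre-update deviation
            self.means[matched] += rho * d
            self.variances[matched] = (1 - rho) * self.variances[matched] + rho * d * d
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        if matched is None:
            return False  # a value explained by no existing component is foreground
        # Background = most reliable components (high weight, low variance) up to bg_fraction.
        order = sorted(range(len(self.means)),
                       key=lambda i: self.weights[i] / math.sqrt(self.variances[i]), reverse=True)
        cumulative = 0.0
        for i in order:
            if i == matched:
                return True
            cumulative += self.weights[i]
            if cumulative > self.bg_fraction:
                return False
        return False

g = PixelGMM()
history = [g.update(100) for _ in range(50)]  # a static background pixel
print(history[-1], g.update(200), g.update(100))
```

Because each pixel's model depends only on its own history, the update maps naturally to one GPU thread per pixel. A region-weighted variant such as the paper's would additionally read neighbouring pixels.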

GPU-Accelerated Triangle Mesh Simplification Using Parallel Vertex Removal

We present an approach to triangle mesh simplification designed to be executed on the GPU. We use a quadric error metric to calculate an error value for each vertex of the mesh and order all vertices based on this value. This step is followed by the parallel removal of a number of vertices with the lowest calculated error values. To allow for the parallel removal of multiple vertices we use a set of per-vertex boundaries that prevent mesh foldovers even when simplification operations are performed on neighbouring vertices. We execute multiple iterations of the calculation of the vertex errors, ordering of the error values and removal of vertices until either a desired number of vertices remains in the mesh or a minimum error value is reached. This parallel approach is used to speed up the simplification process while maintaining mesh topology and avoiding foldovers at every step of the simplification.
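The quadric error metric builds, for each vertex, a 4 x 4 matrix Q from the planes of its adjacent triangles, so that the error of a position v is v^T Q v in homogeneous coordinates. The abstract does not say exactly how a per-vertex error is derived from Q. The Python sketch below assumes it is the cheapest collapse of the vertex onto one of its neighbours, measured with the vertex's own quadric. The per-vertex boundaries that make parallel removal safe are not reproduced.

```python
import math

def plane_of(p0, p1, p2):
    """[a, b, c, d] with unit normal (a, b, c) and a*x + b*y + c*z + d = 0 on the triangle.

    Assumes a non-degenerate triangle.
    """
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    n = [c / length for c in n]
    return n + [-sum(n[i] * p0[i] for i in range(3))]

def vertex_quadrics(verts, tris):
    """Sum the outer products p p^T of the planes around each vertex."""
    Q = [[[0.0] * 4 for _ in range(4)] for _ in verts]
    for t in tris:
        p = plane_of(*(verts[i] for i in t))
        for vi in t:
            for r in range(4):
                for c in range(4):
                    Q[vi][r][c] += p[r] * p[c]
    return Q

def quadric_error(Q, x):
    h = list(x) + [1.0]  # homogeneous coordinates
    return sum(h[r] * Q[r][c] * h[c] for r in range(4) for c in range(4))

def vertex_errors(verts, tris):
    Q = vertex_quadrics(verts, tris)
    neighbours = [set() for _ in verts]
    for a, b, c in tris:
        neighbours[a] |= {b, c}
        neighbours[b] |= {a, c}
        neighbours[c] |= {a, b}
    # Assumed removal cost: cheapest collapse of v onto a neighbour, measured by v's quadric.
    return [min((quadric_error(Q[v], verts[u]) for u in neighbours[v]), default=math.inf)
            for v in range(len(verts))]

ring = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
tris = [(0, 1 + i, 1 + (i + 1) % 4) for i in range(4)]
flat, peak = [(0, 0, 0)] + ring, [(0, 0, 1)] + ring
errors = vertex_errors(peak, tris)
order = sorted(range(len(peak)), key=errors.__getitem__)  # lowest-error vertices first
print(vertex_errors(flat, tris)[0], errors[0], order)
```

A vertex lying in a flat region costs nothing to remove, while the apex of a pyramid does not. Sorting by this value gives the ordering from which a batch of low-error vertices is removed in each parallel iteration.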

Password Cracking on Graphics Processing Unit Based Systems

Password authentication is one of the most widely used methods for authenticating legitimate computer users and defending against attackers. There are many different ways to authenticate the users of a system, and many password cracking methods have also been developed. This paper examines how password cracking can best be performed on a CPU-GPGPU based system. The main objective of this work is to show how quickly a password can be cracked, given some knowledge of computer security and password cracking, if sufficient security is not incorporated into the system.
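How quickly a password falls to exhaustive search depends mainly on the size of the candidate space and the guess rate. The Python sketch below computes both. The guess rate is a hypothetical parameter, not a measurement from the paper.

```python
def keyspace(charset_size, max_len, min_len=1):
    """Number of candidate passwords of length min_len..max_len over a given alphabet."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))

def exhaustive_search_hours(charset_size, length, guesses_per_second):
    """Worst-case hours to try every password of exactly `length` characters.

    On average the password is found after about half this time.
    """
    return charset_size ** length / guesses_per_second / 3600

# 8 characters from 62 letters and digits, at a hypothetical 10^9 guesses per second
print(exhaustive_search_hours(62, 8, 1e9))
```

For 8 characters drawn from 62 letters and digits, the 62^8 (about 2.2 x 10^14) candidates take about 61 hours in the worst case at that hypothetical rate. Each additional character multiplies the time by 62, which is why password length matters more than raw hardware speed.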

Detecting the Edge of Multiple Images in Parallel

An edge is a variation in brightness in an image. Edge detection is useful in many application areas, such as finding forests and rivers in a satellite image or detecting a broken bone in a medical image. This paper discusses finding the edges of multiple aerial images in parallel. The proposed approach was tested on 38 images: 37 color images and one monochrome image. The time taken to process N images in parallel is equivalent to the time taken to process one image sequentially. Message Passing Interface (MPI) and Open Computing Language (OpenCL) are used to achieve task-level and pixel-level parallelism, respectively.
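The abstract does not name the edge operator. Assuming the common Sobel operator, the Python sketch below shows where the two levels of parallelism come from. Every output pixel is independent, which is the work an OpenCL kernel would spread over work-items, and every image is independent, which is the work the paper distributes over MPI processes. A thread pool stands in for MPI purely for illustration; in CPython it gives no real speedup for this pure-Python loop.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def sobel_magnitude(img):
    """Sobel gradient magnitude of a grayscale image given as a list of rows; borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):  # each (y, x) is independent: one work-item per pixel in OpenCL
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out

def edges_of_many(images, workers=4):
    """Task-level parallelism: one image per task (the paper uses MPI processes instead)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sobel_magnitude, images))

step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]  # a vertical step edge
results = edges_of_many([step, step])
print(results[0][2])
```

With one worker per image, N images finish in roughly the time of one, which matches the timing claim in the abstract when enough processes are available.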

A Parallel Approach for 3D-Variational Data Assimilation on GPUs in Ocean Circulation Models

This work is the first building block of a broad research activity, in collaboration with the Euro-Mediterranean Center on Climate Change, aimed at introducing scalable approaches into Ocean Circulation Models. We discuss the design and implementation of a parallel algorithm for solving the Variational Data Assimilation (DA) problem on Graphics Processing Units (GPUs). The algorithm is based on the fully scalable 3DVar DA model previously proposed by the authors, which uses a Domain Decomposition approach (we refer to this model as the DD-DA model). We follow an incremental porting process consisting of three distinct stages: requirements and source code analysis, incremental development of CUDA kernels, and testing and optimization. Experiments confirm the theoretical performance analysis based on the so-called scale-up factor, demonstrating that the DD-DA model can be suitably mapped onto GPU architectures.
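For reference, the standard 3DVar problem minimizes a cost function that balances the distance to a background state x_b, weighted by the background error covariance B, against the misfit to observations y, weighted by the observation error covariance R. The statement below assumes a linear observation operator H. The domain-decomposed form used by the DD-DA model is not shown in the abstract and is not reconstructed here.

```latex
\begin{aligned}
J(\mathbf{x}) &= \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{\mathsf T}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}(\mathbf{H}\mathbf{x}-\mathbf{y})^{\mathsf T}\mathbf{R}^{-1}(\mathbf{H}\mathbf{x}-\mathbf{y}),\\
\nabla J(\mathbf{x}) &= \mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \mathbf{H}^{\mathsf T}\mathbf{R}^{-1}(\mathbf{H}\mathbf{x}-\mathbf{y}),\\
\nabla J(\mathbf{x}^{*}) = \mathbf{0} &\iff
  \left(\mathbf{B}^{-1} + \mathbf{H}^{\mathsf T}\mathbf{R}^{-1}\mathbf{H}\right)\mathbf{x}^{*}
  = \mathbf{B}^{-1}\mathbf{x}_b + \mathbf{H}^{\mathsf T}\mathbf{R}^{-1}\mathbf{y}.
\end{aligned}
```

In practice the minimum is found iteratively. The matrix-vector products inside each gradient evaluation are the dense, regular computations that map well onto CUDA kernels.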

Novel GPU Approach in Predicting the Directional Trend of the S&P 500

Our goal is to develop an algorithm capable of predicting the directional trend of the Standard and Poor’s 500 index (S&P 500). Extensive research has been published attempting to predict different financial markets using historical data tested on an in-sample and trend basis, with many authors employing excessively complex mathematical techniques. In reviewing and evaluating these in-sample methodologies, it became evident that this approach could not achieve prediction performance reliable enough for commercial exploitation. For these reasons, we moved to an out-of-sample strategy based on linear regression analysis of an extensive set of financial data correlated with historical closing prices of the S&P 500. We are pleased to report a directional trend accuracy of greater than 55% for the next day (t+1) in predicting the S&P 500.
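The key difference from in-sample testing is that each prediction uses only data available before it. The Python sketch below shows a walk-forward, out-of-sample evaluation of a one-feature linear regression scored by directional accuracy. The feature, window length, and synthetic data are hypothetical; the paper's actual feature set and regression setup are not given in the abstract.

```python
def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def directional_accuracy(predicted, actual):
    """Fraction of days on which the predicted and actual moves have the same sign."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)

def walk_forward(features, closes, window):
    """Out-of-sample test: refit on the last `window` days, then predict the t -> t+1 move."""
    moves = [closes[t + 1] - closes[t] for t in range(len(closes) - 1)]  # moves[t]: close t -> t+1
    predicted, actual = [], []
    for t in range(window, len(moves)):
        # Only moves[t-window .. t-1] are known at the close of day t, so there is no look-ahead.
        a, b = fit_ols(features[t - window:t], moves[t - window:t])
        predicted.append(a + b * features[t])
        actual.append(moves[t])
    return directional_accuracy(predicted, actual)

# Synthetic data in which the feature fully determines the next move (a sanity check only).
features = [(-1) ** t * (1 + t % 3) for t in range(30)]
closes = [100.0]
for f in features:
    closes.append(closes[-1] + 2 * f)
print(walk_forward(features, closes, window=5))
```

On this synthetic series the accuracy is 1.0 by construction. On real market data, any figure above 0.5 has to be earned by features with genuine out-of-sample predictive power.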

A New Computational Tool for Noise Prediction of Rotating Surfaces (FACT)

The environmental impact of air transport is, more than ever, an obstacle to the continuous growth of the aeronautical industry. Over the last decades, considerable effort has been made to obtain quieter aircraft, whether by changing the original design or by investigating quieter maneuvers. The noise radiated by rotating surfaces is one of the most important sources of annoyance, being present in most aerial vehicles. Bearing this in mind, CEIIA developed a new computational chain for noise prediction with in-house software tools to obtain solutions in a relatively short time without using excessive computer resources. This work is based on the new acoustic tool, which aims to predict the rotor noise generated during steady and maneuvering flight, making use of the flexibility of the C language and the speed advantages of GPU programming. The acoustic tool is based on Farassat's Formulation 1A, which predicts two important types of noise: loading noise and thickness noise. The present work describes the most important features of the acoustic tool and presents its most relevant results and framework analyses for helicopters and UAV quadrotors.
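For reference, Formulation 1A is commonly written as the sum of a thickness term and a loading term, each a surface integral over the blade surface f = 0 evaluated at the retarded time. The expressions below follow that standard form and are included as a hedged reminder; the abstract does not show the exact notation or any modifications used in the tool. Here rho_0 is the ambient density, c the speed of sound, v_n the surface normal velocity, r the source-observer distance, M_r the Mach number in the radiation direction, l_r the loading in the radiation direction, l_M = l_i M_i, and dots denote source-time derivatives.

```latex
\begin{aligned}
4\pi p'_T(\mathbf{x},t) &= \int_{f=0}\left[\frac{\rho_0(\dot v_n + v_{\dot n})}{r(1-M_r)^2}\right]_{\mathrm{ret}} dS
  + \int_{f=0}\left[\frac{\rho_0 v_n\left(r\dot M_r + c(M_r - M^2)\right)}{r^2(1-M_r)^3}\right]_{\mathrm{ret}} dS,\\
4\pi p'_L(\mathbf{x},t) &= \frac{1}{c}\int_{f=0}\left[\frac{\dot l_r}{r(1-M_r)^2}\right]_{\mathrm{ret}} dS
  + \int_{f=0}\left[\frac{l_r - l_M}{r^2(1-M_r)^2}\right]_{\mathrm{ret}} dS
  + \frac{1}{c}\int_{f=0}\left[\frac{l_r\left(r\dot M_r + c(M_r - M^2)\right)}{r^2(1-M_r)^3}\right]_{\mathrm{ret}} dS,\\
p'(\mathbf{x},t) &= p'_T(\mathbf{x},t) + p'_L(\mathbf{x},t).
\end{aligned}
```

Each panel of the discretized blade surface contributes independently to these integrals for every observer time, which is why this formulation parallelizes well on a GPU.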

Real-Time Visualization Using GPU-Accelerated Filtering of LiDAR Data

This paper presents a technique for real-time visualization and filtering of classified LiDAR point clouds. The visualization can display filtered information organized in layers by the classification attribute stored in LiDAR datasets. We explain the data structure and data management used, which enable real-time presentation of layered LiDAR data. Real-time visualization is achieved with level-of-detail (LOD) optimization based on the distance from the observer, without loss of quality. The filtering process is performed in two steps, entirely on the GPU, and is implemented using programmable shaders.
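The abstract does not say what the two filtering steps are. The Python sketch below uses a hypothetical pair: a per-layer filter on the classification attribute, followed by distance-based LOD thinning. The class codes follow the ASPRS LAS convention (2 = ground, 5 = high vegetation, 6 = building, and so on); the distance thresholds and the keep-every-2^level rule are assumptions. In the paper this work runs in GPU shaders rather than a Python loop.

```python
import math

# ASPRS LAS classification codes (subset)
GROUND, LOW_VEG, MED_VEG, HIGH_VEG, BUILDING, WATER = 2, 3, 4, 5, 6, 9

def lod_level(distance, thresholds=(50.0, 200.0, 800.0)):
    """Hypothetical LOD rule: level 0 (full detail) near the observer, coarser farther away."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)

def visible_points(points, enabled_classes, eye):
    """points: iterable of (x, y, z, classification); keep enabled layers, then thin by LOD."""
    out = []
    for i, (x, y, z, cls) in enumerate(points):
        if cls not in enabled_classes:         # step 1: layer filter on the classification
            continue
        level = lod_level(math.dist((x, y, z), eye))
        if i % (2 ** level) == 0:             # step 2: keep every 2^level-th point
            out.append((x, y, z, cls))
    return out

near = [(0, 0, 0, GROUND), (1, 0, 0, GROUND), (2, 0, 0, HIGH_VEG), (3, 0, 0, GROUND)]
far = [(1000.0, 0.0, 0.0, GROUND)] * 16
print(visible_points(near, {GROUND}, (0, 0, 0)))
print(len(visible_points(far, {GROUND}, (0, 0, 0))))
```

Toggling a class in `enabled_classes` corresponds to switching a layer on or off in the viewer. A production LOD scheme would thin points spatially rather than by storage index, to avoid visible gaps.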