6D Posture Estimation of Road Vehicles from Color Images

Currently, in the field of object posture estimation, there is research on estimating the position and angle of an object by storing a 3D model of the object to be estimated in advance in a computer and matching it with the model. However, in this research, we have succeeded in creating a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two different networks – a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy of each camera position, i.e., the angle from which the object was captured. The highest accuracy was recorded when the camera position was 75°, the accuracy of the classification was about 87.3%, and that of regression was about 98.9%.

A Convolutional Deep Neural Network Approach for Skin Cancer Detection Using Skin Lesion Images

Malignant Melanoma, known simply as Melanoma, is a type of skin cancer that appears as a mole on the skin. It is critical to detect this cancer at an early stage because it can spread across the body and may lead to the patient death. When detected early, Melanoma is curable. In this paper we propose a deep learning model (Convolutional Neural Networks) in order to automatically classify skin lesion images as Malignant or Benign. Images underwent certain pre-processing steps to diminish the effect of the normal skin region on the model. The result of the proposed model showed a significant improvement over previous work, achieving an accuracy of 97%.

Adaptive Few-Shot Deep Metric Learning

Currently the most prevalent deep learning methods require a large amount of data for training, whereas few-shot learning tries to learn a model from limited data without extensive retraining. In this paper, we present a loss function based on triplet loss for solving few-shot problem using metric based learning. Instead of setting the margin distance in triplet loss as a constant number empirically, we propose an adaptive margin distance strategy to obtain the appropriate margin distance automatically. We implement the strategy in the deep siamese network for deep metric embedding, by utilizing an optimization approach by penalizing the worst case and rewarding the best. Our experiments on image recognition and co-segmentation model demonstrate that using our proposed triplet loss with adaptive margin distance can significantly improve the performance.

Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach

Swimming with the tide of deep learning, the field of music information retrieval (MIR) experiences parallel development and a sheer variety of feature-learning models has been applied to music classification and tagging tasks. Among those learning techniques, the deep convolutional neural networks (CNNs) have been widespreadly used with better performance than the traditional approach especially in music genre classification and prediction. However, regarding the music recommendation, there is a large semantic gap between the corresponding audio genres and the various aspects of a song that influence user preference. In our study, aiming to bridge the gap, we strive to construct an automatic music aesthetic annotation model with MIDI format for better comparison and measurement of the similarity between music pieces in the way of harmonic analysis. We use the matrix of qualification converted from MIDI files as input to train two different classifiers, support vector machine (SVM) and Decision Tree (DT). Experimental results in performance of a tag prediction task have shown that both learning algorithms are capable of extracting high-level properties in an end-to end manner from music information. The proposed model is helpful to learn the audience taste and then the resulting recommendations are likely to appeal to a niche consumer.

Towards End-To-End Disease Prediction from Raw Metagenomic Data

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

A Comparison of YOLO Family for Apple Detection and Counting in Orchards

In agricultural production and breeding, implementing automatic picking robot in orchard farming to reduce human labour and error is challenging. The core function of it is automatic identification based on machine vision. This paper focuses on apple detection and counting in orchards and implements several deep learning methods. Extensive datasets are used and a semi-automatic annotation method is proposed. The proposed deep learning models are in state-of-the-art YOLO family. In view of the essence of the models with various backbones, a multi-dimensional comparison in details is made in terms of counting accuracy, mAP and model memory, laying the foundation for realising automatic precision agriculture.

Improving the Performance of Deep Learning in Facial Emotion Recognition with Image Sharpening

We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion using computer vision methodologies has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset that contains static images. Instead of using Histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and sharpened the edges. We also used ImageDataGenerator from Keras library for data augmentation. Then we used Convolutional Neural Networks (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that using image preprocessing such as the sharpening technique for a CNN model can improve the performance, even when the CNN model is relatively simple.

Facial Emotion Recognition with Convolutional Neural Network Based Architecture

Neural networks are appealing for many applications since they are able to learn complex non-linear relationships between input and output data. As the number of neurons and layers in a neural network increase, it is possible to represent more complex relationships with automatically extracted features. Nowadays Deep Neural Networks (DNNs) are widely used in Computer Vision problems such as; classification, object detection, segmentation image editing etc. In this work, Facial Emotion Recognition task is performed by proposed Convolutional Neural Network (CNN)-based DNN architecture using FER2013 Dataset. Moreover, the effects of different hyperparameters (activation function, kernel size, initializer, batch size and network size) are investigated and ablation study results for Pooling Layer, Dropout and Batch Normalization are presented.

Deep Learning Based 6D Pose Estimation for Bin-Picking Using 3D Point Clouds

Estimating the 6D pose of objects is a core step for robot bin-picking tasks. The problem is that various objects are usually randomly stacked with heavy occlusion in real applications. In this work, we propose a method to regress 6D poses by predicting three points for each object in the 3D point cloud through deep learning. To solve the ambiguity of symmetric pose, we propose a labeling method to help the network converge better. Based on the predicted pose, an iterative method is employed for pose optimization. In real-world experiments, our method outperforms the classical approach in both precision and recall.

Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.

A Survey of Field Programmable Gate Array-Based Convolutional Neural Network Accelerators

With the rapid development of deep learning, neural network and deep learning algorithms play a significant role in various practical applications. Due to the high accuracy and good performance, Convolutional Neural Networks (CNNs) especially have become a research hot spot in the past few years. However, the size of the networks becomes increasingly large scale due to the demands of the practical applications, which poses a significant challenge to construct a high-performance implementation of deep learning neural networks. Meanwhile, many of these application scenarios also have strict requirements on the performance and low-power consumption of hardware devices. Therefore, it is particularly critical to choose a moderate computing platform for hardware acceleration of CNNs. This article aimed to survey the recent advance in Field Programmable Gate Array (FPGA)-based acceleration of CNNs. Various designs and implementations of the accelerator based on FPGA under different devices and network models are overviewed, and the versions of Graphic Processing Units (GPUs), Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs) are compared to present our own critical analysis and comments. Finally, we give a discussion on different perspectives of these acceleration and optimization methods on FPGA platforms to further explore the opportunities and challenges for future research. More helpfully, we give a prospect for future development of the FPGA-based accelerator.

On Dialogue Systems Based on Deep Learning

Nowadays, dialogue systems increasingly become the way for humans to access many computer systems. So, humans can interact with computers in natural language. A dialogue system consists of three parts: understanding what humans say in natural language, managing dialogue, and generating responses in natural language. In this paper, we survey deep learning based methods for dialogue management, response generation and dialogue evaluation. Specifically, these methods are based on neural network, long short-term memory network, deep reinforcement learning, pre-training and generative adversarial network. We compare these methods and point out the further research directions.

Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perceived performance in driving environments that vary from time to season. The image segmentation method using deep learning, which has recently evolved rapidly, provides high recognition performance in various road environments stably. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrade performance in embedded processor environments equipped simple hardware accelerators. In this paper, a semantic segmentation network, matrix multiplication accelerator network (MMANet), optimized for matrix multiplication accelerator (MMA) on Texas instrument digital signal processors (TI DSP) is proposed to improve the recognition performance of autonomous driving system. The proposed method is designed to maximize the number of layers that can be performed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since MMA is a fixed, it may be more efficient for normal convolution than depthwise separable convolution depending on memory access overhead. Thus, a convolution type is decided according to output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high quality contexts using the restored shape without global average pooling paths since the layer uses MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation sets. The proposed network can process an image with 640 x 480 resolution for 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle as 20 frame per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU) which is the highest recognition rate among embedded networks on the Cityscapes validation set.

A Survey of Sentiment Analysis Based on Deep Learning

Sentiment analysis is a very active research topic. Every day, Facebook, Twitter, Weibo, and other social media, as well as significant e-commerce websites, generate a massive amount of comments, which can be used to analyse peoples opinions or emotions. The existing methods for sentiment analysis are based mainly on sentiment dictionaries, machine learning, and deep learning. The first two kinds of methods rely on heavily sentiment dictionaries or large amounts of labelled data. The third one overcomes these two problems. So, in this paper, we focus on the third one. Specifically, we survey various sentiment analysis methods based on convolutional neural network, recurrent neural network, long short-term memory, deep neural network, deep belief network, and memory network. We compare their futures, advantages, and disadvantages. Also, we point out the main problems of these methods, which may be worthy of careful studies in the future. Finally, we also examine the application of deep learning in multimodal sentiment analysis and aspect-level sentiment analysis.

A Survey of Response Generation of Dialogue Systems

An essential task in the field of artificial intelligence is to allow computers to interact with people through natural language. Therefore, researches such as virtual assistants and dialogue systems have received widespread attention from industry and academia. The response generation plays a crucial role in dialogue systems, so to push forward the research on this topic, this paper surveys various methods for response generation. We sort out these methods into three categories. First one includes finite state machine methods, framework methods, and instance methods. The second contains full-text indexing methods, ontology methods, vast knowledge base method, and some other methods. The third covers retrieval methods and generative methods. We also discuss some hybrid methods based knowledge and deep learning. We compare their disadvantages and advantages and point out in which ways these studies can be improved further. Our discussion covers some studies published in leading conferences such as IJCAI and AAAI in recent years.

Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Recently, the rapid development of deep learning makes artificial intelligence (AI) penetrate into many fields, replacing manual work there. In particular, AI systems also become a research focus in the field of automatic office. To meet real needs in automatic officiating, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data for training. We conduct a series of experiments to test my systems and the results show that our system can achieve better classification results.

Low-Cost Mechatronic Design of an Omnidirectional Mobile Robot

This paper presents the results of a mechatronic design based on a 4-wheel omnidirectional mobile robot that can be used in indoor logistic applications. The low-level control has been selected using two open-source hardware (Raspberry Pi 3 Model B+ and Arduino Mega 2560) that control four industrial motors, four ultrasound sensors, four optical encoders, a vision system of two cameras, and a Hokuyo URG-04LX-UG01 laser scanner. Moreover, the system is powered with a lithium battery that can supply 24 V DC and a maximum current-hour of 20Ah.The Robot Operating System (ROS) has been implemented in the Raspberry Pi and the performance is evaluated with the selection of the sensors and hardware selected. The mechatronic system is evaluated and proposed safe modes of power distribution for controlling all the electronic devices based on different tests. Therefore, based on different performance results, some recommendations are indicated for using the Raspberry Pi and Arduino in terms of power, communication, and distribution of control for different devices. According to these recommendations, the selection of sensors is distributed in both real-time controllers (Arduino and Raspberry Pi). On the other hand, the drivers of the cameras have been implemented in Linux and a python program has been implemented to access the cameras. These cameras will be used for implementing a deep learning algorithm to recognize people and objects. In this way, the level of intelligence can be increased in combination with the maps that can be obtained from the laser scanner.

Churn Prediction for Telecommunication Industry Using Artificial Neural Networks

Telecommunication service providers demand accurate and precise prediction of customer churn probabilities to increase the effectiveness of their customer relation services. The large amount of customer data owned by the service providers is suitable for analysis by machine learning methods. In this study, expenditure data of customers are analyzed by using an artificial neural network (ANN). The ANN model is applied to the data of customers with different billing duration. The proposed model successfully predicts the churn probabilities at 83% accuracy for only three months expenditure data and the prediction accuracy increases up to 89% when the nine month data is used. The experiments also show that the accuracy of ANN model increases on an extended feature set with information of the changes on the bill amounts.

A Deep-Learning Based Prediction of Pancreatic Adenocarcinoma with Electronic Health Records from the State of Maine

Predicting the risk of Pancreatic Adenocarcinoma (PA) in advance can benefit the quality of care and potentially reduce population mortality and morbidity. The aim of this study was to develop and prospectively validate a risk prediction model to identify patients at risk of new incident PA as early as 3 months before the onset of PA in a statewide, general population in Maine. The PA prediction model was developed using Deep Neural Networks, a deep learning algorithm, with a 2-year electronic-health-record (EHR) cohort. Prospective results showed that our model identified 54.35% of all inpatient episodes of PA, and 91.20% of all PA that required subsequent chemoradiotherapy, with a lead-time of up to 3 months and a true alert of 67.62%. The risk assessment tool has attained an improved discriminative ability. It can be immediately deployed to the health system to provide automatic early warnings to adults at risk of PA. It has potential to identify personalized risk factors to facilitate customized PA interventions.

SNR Classification Using Multiple CNNs

Noise estimation is essential in today wireless systems for power control, adaptive modulation, interference suppression and quality of service. Deep learning (DL) has already been applied in the physical layer for modulation and signal classifications. Unacceptably low accuracy of less than 50% is found to undermine traditional application of DL classification for SNR prediction. In this paper, we use divide-and-conquer algorithm and classifier fusion method to simplify SNR classification and therefore enhances DL learning and prediction. Specifically, multiple CNNs are used for classification rather than a single CNN. Each CNN performs a binary classification of a single SNR with two labels: less than, greater than or equal. Together, multiple CNNs are combined to effectively classify over a range of SNR values from −20 ≤ SNR ≤ 32 dB.We use pre-trained CNNs to predict SNR over a wide range of joint channel parameters including multiple Doppler shifts (0, 60, 120 Hz), power-delay profiles, and signal-modulation types (QPSK,16QAM,64-QAM). The approach achieves individual SNR prediction accuracy of 92%, composite accuracy of 70% and prediction convergence one order of magnitude faster than that of traditional estimation.