Abstract: In recent years, Automatic Number Plate Recognition (ANPR) systems have become widely used in safety, security, and commercial applications. Accordingly, several methods and techniques are being developed to achieve better accuracy and real-time execution. This paper proposes a computer vision algorithm for Number Plate Localization (NPL) and Character Segmentation (CS). In addition, it proposes an improved Optical Character Recognition (OCR) method based on Deep Learning (DL) techniques. To identify the number on the detected plate after the NPL and CS steps, a Convolutional Neural Network (CNN) is proposed. A DL model is developed with four convolutional layers, two max-pooling layers, and six fully connected layers. The model was trained on a number-image database on the NVIDIA Jetson TX2 target and achieved an accuracy of 95.84%.
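The four-convolution, two-max-pooling stack described above can be sanity-checked by propagating feature-map sizes through it. The kernel sizes, strides, padding, and the 32x32 input below are illustrative assumptions; the abstract does not specify them.

```python
# Shape propagation through an assumed conv-conv-pool, conv-conv-pool stack
# (four convolutions, two max-poolings, as in the abstract). Kernel 3x3,
# stride 1, padding 1, and a 32x32 input are assumptions for illustration.

def conv2d_out(size, kernel=3, stride=1, pad=1):
    """Output side length of a square convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def maxpool_out(size, kernel=2, stride=2):
    """Output side length of a square max-pooling layer."""
    return (size - kernel) // stride + 1

def plate_cnn_shapes(size=32):
    shapes = [("input", size)]
    for block in range(2):
        for conv in range(2):
            size = conv2d_out(size)
            shapes.append((f"conv{2 * block + conv + 1}", size))
        size = maxpool_out(size)
        shapes.append((f"pool{block + 1}", size))
    return shapes
```

With these assumed settings the convolutions preserve spatial size and each pooling halves it, so a 32x32 plate crop reaches the fully connected layers as an 8x8 map.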
Abstract: Ancient books are significant carriers of culture, and their background textures convey latent historical information. However, multi-style texture recovery of ancient books has received little attention. Restricted by insufficient ancient textures and a complex handling process, the generation of ancient textures faces new challenges. For instance, training without sufficient data usually brings about overfitting or mode collapse, so some of the outputs are prone to be fake. Recently, image generation and style transfer based on deep learning have been widely applied in computer vision. Breakthroughs in the field make it possible to conduct research on multi-style texture recovery of ancient books. Under these circumstances, we propose a network for layout analysis and an image fusion system. Firstly, we trained models using Deep Convolutional Generative Adversarial Networks (DCGAN) to synthesize multi-style ancient textures; then, we analyzed layouts based on the Position Rearrangement (PR) algorithm that we propose to adjust the layout structure of the foreground content; finally, we fused the rearranged foreground texts with the generated background. In the experiments, diversified samples such as ancient Yi, Jurchen, and Seal script were selected as our training sets. The performance of different fine-tuned models was gradually improved by adjusting the parameters and structure of the DCGAN model. To evaluate the results scientifically, the cross-entropy loss function and the Fréchet Inception Distance (FID) were selected as our assessment criteria. Eventually, we obtained model M8 with the lowest FID score. Compared with the DCGAN model proposed by Radford et al., the FID score of M8 improved by 19.26%, profoundly enhancing the quality of the synthetic images.
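The FID criterion named above has a standard closed form over Inception-feature statistics (lower is better, consistent with M8 having the lowest score). The formula below is the usual definition, not a detail taken from this paper:

```latex
% FID between real statistics (\mu_r, \Sigma_r) and generated statistics
% (\mu_g, \Sigma_g), both computed on Inception features
\mathrm{FID} = \left\| \mu_r - \mu_g \right\|_2^2
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
             - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```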
Abstract: Learning from very large datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important large genomic and non-coding datasets representing genome sequences. In this paper, a hybrid method for the classification of miRNA data is proposed. Due to the variety of cancers and the high number of genes, analyzing miRNA datasets has been a challenging problem for researchers. The number of features is high relative to the number of samples, and the data suffer from class imbalance. A feature selection method is used to select the features most able to distinguish between classes and to eliminate obscure features. Afterward, a Convolutional Neural Network (CNN) classifier is utilized for classifying cancer types, employing a Genetic Algorithm to find optimized CNN hyper-parameters. To make classification by the CNN faster, a Graphics Processing Unit (GPU) is recommended for performing the mathematical operations in parallel. The proposed method is tested on a real-world dataset of 8,129 patients, 29 different tumor types, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.
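The genetic-algorithm search over CNN hyper-parameters can be sketched as below. The search space, operators, and the toy fitness function are illustrative assumptions; in the paper's setting the fitness would be the CNN's validation performance, which is too expensive to reproduce here.

```python
# Minimal genetic-algorithm skeleton for hyper-parameter search.
# SEARCH_SPACE and toy_fitness are stand-ins, not values from the paper.
import random

SEARCH_SPACE = {
    "filters": [16, 32, 64, 128],
    "kernel": [3, 5, 7],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # uniform crossover: each gene comes from one parent
    return {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rng, rate=0.2):
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def evolve(fitness, generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # keep the fitter half
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

# Toy stand-in for validation accuracy: counts matching "good" genes.
def toy_fitness(ind):
    target = {"filters": 64, "kernel": 5, "learning_rate": 1e-3}
    return sum(ind[k] == target[k] for k in target)
```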
Abstract: Certain systems can function well only if they recognize the sound environment as humans do. In this research, we focus on sound classification with a convolutional neural network and aim to develop a method that automatically classifies various environmental sounds. Although the neural network is a powerful technique, its performance depends on the type of input data. Therefore, we propose an approach via a slice bispectrogram, which is a third-order spectrogram given by a sliced version of the amplitude of the short-time bispectrum. This paper explains the slice bispectrogram and discusses the effectiveness of the derived method by evaluating experimental results on the ESC-50 sound dataset. As a result, the proposed scheme gives high accuracy and stability. Furthermore, a relationship between the accuracy and the non-Gaussianity of the sound signals was confirmed.
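For one analysis frame, a bispectrum slice can be sketched as below. Taking the diagonal slice |B(f, f)| = |X[f] X[f] X*(2f)| is our assumption (the abstract does not say which slice is used), and a pure-Python DFT keeps the sketch dependency-free.

```python
# Diagonal slice of the bispectrum of one frame, via a naive DFT.
# The diagonal choice f1 = f2 is an assumption for illustration.
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def bispectrum_diagonal_slice(frame):
    """|B(f, f)| = |X[f] * X[f] * conj(X[2f])| for f = 0 .. n//2 - 1."""
    spec = dft(frame)
    n = len(frame)
    return [abs(spec[f] * spec[f] * spec[(2 * f) % n].conjugate())
            for f in range(n // 2)]
```

Stacking such slices over short overlapping frames would give the slice bispectrogram image fed to the network.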
Abstract: Human pose estimation and tracking aim to accurately identify and locate the positions of human joints in video. They are computer vision tasks of great significance for human motion recognition, behavior understanding, and scene analysis. There has been remarkable progress on human pose estimation in recent years. However, more research is needed on human pose tracking, especially online tracking. In this paper, a framework called PoseSRPN is proposed for online single-person pose estimation and tracking. We use a Siamese network with an attached pose estimation branch to incorporate Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) into one framework. The pose estimation branch has a simple network structure that replaces the complex upsampling and convolution structure with deconvolution. By augmenting the loss of the fully convolutional Siamese network with the pose estimation task, pose estimation and tracking can be trained in one stage. Once trained, PoseSRPN relies only on a single bounding-box initialization to produce human joint locations. The experimental results show that, while maintaining good pose estimation accuracy on the COCO and PoseTrack datasets, the proposed method achieves a speed of 59 frames/s, which is superior to other pose tracking frameworks.
Abstract: This paper considers the modelling of a non-stationary bivariate integer-valued autoregressive moving average model of order one (BINARMA(1,1)) with correlated Poisson innovations. The BINARMA(1,1) model is specified using the binomial thinning operator and by assuming that the cross-correlation between the two series is induced by the innovation terms only. Based on these assumptions, the non-stationary marginal and joint moments of the BINARMA(1,1) model are derived iteratively from some initial stationary moments. Regarding the estimation of the parameters of the proposed model, the conditional maximum likelihood (CML) estimation method is derived based on thinning and convolution properties. The forecasting equations of the BINARMA(1,1) model are also derived. A simulation study is conducted in which BINARMA(1,1) count data are generated using a multivariate Poisson R code for the innovation terms. The performance of the BINARMA(1,1) model is then assessed through this simulation experiment, and the mean estimates of the model parameters obtained are all efficient, based on their standard errors. The proposed model is then used to analyse real-life accident data from the motorway in Mauritius, based on several covariates: policemen, daily patrols, speed cameras, traffic lights, and roundabouts. The BINARMA(1,1) model is applied to the accident data, and the CML estimates clearly indicate a significant impact of the covariates on the number of accidents on the motorway in Mauritius. The forecasting equations also provide reliable one-step-ahead forecasts.
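The simulation step described above can be sketched in a few lines: binomial thinning, a common-shock bivariate Poisson for correlated innovations, and an INARMA(1,1)-type recursion. The common-shock construction and the exact recursion are standard choices assumed here; the paper's non-stationary specification may differ in detail.

```python
# Sketch of generating a bivariate INARMA(1,1)-type count series with
# correlated Poisson innovations (common-shock construction assumed).
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for small lam)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def thin(x, a, rng):
    """Binomial thinning a ∘ x: sum of x Bernoulli(a) trials."""
    return sum(rng.random() < a for _ in range(x))

def bivariate_poisson(l1, l2, l0, rng):
    """Common shock Z0 induces the cross-correlation between the series."""
    z0 = poisson(l0, rng)
    return poisson(l1, rng) + z0, poisson(l2, rng) + z0

def binarma11(n, a=(0.4, 0.3), b=(0.2, 0.25), lam=(1.0, 1.5, 0.5), seed=1):
    rng = random.Random(seed)
    x, r_prev, out = [0, 0], [0, 0], []
    for _ in range(n):
        r = bivariate_poisson(*lam, rng)
        # X_t = a ∘ X_{t-1} + R_t + b ∘ R_{t-1}, componentwise
        x = [thin(x[i], a[i], rng) + r[i] + thin(r_prev[i], b[i], rng)
             for i in range(2)]
        r_prev = r
        out.append(tuple(x))
    return out
```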
Abstract: The Yi are an ethnic group mainly living in mainland China, with their own spoken and written language systems developed over thousands of years. Ancient Yi is one of the six ancient languages in the world; it keeps a record of the history of the Yi people and offers documents valuable for research into human civilization. Recognition of ancient Yi characters helps to transform the documents into electronic form, making their storage and dissemination convenient. Due to historical and regional limitations, research on the recognition of ancient characters is still inadequate. Thus, deep learning technology was applied to the recognition of such characters. Five models were developed on the basis of a four-layer convolutional neural network (CNN). Alpha-Beta divergence was taken as a penalty term to re-encode the output neurons of the five models. Two fully connected layers fulfilled the compression of the features. Finally, at the softmax layer, the orthographic features of ancient Yi characters were re-evaluated, their probability distributions were obtained, and the characters with the highest-probability features were recognized. Tests show that the method achieves higher precision than the traditional CNN model for handwriting recognition of ancient Yi.
Abstract: 2016 became the year of the Artificial Intelligence explosion. AI technologies are maturing to the point that most well-known tech giants are making large investments to increase their AI capabilities. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural networks to train a machine to learn features directly from data. Deep learning enables many machine learning applications that expand the field of AI. At present, deep learning frameworks are widely deployed on servers for deep learning applications in both academia and industry. In training deep neural networks there are many standard processes and algorithms, but the performance of different frameworks can differ. In this paper we evaluate the running performance of two state-of-the-art distributed deep learning frameworks that parallelize training computation over multiple GPUs and multiple nodes in our cloud environment. We evaluate the training performance of the frameworks with the ResNet-50 convolutional neural network and analyze the factors that account for the performance differences between the two distributed frameworks. Through the experimental analysis, we identify overheads that could be further optimized. The main contribution is that the evaluation results provide further optimization directions for both performance tuning and algorithmic design.
Abstract: Tracking sports players is a widely challenging scenario, especially in single-feed videos recorded on tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features for detecting and tracking basketball players. An ablation study is carried out and then used to show that a robust tracker can be built with deep learning features, without the need to extract contextual ones, such as proximity or color similarity, nor to apply camera stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep learning model to estimate the players' poses, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k
Abstract: Foot recognition can be applied in many medical fields, such as gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system captures a patient's image in a controlled room and background to recognize the foot in a limited set of views. However, such a system can be inconvenient for monitoring knee exercises at home. To overcome these problems, this paper proposes a deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with traditional classification methods using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method provides better accuracy, but with higher complexity, in recognizing foot images from online databases than the traditional classification methods.
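The LBP baseline mentioned above is simple enough to sketch directly. Images are plain nested lists, and skipping border pixels is our simplification, not necessarily the paper's choice.

```python
# Minimal 8-neighbour Local Binary Pattern (LBP): each interior pixel gets
# an 8-bit code, one bit per neighbour that is >= the centre value.

def lbp_codes(img):
    h, w = len(img), len(img[0])
    # neighbours enumerated clockwise starting at the top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            c = img[y][x]
            bits = 0
            for i, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:
                    bits |= 1 << i
            row.append(bits)
        codes.append(row)
    return codes
```

A histogram of these codes over the image (or over cells) would form the feature vector handed to the kNN or SVM classifier.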
Abstract: In data-driven prognostic methods, the prediction accuracy of the estimated remaining useful life of bearings mainly depends on the performance of health indicators, which are usually fused from statistical features extracted from vibration signals. However, the existing health indicators have the following two drawbacks: (1) the different ranges of the statistical features make different contributions to the constructed health indicators, so expert knowledge is required to extract the features; (2) when convolutional neural networks are used to handle the time-frequency features of signals, the time-series nature of the signals is not considered. To overcome these drawbacks, this study proposes a method combining a convolutional neural network with a gated recurrent unit to extract time-frequency image features. The extracted features are used to construct a health indicator and predict the remaining useful life of bearings. First, the original signals are converted into time-frequency images using the continuous wavelet transform to form the original feature sets. Second, with the convolutional and pooling layers of a convolutional neural network, the most sensitive features of the time-frequency images are selected from the original feature sets. Finally, these selected features are fed into the gated recurrent unit to construct the health indicator. The results show that the proposed method performs better than related studies that used the same bearing dataset provided by PRONOSTIA.
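The gated recurrent unit at the end of this pipeline can be illustrated with a scalar version of its update equations. The fixed toy weights below are assumptions; in the paper they are learned jointly with the CNN.

```python
# A scalar gated recurrent unit (GRU) cell: update gate z, reset gate r,
# candidate state h_tilde, and convex-combination state update.
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar hidden state h."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])      # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])      # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

def gru_run(xs, p):
    h, hs = 0.0, []
    for x in xs:
        h = gru_step(x, h, p)
        hs.append(h)
    return hs

# Toy weights (assumed, not learned): the hidden state stays in (-1, 1).
PARAMS = {"wz": 1.0, "uz": 0.5, "bz": 0.0,
          "wr": 1.0, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}
```

In the paper's setting, xs would be the per-time-step CNN features of the time-frequency images, and the hidden state trajectory would serve as the health indicator.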
Abstract: Owing to advances in sensor technology, video surveillance has become the main means of security control in every big city in the world. Surveillance is usually used by governments for intelligence gathering, crime prevention, the protection of a process, person, group, or object, or the investigation of crime. Many surveillance systems based on computer vision technology have been developed in recent years. Moving target tracking is the most common task for an Unmanned Aerial Vehicle (UAV): finding and tracking objects of interest in mobile aerial surveillance for civilian applications. This paper focuses on vision-based collision avoidance for UAVs using recurrent neural networks. First, images from the cameras on a UAV are fused with a deep convolutional neural network. Then, a recurrent neural network is constructed to obtain high-level image features for object tracking and to extract low-level image features for noise reduction. The system distributes the computation across local and cloud platforms to efficiently perform object detection, tracking, and collision avoidance with multiple UAVs. Experiments on several challenging datasets show that the proposed algorithm outperforms state-of-the-art methods.
Abstract: The quality of press-fit assembly is closely related to the reliability and safety of the product. This paper proposes a keypoint detection method based on a convolutional neural network to improve the accuracy of keypoint detection in press-fit curves, providing an auxiliary basis for judging the quality of press-fit assembly. A press-fit curve plots press-fit force against displacement. Both the force data and the displacement data are time series; therefore, a one-dimensional convolutional neural network is used to process the press-fit curve. After the acquired press-fit data are filtered, a multi-layer one-dimensional convolutional neural network automatically learns press-fit curve features, which are then passed to a multi-layer perceptron that outputs the keypoints of the curve. We used data from press-fit assembly equipment in the actual production process to train the CNN model, and different data from the same equipment to evaluate detection performance. Compared with existing research results, detection performance was significantly improved. This method can provide a reliable basis for judging press-fit quality.
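The basic operation the press-fit network stacks is a one-dimensional convolution over the curve. The three-tap kernels and the toy force curve below are illustrative assumptions; the paper's filters are learned, not hand-picked.

```python
# "Valid" 1-D convolution (strictly, cross-correlation, as in deep-learning
# "convolutions": the kernel is not flipped) applied to a toy press-fit curve.

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Toy force curve with a noisy step (a "keypoint" near index 5); assumed data.
curve = [0.0, 0.1, 0.0, 0.2, 0.1, 2.0, 2.1, 1.9, 2.0, 2.1]
smoothed = conv1d(curve, [1 / 3] * 3)        # averaging kernel
edges = conv1d(curve, [-1.0, 0.0, 1.0])      # difference kernel highlights the step
```

A learned stack of such filters, followed by a multi-layer perceptron, is what the abstract describes for locating the keypoints.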
Abstract: Studies estimate that there will be 266,120 new cases of invasive breast cancer and 40,920 breast cancer induced deaths in 2018 alone. Despite the pervasiveness of this affliction, the current process for obtaining an accurate breast cancer prognosis is tedious and time-consuming. It usually requires a trained pathologist to manually examine histopathological images and identify the features that characterize various cancer severity levels. We propose MITOS-RCNN: a region-based convolutional neural network (RCNN) geared toward small object detection, to accurately grade one of the three factors that characterize tumor belligerence described by the Nottingham Grading System: mitotic count. Other computational approaches to mitotic figure counting and detection do not demonstrate ample recall or precision to be clinically viable. Our model outperformed all previous participants in the ICPR 2012 challenge, the AMIDA 2013 challenge, and the MITOS-ATYPIA-14 challenge, along with recently published works. It achieved an F-measure score of 0.955, a 6.11% improvement over the most accurate of the previously proposed models.
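The F-measure reported above combines detection precision and recall; a small helper makes the trade-off explicit. The counts in the test are made up, not the paper's detection results.

```python
# F-measure (F1): harmonic mean of precision and recall over true positives
# (tp), false positives (fp), and false negatives (fn).

def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```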
Abstract: In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust the capacity of a CNN model to best match a specific pattern recognition task. Firstly, a scheme is proposed to adjust the number of independent functional units within a CNN model to make it better fit a task. Secondly, the number of independent functional units in a capsule network is adjusted to fit it to the training dataset. Thirdly, a method based on Bayesian GANs is proposed to enrich the variance in the current dataset and increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor determining the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.
Abstract: This paper presents a deep-learning mechanism for distinguishing computer-generated images from photographic images. The proposed method uses a convolutional layer capable of automatically learning the correlation between neighbouring pixels. In its standard form, a Convolutional Neural Network (CNN) learns features based on an image's content rather than its structural features. The proposed layer is specifically designed to suppress the image's content and robustly learn the sensor pattern noise features (usually introduced by in-camera image processing) as well as the statistical properties of images. The method was assessed on recent natural and computer-generated images and was found to perform better than current state-of-the-art methods.
Abstract: This paper presents a method for improving object search accuracy using a deep learning model. A major limitation in providing accurate similarity with deep learning is the requirement for a huge amount of data to train pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset from a different domain, limiting the accuracy of the similarity measurements. For this reason, this paper proposes a deep learning model that can be trained with a significantly smaller amount of data: a clustered dataset in which each cluster contains a set of visually similar images. To measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores, defined as zero for images in the same cluster. The proposed method outperforms state-of-the-art object similarity scoring techniques when evaluated on finding exact items: it achieves 86.5% accuracy, compared with 59.9% for the state-of-the-art technique. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the remaining retrieved images are likely to be similar products. Therefore, the proposed method can reduce the amount of training data by an order of magnitude while providing a reliable similarity metric.
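Two ingredients of the training setup described above can be sketched directly: a distance between pooled feature vectors, and pairwise labels that are zero for images from the same cluster. CNN feature extraction is replaced here by plain vectors, and the cosine distance is our assumption (the abstract does not name the distance).

```python
# Cosine distance between feature vectors, plus generation of pairwise
# training targets from a clustered dataset (0.0 = same cluster).
import math
from itertools import combinations

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def training_pairs(clusters):
    """clusters: {cluster_id: [image_id, ...]} -> [(id_a, id_b, target)]."""
    items = [(img, cid) for cid, imgs in clusters.items() for img in imgs]
    return [(a, b, 0.0 if ca == cb else 1.0)
            for (a, ca), (b, cb) in combinations(items, 2)]
```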
Abstract: This paper presents an automatic normal and abnormal heart sound classification model based on a deep learning algorithm. The MITHSDB heart sound datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research, with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiograms, PCG). The PCG time series are segmented per heartbeat, and each sub-segment is converted into a square intensity matrix and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features to the supervised machine learning algorithm; instead, the features are determined automatically through training on the time series provided. The results show that the prediction model provides reasonable and comparable classification accuracy despite its simple implementation. This approach can be used for real-time classification of heart sounds in the Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signals.
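The reshaping step described above, turning one PCG heartbeat segment into a square intensity matrix, can be sketched as follows. Zero-padding to the next perfect square and row-major filling are our assumptions about the details.

```python
# Pad a 1-D heartbeat segment to the next perfect-square length and
# reshape it row-major into a square matrix for the CNN input.
import math

def to_square_matrix(segment):
    side = math.ceil(math.sqrt(len(segment)))
    padded = list(segment) + [0.0] * (side * side - len(segment))
    return [padded[r * side:(r + 1) * side] for r in range(side)]
```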
Abstract: In order to study the impact of various factors on housing prices, we propose building different deep-learning-based prediction models on existing real estate data to more accurately predict housing prices or their future trends. Considering that the factors affecting housing prices vary widely, the proposed prediction models fall into two categories. The first is based on multiple characteristic factors of the real estate: we built a Convolutional Neural Network (CNN) prediction model and a Long Short-Term Memory (LSTM) neural network prediction model based on deep learning, and a logistic regression model was implemented as a comparison for these models. The second is a time series model: based on deep learning, we propose an LSTM-1 model purely in terms of the time series, then implement and compare the LSTM model and the Auto-Regressive Moving Average (ARMA) model. In this paper, a comprehensive study of second-hand housing prices in Beijing is conducted from three aspects: data crawling and analysis, housing price prediction, and result comparison. Ultimately, the best model was identified, which is of great significance for the evaluation and prediction of housing prices in the real estate industry.
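The simplest member of the ARMA family compared above is an AR(1) model, which admits a closed-form least-squares fit. This sketch is only illustrative; a real comparison would fit a full ARMA model with a statistics library, and the series below is synthetic.

```python
# OLS fit of the AR(1) model y_t = c + phi * y_{t-1}, plus iterated
# multi-step forecasting from the last observed value.

def fit_ar1(series):
    """Returns (c, phi) minimising squared one-step prediction error."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    phi = cov / var
    return my - phi * mx, phi

def forecast_ar1(last, c, phi, steps):
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out
```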
Abstract: In seismic data processing, attenuation of random noise is the basic step to improve the quality of data for the further application of seismic data in exploration and development in gas and oil industries. The signal-to-noise ratio also largely determines the quality of seismic data; it affects the reliability as well as the accuracy of the seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, we need to improve the signal-to-noise ratio while attenuating random noise effectively. To improve the signal-to-noise ratio and attenuate seismic random noise while preserving important features and information about the seismic signals, we introduce an anisotropic total fractional-order denoising algorithm. The anisotropic total fractional-order variation model, defined in the fractional-order bounded variation space, is proposed as a regularization in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional-order variation model, and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of the proposed method on synthetic and real seismic data sets, and the denoised result is compared with F-X deconvolution and non-local means denoising
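The split Bregman solver mentioned above follows a standard splitting scheme; applied to a fractional-order TV model it can be sketched as below. This follows the usual Goldstein-Osher formulation with the fractional gradient \(\nabla^{\alpha}\) substituted in; the paper's exact discretization is not given in the abstract.

```latex
% Fractional-order TV denoising of noisy data f, fractional gradient \nabla^{\alpha}
\min_{u}\ \|\nabla^{\alpha} u\|_{1} + \frac{\mu}{2}\,\|u - f\|_{2}^{2}

% Split Bregman: decouple d = \nabla^{\alpha} u and alternate
u^{k+1} = \arg\min_{u}\ \frac{\mu}{2}\,\|u - f\|_{2}^{2}
          + \frac{\lambda}{2}\,\|d^{k} - \nabla^{\alpha} u - b^{k}\|_{2}^{2},
d^{k+1} = \operatorname{shrink}\!\left(\nabla^{\alpha} u^{k+1} + b^{k},\ 1/\lambda\right),
b^{k+1} = b^{k} + \nabla^{\alpha} u^{k+1} - d^{k+1},

% with the componentwise soft-shrinkage operator
\operatorname{shrink}(v, \tau) = \frac{v}{|v|}\,\max\!\left(|v| - \tau,\ 0\right)
```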