Abstract: The Yi are an ethnic group living mainly in mainland China, with spoken and written language systems that have developed over thousands of years. Ancient Yi is one of the six ancient written languages of the world; it records the history of the Yi people and offers documents valuable for research into human civilization. Recognizing ancient Yi characters helps transform these documents into electronic form, making them convenient to store and disseminate. Owing to historical and regional limitations, research on the recognition of ancient Yi characters remains inadequate, so deep learning was applied to the task. Five models were developed on the basis of a four-layer convolutional neural network (CNN), with Alpha-Beta divergence used as a penalty term to re-encode the output neurons of the five models, and two fully connected layers compressing the features. Finally, at the softmax layer, the orthographic features of ancient Yi characters were re-evaluated, their probability distributions were obtained, and the character whose features had the highest probability was recognized. Tests show that the method achieves higher precision than a traditional CNN model for handwritten ancient Yi character recognition.
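The final softmax step described above can be sketched in NumPy. The logit values below are hypothetical and the function is a generic illustration of how class probabilities would be obtained, not the authors' implementation:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for three candidate Yi characters
probs = softmax(np.array([2.0, 1.0, 0.1]))
pred = int(np.argmax(probs))  # the character with the highest probability wins
```

Subtracting the maximum logit changes nothing mathematically but avoids overflow in `exp` for large inputs.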
Abstract: 2016 became the year of the artificial intelligence explosion. AI technologies have matured to the point that most well-known tech giants are investing heavily to expand their AI capabilities. Machine learning is the science of getting computers to act without being explicitly programmed, and deep learning is a subset of machine learning that uses deep neural networks to learn features directly from data. Deep learning enables many machine learning applications that expand the field of AI. At present, deep learning frameworks are widely deployed on servers for deep learning applications in both academia and industry. Training deep neural networks follows many standard processes and algorithms, yet different frameworks may perform differently. In this paper we evaluate the running performance of two state-of-the-art distributed deep learning frameworks that parallelize training across multiple GPUs and multiple nodes in our cloud environment. We evaluate the training performance of the frameworks with the ResNet-50 convolutional neural network, and we analyze which factors account for the performance differences between the two distributed frameworks. Through this experimental analysis, we identify overheads that could be further optimized. The main contribution is that the evaluation results provide directions for further optimization in both performance tuning and algorithmic design.
Abstract: Tracking sports players is a widely challenging scenario, especially in single-feed videos recorded on tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features for detecting and tracking basketball players. An ablation study is carried out and then used to show that a robust tracker can be built with deep learning features, without extracting contextual ones, such as proximity or color similarity, and without applying camera-stabilization techniques. The presented tracker consists of (1) a detection step, which uses a pretrained deep learning model to estimate the players' poses, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k
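The MOTA metric used for evaluation follows the standard CLEAR MOT definition. The counts below are hypothetical; this is a generic sketch of the metric, not the paper's evaluation code:

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """MOTA = 1 - (FN + FP + IDSW) / GT, accumulated over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / ground_truth

# Hypothetical error counts over a sequence with 1000 ground-truth boxes
score = mota(false_negatives=50, false_positives=30, id_switches=20,
             ground_truth=1000)  # 1 - 100/1000 = 0.9
```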
Abstract: Foot recognition can be applied in many medical fields, such as gait-pattern analysis and monitoring the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system captures a patient image in a controlled room and background to recognize the foot from limited views. However, such a system can be inconvenient for monitoring knee exercises at home. To overcome these problems, this paper proposes a deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with a traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, the deep learning method provides better accuracy, but with higher complexity, in recognizing foot images from online databases than the traditional classification method.
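The kNN baseline mentioned above can be sketched in a few lines of NumPy. The toy feature vectors are hypothetical stand-ins for LBP/HOG descriptors, and this is a minimal illustration of the classifier, not the paper's pipeline:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D features for two classes (real LBP/HOG vectors are much longer)
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
pred = knn_predict(X, y, np.array([0.05, 0.0]), k=3)  # query near class 0
```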
Abstract: In data-driven prognostic methods, the accuracy of remaining-useful-life estimation for bearings mainly depends on the performance of health indicators, which are usually fused from statistical features extracted from vibration signals. However, existing health indicators have two drawbacks: (1) statistical features with different ranges contribute differently to the constructed health indicator, and expert knowledge is required to extract the features; (2) when convolutional neural networks are used to process the time-frequency features of signals, the time-series nature of the signals is not considered. To overcome these drawbacks, this study proposes a method combining a convolutional neural network with a gated recurrent unit to extract time-frequency image features. The extracted features are used to construct a health indicator and predict the remaining useful life of bearings. First, the original signals are converted into time-frequency images using the continuous wavelet transform to form the original feature sets. Second, the convolutional and pooling layers of the convolutional neural network select the most sensitive features of the time-frequency images from the original feature sets. Finally, the selected features are fed into the gated recurrent unit to construct the health indicator. The results show that the proposed method performs better than related studies using the same bearing dataset provided by PRONOSTIA.
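The first step, converting a signal into a time-frequency image via the continuous wavelet transform, can be sketched with a real Morlet wavelet in plain NumPy. The wavelet choice, scales, and test signal are assumptions for illustration; the study's actual CWT settings are not specified here:

```python
import numpy as np

def morlet(t, scale, w0=6.0):
    """Real Morlet mother wavelet stretched by `scale`."""
    x = t / scale
    return np.cos(w0 * x) * np.exp(-0.5 * x**2) / np.sqrt(scale)

def cwt_image(signal, scales, dt=1.0):
    """Convolve the signal with a wavelet at each scale; rows = scales."""
    n = len(signal)
    t = (np.arange(n) - n // 2) * dt
    return np.array([np.convolve(signal, morlet(t, s), mode="same")
                     for s in scales])

sig = np.sin(2 * np.pi * 0.05 * np.arange(256))   # toy vibration signal
tf = cwt_image(sig, scales=np.arange(1, 33))
# tf.shape == (32, 256): a time-frequency image ready to feed a CNN
```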
Abstract: Owing to advances in sensor technology, video surveillance has become the main means of security control in every big city in the world. Surveillance is usually used by governments for intelligence gathering, the prevention of crime, the protection of a process, person, group, or object, or the investigation of crime. Many surveillance systems based on computer-vision technology have been developed in recent years. Moving-target tracking is the most common task for an Unmanned Aerial Vehicle (UAV) when finding and tracking objects of interest in mobile aerial surveillance for civilian applications. This paper focuses on vision-based collision avoidance for UAVs using recurrent neural networks. First, images from the cameras on a UAV were fused with a deep convolutional neural network. Then, a recurrent neural network was constructed to obtain high-level image features for object tracking and to extract low-level image features for noise reduction. The system distributed the computation between local and cloud platforms to efficiently perform object detection, tracking, and collision avoidance with multiple UAVs. Experiments on several challenging datasets showed that the proposed algorithm outperforms state-of-the-art methods.
Abstract: The quality of press-fit assembly is closely related to the reliability and safety of a product. This paper proposes a keypoint detection method based on a convolutional neural network to improve the accuracy of keypoint detection in the press-fit curve, providing an auxiliary basis for judging the quality of press-fit assembly. The press-fit curve plots press-fit force against displacement; both the force data and the distance data are time series, so a one-dimensional convolutional neural network is used to process the curve. After the acquired press-fit data are filtered, a multi-layer one-dimensional convolutional neural network automatically learns the features of the press-fit curve, which are then sent to a multi-layer perceptron that outputs the keypoint of the curve. We trained the CNN model on data from press-fit assembly equipment in the actual production process, and we evaluated detection performance on different data from the same equipment. Compared with existing research results, detection performance was significantly improved. This method can provide a reliable basis for judging press-fit quality.
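The core operation of a 1-D convolutional layer over such force/displacement series can be sketched as follows. The force samples and the difference kernel are hypothetical; a trained network would learn its kernels rather than use a fixed one:

```python
import numpy as np

def conv1d(x, kernel):
    """'Valid' 1-D convolution as used in a 1-D CNN layer
    (no kernel flip, i.e. cross-correlation, as in deep learning libraries)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

# Hypothetical force samples; a difference kernel highlights abrupt changes
force = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
edges = conv1d(force, np.array([-1.0, 1.0]))
idx = int(np.argmax(np.abs(edges)))   # the peak marks a candidate keypoint
```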
Abstract: Studies estimate that there will be 266,120 new cases of invasive breast cancer and 40,920 breast-cancer-induced deaths in 2018 alone. Despite the pervasiveness of this affliction, the current process for obtaining an accurate breast cancer prognosis is tedious and time-consuming: it usually requires a trained pathologist to manually examine histopathological images and identify the features that characterize the various cancer severity levels. We propose MITOS-RCNN, a region-based convolutional neural network (RCNN) geared toward small-object detection, to accurately grade one of the three factors of tumor aggressiveness described by the Nottingham Grading System: the mitotic count. Other computational approaches to mitotic-figure counting and detection do not demonstrate sufficient recall or precision to be clinically viable. Our model outperformed all previous participants in the ICPR 2012 challenge, the AMIDA 2013 challenge, and the MITOS-ATYPIA-14 challenge, along with recently published works, achieving an F-measure score of 0.955, a 6.11% improvement over the most accurate of the previously proposed models.
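The F-measure reported above is the standard harmonic mean of precision and recall; the precision/recall values below are hypothetical, chosen only to show the calculation:

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta=1 gives F1."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Hypothetical detector scores illustrating an F1 near the reported 0.955
f1 = f_measure(precision=0.95, recall=0.96)
```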
Abstract: In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust that capacity so the model best matches a specific pattern-recognition task. First, a scheme is proposed to adjust the number of independent functional units within a CNN model to better fit it to a task. Second, the number of independent functional units in a capsule network is adjusted to fit it to the training dataset. Third, a method based on a Bayesian GAN is proposed to enrich the variance in the current dataset and thereby increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor determining the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.
Abstract: This paper presents a deep-learning mechanism for classifying computer-generated images and photographic images. The proposed method relies on a convolutional layer capable of automatically learning the correlation between neighbouring pixels. In its standard form, a Convolutional Neural Network (CNN) learns features based on an image's content rather than its structural features. The proposed layer is specifically designed to suppress an image's content and robustly learn the sensor-pattern-noise features (usually inherited from in-camera image processing) as well as the statistical properties of images. The method was assessed on recent natural and computer-generated images and was found to perform better than current state-of-the-art methods.
Abstract: This paper presents a method for improving object-search accuracy using a deep learning model. A major limitation in providing accurate similarity with deep learning is the requirement for a huge amount of training data with pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained on a relatively small dataset from a different domain, limiting the accuracy of similarity measurement. For this reason, this paper proposes a deep learning model that can be trained with a significantly smaller amount of data: a clustered dataset in which each cluster contains a set of visually similar images. To measure similarity distance with the proposed method, the visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores defined as zero for images in the same cluster. The proposed method outperforms state-of-the-art object-similarity scoring techniques in evaluations for finding exact items: it achieves 86.5% accuracy, compared with 59.9% for the state-of-the-art technique. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the remaining retrievals are likely to be similar products. Therefore, the proposed method can reduce the amount of training data by an order of magnitude while providing a reliable similarity metric.
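The zero-distance-within-a-cluster training target and a typical feature-similarity score can be sketched as follows. The cosine measure and the binary target are illustrative assumptions; the paper does not specify its exact distance function here:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_target(cluster_a, cluster_b):
    """Training target: distance 0 for images drawn from the same cluster,
    1 otherwise (a hypothetical binary labeling of image pairs)."""
    return 0.0 if cluster_a == cluster_b else 1.0

# Parallel toy feature vectors are maximally similar
sim = cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
```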
Abstract: This paper presents an automatic normal/abnormal heart-sound classification model developed with a deep learning algorithm. MITHSDB heart-sound datasets from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research, under the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heartbeat, and each sub-segment is converted into a square intensity matrix and classified using convolutional neural network (CNN) models. This approach removes the need to hand-craft classification features for a supervised machine learning algorithm; instead, the features are determined automatically through training on the provided time series. The results show that the prediction model provides reasonable and comparable classification accuracy despite its simple implementation. This approach can be used for real-time classification of heart sounds in the Internet of Medical Things (IoMT), e.g., in remote PCG monitoring applications.
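Converting a one-beat segment into a square intensity matrix can be done by zero-padding and reshaping. The segment length and the padding scheme below are assumptions for illustration; the paper may resample or normalize differently:

```python
import numpy as np

def to_square_matrix(segment):
    """Zero-pad a 1-D heartbeat segment and fold it into a square matrix."""
    side = int(np.ceil(np.sqrt(len(segment))))
    padded = np.zeros(side * side)
    padded[:len(segment)] = segment
    return padded.reshape(side, side)

beat = np.random.randn(1000)   # hypothetical one-beat PCG segment
img = to_square_matrix(beat)   # 32 x 32 "intensity image" for the CNN
```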
Abstract: To study the impact of various factors on housing prices, we propose building different deep-learning-based prediction models on existing real-estate data in order to predict housing prices, or their future trends, more accurately. Because the factors that affect housing prices vary widely, the proposed prediction models fall into two categories. The first is based on multiple characteristic factors of the real estate: we built a Convolutional Neural Network (CNN) prediction model and a Long Short-Term Memory (LSTM) neural network prediction model based on deep learning, and implemented a logistic regression model for comparison among the three. The second is a time-series model: based on deep learning, we proposed an LSTM-1 model driven purely by the time series, then implemented and compared the LSTM model and the Auto-Regressive Moving Average (ARMA) model. In this paper, a comprehensive study of second-hand housing prices in Beijing is conducted in three stages: crawling and analyzing the data, predicting housing prices, and comparing the results. Ultimately, the best model was identified, which is of great significance for the evaluation and prediction of housing prices in the real estate industry.
Abstract: In seismic data processing, attenuation of random noise is the basic step for improving data quality before the data are further applied in exploration and development across the gas and oil industries. The signal-to-noise ratio also strongly determines the quality of seismic data, affecting both the reliability and the accuracy of the seismic signal during interpretation for different purposes in different companies. To use seismic data for further application and interpretation, the signal-to-noise ratio must be improved while random noise is attenuated effectively. To improve the signal-to-noise ratio and attenuate seismic random noise while preserving the important features and information of the seismic signal, we introduce an anisotropic total fractional-order denoising algorithm. The anisotropic total fractional-order variation model, defined in the space of fractional-order bounded variation, is proposed as a regularization term in seismic denoising. The split Bregman algorithm is employed to solve the minimization problem of the anisotropic total fractional-order variation model, and the corresponding denoising algorithm for the proposed method is derived. We test the effectiveness of the proposed method on synthetic and real seismic data sets, and the denoised results are compared with F-X deconvolution and non-local means denoising.
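As a much-simplified illustration of total-variation-style denoising, the sketch below minimizes an integer-order, smoothed 1-D TV objective by plain gradient descent. This is explicitly not the paper's method, which uses a fractional-order anisotropic model solved with the split Bregman algorithm; the parameters and the step signal are assumptions:

```python
import numpy as np

def tv_denoise_1d(y, lam=0.2, step=0.1, iters=300, eps=1e-2):
    """Gradient descent on 0.5*||x - y||^2 + lam * sum sqrt((x_{i+1}-x_i)^2 + eps),
    a smoothed surrogate for the total-variation penalty."""
    x = y.copy()
    for _ in range(iters):
        d = np.diff(x)
        g = d / np.sqrt(d**2 + eps)   # smoothed "sign" of each jump
        grad = x - y                  # gradient of the data-fidelity term
        grad[:-1] -= lam * g          # TV-term gradient w.r.t. x_i
        grad[1:] += lam * g           # TV-term gradient w.r.t. x_{i+1}
        x -= step * grad
    return x

np.random.seed(0)
# Toy piecewise-constant "trace" corrupted by random noise
noisy = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * np.random.randn(100)
clean = tv_denoise_1d(noisy)
```

TV regularization suppresses noise-driven oscillations while allowing the sharp jump at the midpoint to survive, which is why variation-based penalties are attractive for edge-preserving seismic denoising.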
Abstract: This study presents new gait representations for improving gait recognition accuracy across gait appearances, such as normal walking, wearing a coat, and carrying a bag. Based on the Gait Energy Image (GEI), two ideas are implemented to generate new gait representations: one appends the lower-knee regions to the original GEI, and the other applies convolutional operations to the GEI and its variants. A set of new gait representations is created and used to train multi-class Support Vector Machines (SVMs). Tests are conducted on CASIA dataset B. Various combinations of the gait representations, with different convolutional kernel sizes and different numbers of kernels in the convolutional processes, are examined. Both entire images used as features and dimensionally reduced features obtained by Principal Component Analysis (PCA) are tested in gait recognition. Interestingly, both new techniques, appending the lower-knee regions to the original GEI and the convolutional GEI, contribute significantly to the performance improvement in gait recognition. The experimental results show that the average recognition rate can be improved from 75.65% to 87.50%.
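The GEI itself is simply the pixel-wise mean of aligned binary silhouettes over a gait cycle. The toy frames below are assumptions; real silhouettes would be aligned, cropped person masks:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: pixel-wise mean of binary silhouettes over one gait cycle."""
    return np.mean(np.asarray(silhouettes, dtype=float), axis=0)

# Two toy 4x3 binary "silhouettes": one empty frame, one full frame
frames = [np.zeros((4, 3)), np.ones((4, 3))]
gei = gait_energy_image(frames)   # each pixel is present in half of the frames
```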
Abstract: Deep Venous Thrombosis (DVT) occurs when a thrombus forms within a deep vein (most often in the legs). The disease can be deadly if part or all of the thrombus reaches the lung and causes a Pulmonary Embolism (PE). This disorder, often asymptomatic, has multifactorial causes: immobilization, surgery, pregnancy, age, cancers, and genetic variations. Our project aims to relate the thrombus epidemiology (origins, patient predispositions, PE) to its structure using ultrasound images. Ultrasonography and elastography data were collected with a Toshiba Aplio 500 at Brest Hospital. This manuscript compares two classification approaches: spectral clustering and the scattering operator. The former is based on graph and matrix theory, while the latter cascades wavelet convolutions with nonlinear modulus and averaging operators.
Abstract: Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep-learning-based approach that trains a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural network (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations, as well as the direction of the causal relation. The results of our experiments show higher accuracy than previously achieved on causal relation identification tasks.
Abstract: Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in synthesizing 3D views of object instances at various viewpoints. For the problem in which one has only limited viewpoints of a particular object for classification, we present a pose-normalization architecture that transforms the object to viewpoints existing in the training dataset before classification, yielding better classification performance. We have demonstrated that this Pose Normalization Network (PNN) can capture the style of the target object and re-render it at a desired viewpoint. Moreover, we have shown that the PNN improves the classification results on the 3D chairs dataset and the ShapeNet airplanes dataset when given only images at limited viewpoints, as compared to a
Abstract: Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several investigations of approaches to Recognizing Textual Entailment (RTE). These include models based on lexical similarities, models based on formal reasoning, and, most recently, deep neural models. In this paper, we present a sentence-encoding model that exploits sentence-to-sentence relation information for RTE. In terms of sentence modeling, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) adopt different approaches: RNNs are known to be well suited to sequence modeling, whilst CNNs are suited to extracting n-gram features through their filters and can learn ranges of relations via the pooling mechanism. We combine these strengths of RNNs and CNNs in a unified model for the RTE task. Our model combines relation vectors computed from the phrasal representations of each sentence with the final encoded sentence representations. First, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representations, from which the first relation vector is computed. Second, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used, in the same fashion as an attention mechanism, over the Bi-LSTM outputs to yield the final sentence representations for classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
Abstract: Deformable part models achieve high precision in pedestrian recognition, but all publicly available implementations are too slow for real-time applications. We implemented a deformable part model algorithm fast enough for real-time use by exploiting information about the camera position and orientation. This implementation is both faster and more precise than alternative DPM implementations. These results are obtained by computing convolutions in the frequency domain and using lookup tables to speed up feature computation; this approach is almost an order of magnitude faster than the reference DPM implementation, with no loss in precision. Knowing the position of the camera with respect to the horizon, it is also possible to prune many hypotheses based on their size and location. The range of acceptable sizes and positions is set by examining the statistical distribution of bounding boxes in labelled images. With this approach there is no need to compute the entire feature pyramid: for example, higher-resolution features are only needed near the horizon. This results in an increase in mean average precision of 5% and a twofold increase in speed. Furthermore, to reduce misdetections involving small pedestrians near the horizon, input images are supersampled near the horizon; supersampling the image at 1.5 times the original scale results in an increase in precision of about 4%. The implementation was tested on the public KITTI dataset, obtaining an 8% improvement in mean average precision over the best-performing DPM-based method. By allowing a small loss in precision, computational time can easily be brought down to our target of 100 ms per image, reaching a solution that is faster and still more precise than all publicly available
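Frequency-domain convolution, the first speed-up mentioned above, rests on the convolution theorem: a linear convolution becomes an element-wise product of FFTs, provided both signals are zero-padded to the full output length. A minimal 1-D sketch (the DPM implementation applies the same idea to 2-D HOG feature maps):

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Linear convolution via the frequency domain.
    Padding both FFTs to len(signal)+len(kernel)-1 avoids circular wrap-around."""
    n = len(signal) + len(kernel) - 1
    return np.real(np.fft.ifft(np.fft.fft(signal, n) * np.fft.fft(kernel, n)))

x = np.array([1.0, 2.0, 3.0])
k = np.array([0.0, 1.0])
out = fft_convolve(x, k)   # matches the direct np.convolve(x, k)
```

For large filters, the FFT route costs O(n log n) instead of O(n·m) for direct convolution, which is where the near order-of-magnitude speed-up comes from.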