Abstract: The quality of press-fit assembly is closely related to
reliability and safety of product. The paper proposed a keypoint
detection method based on convolutional neural network to improve
the accuracy of keypoint detection in press-fit curve. It would
provide an auxiliary basis for judging quality of press-fit assembly.
The press-fit curve is a curve of press-fit force and displacement.
Both force data and distance data are time-series data. Therefore,
one-dimensional convolutional neural network is used to process
the press-fit curve. After the obtained press-fit data is filtered, the
multi-layer one-dimensional convolutional neural network is used to
perform the automatic learning of press-fit curve features, and then
sent to the multi-layer perceptron to finally output keypoint of the
curve. We used the data of press-fit assembly equipment in the actual
production process to train CNN model, and we used different data
from the same equipment to evaluate the performance of detection.
Compared with the existing research result, the performance of
detection was significantly improved. This method can provide a
reliable basis for the judgment of press-fit quality.
Abstract: Studies estimate that there will be 266,120 new cases
of invasive breast cancer and 40,920 breast cancer induced deaths
in the year of 2018 alone. Despite the pervasiveness of this
affliction, the current process to obtain an accurate breast cancer
prognosis is tedious and time consuming. It usually requires a
trained pathologist to manually examine histopathological images and
identify the features that characterize various cancer severity levels.
We propose MITOS-RCNN: a region based convolutional neural
network (RCNN) geared for small object detection to accurately
grade one of the three factors that characterize tumor belligerence
described by the Nottingham Grading System: mitotic count. Other
computational approaches to mitotic figure counting and detection
do not demonstrate ample recall or precision to be clinically viable.
Our models outperformed all previous participants in the ICPR 2012
challenge, the AMIDA 2013 challenge and the MITOS-ATYPIA-14
challenge along with recently published works. Our model achieved
an F- measure score of 0.955, a 6.11% improvement in accuracy from
the most accurate of the previously proposed models.
Abstract: In this paper, we study the factors which determine the capacity of a Convolutional Neural Network (CNN) model and propose the ways to evaluate and adjust the capacity of a CNN model for best matching to a specific pattern recognition task. Firstly, a scheme is proposed to adjust the number of independent functional units within a CNN model to make it be better fitted to a task. Secondly, the number of independent functional units in the capsule network is adjusted to fit it to the training dataset. Thirdly, a method based on Bayesian GAN is proposed to enrich the variances in the current dataset to increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor that determines the capacity of a network model. By adjusting the number of functional units, the capacity of a model can better match the complexity of a dataset.
Abstract: This paper presents a deep-learning mechanism for classifying computer generated images and photographic images. The proposed method accounts for a convolutional layer capable of automatically learning correlation between neighbouring pixels. In the current form, Convolutional Neural Network (CNN) will learn features based on an image's content instead of the structural features of the image. The layer is particularly designed to subdue an image's content and robustly learn the sensor pattern noise features (usually inherited from image processing in a camera) as well as the statistical properties of images. The paper was assessed on latest natural and computer generated images, and it was concluded that it performs better than the current state of the art methods.
Abstract: This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.
Abstract: This paper presents an automatic normal and abnormal heart sound classification model developed based on deep learning algorithm. MITHSDB heart sounds datasets obtained from the 2016 PhysioNet/Computing in Cardiology Challenge database were used in this research with the assumption that the electrocardiograms (ECG) were recorded simultaneously with the heart sounds (phonocardiogram, PCG). The PCG time series are segmented per heart beat, and each sub-segment is converted to form a square intensity matrix, and classified using convolutional neural network (CNN) models. This approach removes the need to provide classification features for the supervised machine learning algorithm. Instead, the features are determined automatically through training, from the time series provided. The result proves that the prediction model is able to provide reasonable and comparable classification accuracy despite simple implementation. This approach can be used for real-time classification of heart sounds in Internet of Medical Things (IoMT), e.g. remote monitoring applications of PCG signal.
Abstract: In order to study the impact of various factors on the housing price, we propose to build different prediction models based on deep learning to determine the existing data of the real estate in order to more accurately predict the housing price or its changing trend in the future. Considering that the factors which affect the housing price vary widely, the proposed prediction models include two categories. The first one is based on multiple characteristic factors of the real estate. We built Convolution Neural Network (CNN) prediction model and Long Short-Term Memory (LSTM) neural network prediction model based on deep learning, and logical regression model was implemented to make a comparison between these three models. Another prediction model is time series model. Based on deep learning, we proposed an LSTM-1 model purely regard to time series, then implementing and comparing the LSTM model and the Auto-Regressive and Moving Average (ARMA) model. In this paper, comprehensive study of the second-hand housing price in Beijing has been conducted from three aspects: crawling and analyzing, housing price predicting, and the result comparing. Ultimately the best model program was produced, which is of great significance to evaluation and prediction of the housing price in the real estate industry.
Abstract: In seismic data processing, attenuation of random noise
is the basic step to improve quality of data for further application
of seismic data in exploration and development in different gas
and oil industries. The signal-to-noise ratio of the data also highly
determines quality of seismic data. This factor affects the reliability
as well as the accuracy of seismic signal during interpretation
for different purposes in different companies. To use seismic data
for further application and interpretation, we need to improve the
signal-to-noise ration while attenuating random noise effectively.
To improve the signal-to-noise ration and attenuating seismic
random noise by preserving important features and information
about seismic signals, we introduce the concept of anisotropic
total fractional order denoising algorithm. The anisotropic total
fractional order variation model defined in fractional order bounded
variation is proposed as a regularization in seismic denoising. The
split Bregman algorithm is employed to solve the minimization
problem of the anisotropic total fractional order variation model
and the corresponding denoising algorithm for the proposed method
is derived. We test the effectiveness of theproposed method for
synthetic and real seismic data sets and the denoised result is
compared with F-X deconvolution and non-local means denoising
algorithm.
Abstract: This study presents new gait representations for improving gait recognition accuracy on cross gait appearances, such as normal walking, wearing a coat and carrying a bag. Based on the Gait Energy Image (GEI), two ideas are implemented to generate new gait representations. One is to append lower knee regions to the original GEI, and the other is to apply convolutional operations to the GEI and its variants. A set of new gait representations are created and used for training multi-class Support Vector Machines (SVMs). Tests are conducted on the CASIA dataset B. Various combinations of the gait representations with different convolutional kernel size and different numbers of kernels used in the convolutional processes are examined. Both the entire images as features and reduced dimensional features by Principal Component Analysis (PCA) are tested in gait recognition. Interestingly, both new techniques, appending the lower knee regions to the original GEI and convolutional GEI, can significantly contribute to the performance improvement in the gait recognition. The experimental results have shown that the average recognition rate can be improved from 75.65% to 87.50%.
Abstract: Deep Venous Thrombosis (DVT) occurs when a
thrombus is formed within a deep vein (most often in the legs). This
disease can be deadly if a part or the whole thrombus reaches the
lung and causes a Pulmonary Embolism (PE). This disorder, often
asymptomatic, has multifactorial causes: immobilization, surgery,
pregnancy, age, cancers, and genetic variations. Our project aims to
relate the thrombus epidemiology (origins, patient predispositions,
PE) to its structure using ultrasound images. Ultrasonography and
elastography were collected using Toshiba Aplio 500 at Brest
Hospital. This manuscript compares two classification approaches:
spectral clustering and scattering operator. The former is based on
the graph and matrix theories while the latter cascades wavelet
convolutions with nonlinear modulus and averaging operators.
Abstract: Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.
Abstract: Convolutional Neural Networks (CNN) have
demonstrated their effectiveness in synthesizing 3D views of object
instances at various viewpoints. Given the problem where one
have limited viewpoints of a particular object for classification, we
present a pose normalization architecture to transform the object to
existing viewpoints in the training dataset before classification to
yield better classification performance. We have demonstrated that
this Pose Normalization Network (PNN) can capture the style of
the target object and is able to re-render it to a desired viewpoint.
Moreover, we have shown that the PNN improves the classification
result for the 3D chairs dataset and ShapeNet airplanes dataset
when given only images at limited viewpoint, as compared to a
CNN baseline.
Abstract: Over the past decade, there have been promising developments in Natural Language Processing (NLP) with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These models include models based on lexical similarities, models based on formal reasoning, and most recently deep neural models. In this paper, we present a sentence encoding model that exploits the sentence-to-sentence relation information for RTE. In terms of sentence modeling, Convolutional neural network (CNN) and recurrent neural networks (RNNs) adopt different approaches. RNNs are known to be well suited for sequence modeling, whilst CNN is suited for the extraction of n-gram features through the filters and can learn ranges of relations via the pooling mechanism. We combine the strength of RNN and CNN as stated above to present a unified model for the RTE task. Our model basically combines relation vectors computed from the phrasal representation of each sentence and final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representation for each sentence from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short Term Memory (Bi-LSTM) to obtain the final sentence representations from which a second relation vector is computed. The relations vectors are combined and then used in then used in the same fashion as attention mechanism over the Bi-LSTM outputs to yield the final sentence representations for the classification. Experiment on the Stanford Natural Language Inference (SNLI) corpus suggests that this is a promising technique for RTE.
Abstract: Deformable part models achieve high precision in
pedestrian recognition, but all publicly available implementations are
too slow for real-time applications. We implemented a deformable
part model algorithm fast enough for real-time use by exploiting
information about the camera position and orientation. This
implementation is both faster and more precise than alternative
DPM implementations. These results are obtained by computing
convolutions in the frequency domain and using lookup tables to
speed up feature computation. This approach is almost an order of
magnitude faster than the reference DPM implementation, with no
loss in precision. Knowing the position of the camera with respect to
horizon it is also possible prune many hypotheses based on their
size and location. The range of acceptable sizes and positions is
set by looking at the statistical distribution of bounding boxes in
labelled images. With this approach it is not needed to compute the
entire feature pyramid: for example higher resolution features are
only needed near the horizon. This results in an increase in mean
average precision of 5% and an increase in speed by a factor of
two. Furthermore, to reduce misdetections involving small pedestrians
near the horizon, input images are supersampled near the horizon.
Supersampling the image at 1.5 times the original scale, results in
an increase in precision of about 4%. The implementation was tested
against the public KITTI dataset, obtaining an 8% improvement in
mean average precision over the best performing DPM-based method.
By allowing for a small loss in precision computational time can be
easily brought down to our target of 100ms per image, reaching a
solution that is faster and still more precise than all publicly available
DPM implementations.
Abstract: In this paper, we investigate certain spaces of
generalized functions for the Fourier and Fourier type integral
transforms. We discuss convolution theorems and establish certain
spaces of distributions for the considered integrals. The new Fourier
type integral is well-defined, linear, one-to-one and continuous with
respect to certain types of convergences. Many properties and an
inverse problem are also discussed in some details.
Abstract: Satellite imagery classification is a challenging problem with many practical applications. In this paper, we designed a deep convolution neural network (DCNN) to classify the satellite imagery. The contributions of this paper are twofold — First, to cope with the large-scale variance in the satellite image, we introduced the inception module, which has multiple filters with different size at the same level, as the building block to build our DCNN model. Second, we proposed a genetic algorithm based method to efficiently search the best hyper-parameters of the DCNN in a large search space. The proposed method is evaluated on the benchmark database. The results of the proposed hyper-parameters search method show it will guide the search towards better regions of the parameter space. Based on the found hyper-parameters, we built our DCNN models, and evaluated its performance on satellite imagery classification, the results show the classification accuracy of proposed models outperform the state of the art method.
Abstract: Recent developments on multi-agent system have brought a new research field on image processing. Several algorithms are used simultaneously and improved in deferent applications while new methods are investigated. This paper presents a new automatic method for edge detection using several agents and many different actions. The proposed multi-agent system is based on parallel agents that locally perceive their environment, that is to say, pixels and additional environmental information. This environment is built using Vector Field Convolution that attract free agent to the edges. Problems of partial, hidden or edges linking are solved with the cooperation between agents. The presented method was implemented and evaluated using several examples on different synthetic and medical images. The obtained experimental results suggest that this approach confirm the efficiency and accuracy of detected edge.
Abstract: This study presents a conformational model of the helical structures of globular protein particularly ferritin in the framework of white noise path integral formulation by using Associated Legendre functions, Bessel and convolution of Bessel and trigonometric functions as modulating functions. The model incorporates chirality features of proteins and their helix-turn-helix sequence structural motif.
Abstract: NiFe2O4 (nickel ferrite), ZnFe2O4 (zinc ferrite) and
Ni0.5Zn0.5Fe2O4 (nickel-zinc ferrite) were prepared by
mechanochemical route in a planetary ball mill starting from mixture
of the appropriate quantities of the Ni(OH)2/Fe(OH)3,
Zn(OH)2/Fe(OH)3 and Ni(OH)2/Zn(OH)2/Fe(OH)3 hydroxide
powders. In order to monitor the progress of chemical reaction and
confirm phase formation, powder samples obtained after 25 h, 18 h
and 10 h of milling were characterized by X-ray diffraction (XRD),
transmission electron microscopy (TEM), IR, Raman and Mössbauer
spectroscopy. It is shown that the soft mechanochemical method, i.e.
mechanochemical activation of hydroxides, produces high quality
single phase ferrite samples in much more efficient way. From the IR
spectroscopy of single phase samples it is obvious that energy of
modes depends on the ratio of cations. It is obvious that all samples
have more than 5 Raman active modes predicted by group theory in
the normal spinel structure. Deconvolution of measured spectra
allows one to conclude that all complex bands in the spectra are made
of individual peaks with the intensities that vary from spectrum to
spectrum. The deconvolution of Raman spectra allows to separate
contributions of different cations to a particular type of vibration and
to estimate the degree of inversion.
Abstract: Any signal transmitted over a channel is corrupted by noise and interference. A host of channel coding techniques has been proposed to alleviate the effect of such noise and interference. Among these Turbo codes are recommended, because of increased capacity at higher transmission rates and superior performance over convolutional codes. The multimedia elements which are associated with ample amount of data are best protected by Turbo codes. Turbo decoder employs Maximum A-posteriori Probability (MAP) and Soft Output Viterbi Decoding (SOVA) algorithms. Conventional Turbo coded systems employ Equal Error Protection (EEP) in which the protection of all the data in an information message is uniform. Some applications involve Unequal Error Protection (UEP) in which the level of protection is higher for important information bits than that of other bits. In this work, enhancement to the traditional Log MAP decoding algorithm is being done by using optimized scaling factors for both the decoders. The error correcting performance in presence of UEP in Additive White Gaussian Noise channel (AWGN) and Rayleigh fading are analyzed for the transmission of image with Discrete Cosine Transform (DCT) as source coding technique. This paper compares the performance of log MAP, Modified log MAP (MlogMAP) and Enhanced log MAP (ElogMAP) algorithms used for image transmission. The MlogMAP algorithm is found to be best for lower Eb/N0 values but for higher Eb/N0 ElogMAP performs better with optimized scaling factors. The performance comparison of AWGN with fading channel indicates the robustness of the proposed algorithm. According to the performance of three different message classes, class3 would be more protected than other two classes. From the performance analysis, it is observed that ElogMAP algorithm with UEP is best for transmission of an image compared to Log MAP and MlogMAP decoding algorithms.