Abstract: Convolutional Neural Networks (CNNs) have demonstrated their
effectiveness in synthesizing 3D views of object instances at various
viewpoints. Given the problem where one has only limited viewpoints of a
particular object for classification, we present a pose normalization
architecture that transforms the object to viewpoints present in the
training dataset before classification, yielding better classification
performance. We demonstrate that this Pose Normalization Network (PNN)
can capture the style of the target object and re-render it at a desired
viewpoint. Moreover, we show that the PNN improves classification results
on the 3D chairs dataset and the ShapeNet airplanes dataset when given
only images at limited viewpoints, as compared to a CNN baseline.
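The pipeline this abstract describes can be summarized as two stages: re-render the input to a viewpoint seen at training time, then classify the normalized image. A minimal sketch under stated assumptions — `pnn` and `cnn_classifier` are hypothetical placeholders for the trained networks, not the paper's actual interfaces:

```python
# Illustrative pipeline sketch; `pnn` and `cnn_classifier` stand in for the
# trained networks described in the abstract (assumed names, not the paper's).

def classify_with_pose_normalization(image, pnn, cnn_classifier, target_viewpoint):
    """Re-render `image` to a viewpoint seen during training, then classify."""
    normalized = pnn(image, target_viewpoint)  # pose normalization step
    return cnn_classifier(normalized)          # standard CNN classification

# Toy stand-ins showing how the pieces plug together:
toy_pnn = lambda img, vp: f"{img}@{vp}"
toy_classifier = lambda img: "chair" if "chair" in img else "airplane"
print(classify_with_pose_normalization("chair_render", toy_pnn, toy_classifier, 30))
```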
Abstract: Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These include models based on lexical similarities, models based on formal reasoning, and most recently deep neural models. In this paper, we present a sentence encoding model that exploits sentence-to-sentence relation information for RTE. In terms of sentence modeling, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) adopt different approaches. RNNs are known to be well suited to sequence modeling, whilst CNNs are suited to the extraction of n-gram features through their filters and can learn ranges of relations via the pooling mechanism. We combine these strengths of RNNs and CNNs to present a unified model for the RTE task. Our model combines relation vectors computed from the phrasal representations of each sentence with the final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representations, from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used, in the same fashion as an attention mechanism, over the Bi-LSTM outputs to yield the final sentence representations for classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
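The abstract does not give the exact formulation of its relation vectors, but a common construction in the RTE literature combines two sentence encodings with their element-wise difference and product. A minimal sketch under that assumption, using plain Python lists for clarity (the feature set and function name are illustrative, not the paper's):

```python
# Illustrative sketch (not the paper's exact formulation): one common way to
# form a "relation vector" from two sentence encodings u and v is to
# concatenate them with their element-wise difference and product, then feed
# the result to a classifier.

def relation_vector(u, v):
    """Concatenate [u, v, |u - v|, u * v] into a single relation vector."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod

premise = [0.2, -0.5, 1.0]    # hypothetical encoding of the premise
hypothesis = [0.1, 0.4, 0.9]  # hypothetical encoding of the hypothesis
r = relation_vector(premise, hypothesis)
print(len(r))  # 4 features per encoding dimension: 4 * 3 = 12
```

In the model described above, one such vector would come from the convolutional phrase representations and a second from the Bi-LSTM outputs, before the two are combined.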
Abstract: To explore how the brain may recognise objects in its
general, accurate, and energy-efficient manner, this paper proposes the
use of a neuromorphic hardware system formed from a Dynamic
Vision Sensor (DVS) silicon retina in concert with the SpiNNaker
real-time Spiking Neural Network (SNN) simulator. As a first step
in the exploration on this platform, a recognition system for dynamic
hand postures is developed, enabling the study of the methods used
in the visual pathways of the brain. Inspired by the behaviours of
the primary visual cortex, Convolutional Neural Networks (CNNs)
are modelled using both linear perceptrons and spiking Leaky
Integrate-and-Fire (LIF) neurons.
In this study’s largest configuration using these approaches, a
network of 74,210 neurons and 15,216,512 synapses is created and
operated in real time across 290 SpiNNaker processor cores in parallel,
achieving 93.0% accuracy. A smaller network using only a tenth of
these resources is also created, again operating in real time, and is
able to recognise the postures with an accuracy of around 86.4%, only
6.6 percentage points lower than the much larger system. The recognition
rate of the smaller network developed on this neuromorphic system is
sufficient for a successful hand posture recognition system, and
demonstrates a much improved cost-to-performance trade-off.
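The LIF neuron mentioned in this abstract integrates input current into a leaky membrane potential and fires a spike on crossing a threshold. A minimal discrete-time sketch in pure Python — the parameters (rest potential, threshold, time constant) are illustrative defaults, not those used on SpiNNaker:

```python
# Minimal discrete-time Leaky Integrate-and-Fire (LIF) neuron sketch.
# Parameters are illustrative, not the values used in the SpiNNaker networks.

def simulate_lif(input_current, v_rest=0.0, v_thresh=1.0, tau=10.0, dt=1.0):
    """Return the time steps at which the neuron spikes for a given
    per-step input current sequence."""
    v = v_rest
    spikes = []
    for t, i_in in enumerate(input_current):
        # Leaky integration: the membrane potential decays toward rest
        # while accumulating the input current.
        v += (-(v - v_rest) + i_in) * (dt / tau)
        if v >= v_thresh:      # threshold crossing emits a spike
            spikes.append(t)
            v = v_rest         # reset after firing
    return spikes

# A weak constant input never reaches threshold; a strong one fires regularly.
print(simulate_lif([0.5] * 100))  # no spikes: v converges to 0.5 < threshold
print(simulate_lif([2.0] * 50))   # periodic spiking
```

Because the potential settles toward the input current value, inputs below the threshold produce no spikes at all, while supra-threshold inputs fire at a regular rate — the basic rate-coding behaviour the spiking layers in such a network rely on.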