Abstract: Currently, in the field of object pose estimation, much research estimates an object's position and angle by storing a 3D model of the object in a computer in advance and matching the observed object against that model. In this work, however, we have succeeded in creating a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two different networks – a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy for each camera position, i.e., the angle from which the object was captured. The highest accuracy was recorded when the camera position was 75°: classification accuracy was about 87.3%, and regression accuracy was about 98.9%.
Abstract: Currently the most prevalent deep learning methods require a large amount of training data, whereas few-shot learning tries to learn a model from limited data without extensive retraining. In this paper, we present a loss function based on triplet loss for solving the few-shot problem using metric-based learning. Instead of setting the margin distance in the triplet loss to a constant chosen empirically, we propose an adaptive margin distance strategy that obtains an appropriate margin distance automatically. We implement the strategy in a deep Siamese network for deep metric embedding, utilizing an optimization approach that penalizes the worst case and rewards the best. Our experiments on image recognition and a co-segmentation model demonstrate that our proposed triplet loss with adaptive margin distance significantly improves performance.
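The abstract does not spell out the exact adaptive-margin rule, so the following is a minimal sketch of the general idea: a plain triplet loss whose margin is widened according to the spread between the easiest (best-case) and hardest (worst-case) triplet in a batch. The `adaptive_margin` rule and the `base` parameter are illustrative assumptions, not the authors' formula.

```python
import math

def dist(x, y):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def triplet_loss(anchor, positive, negative, margin):
    """Classic triplet loss: push d(a, n) beyond d(a, p) by `margin`."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

def adaptive_margin(triplets, base=0.2):
    """Hypothetical adaptive rule: widen a base margin by the spread
    between the easiest and hardest triplet in the batch."""
    gaps = [dist(a, n) - dist(a, p) for a, p, n in triplets]
    return base + 0.5 * (max(gaps) - min(gaps))

batch = [([0.0, 0.0], [0.1, 0.0], [1.0, 1.0]),   # easy triplet
         ([1.0, 1.0], [0.5, 1.0], [1.2, 1.0])]   # hard triplet (negative closer)
m = adaptive_margin(batch)
losses = [triplet_loss(a, p, n, m) for a, p, n in batch]
```

With a hard triplet in the batch, the margin grows beyond its base value, so the easy triplet still incurs zero loss while the hard one is penalized more strongly.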
Abstract: Since vision systems are in intense demand for autonomous applications in industrial environments, image recognition has become an important research topic. Here, a deep learning algorithm is employed in an imaging system to recognize industrial objects, and the system is integrated with a 7A6 Series Manipulator for automatic object-gripping tasks. A PC and a Graphics Processing Unit (GPU) are chosen to construct the 3D vision recognition system. A depth camera (Intel RealSense SR300) is employed to extract images for object recognition and coordinate derivation. The YOLOv2 scheme is adopted as the convolutional neural network (CNN) structure for object classification and center-point prediction. Additionally, an image processing strategy is used to find the object contour for calculating the object's orientation angle. The specified object location and orientation information are then sent to the robot controller. Finally, a six-axis manipulator can grasp the specified object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 has been successfully employed to detect the object location and category with confidence near 0.9 and a 3D position error of less than 0.4 mm. This is useful for future intelligent robotic applications in Industry 4.0 environments.
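The abstract mentions computing an orientation angle from the object contour but gives no formula. A standard image-processing approach, which the authors may or may not have used, derives the angle from the second-order central moments of the contour points; the sketch below assumes that technique.

```python
import math

def orientation(points):
    """Orientation angle (radians) of a point set from its second-order
    central moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

diag = orientation([(i, i) for i in range(10)])   # points on a 45-degree line
flat = orientation([(i, 0) for i in range(10)])   # points on a horizontal line
```

The angle, together with the predicted center point, is the kind of pose information a robot controller needs for grasping.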
Abstract: This article presents results from our investigation in the field of voice recognition. For this purpose, we created a voice database that contains different phrases in two languages, English and Spanish, spoken by men and women. As the classifier, the LIRA (Limited Receptive Area) grayscale neural classifier was selected. The LIRA grayscale neural classifier was developed for image recognition tasks and has demonstrated good results, so we decided to develop a voice recognition system using this classifier. Given a specific set of speakers, the system can recognize a speaker's voice: it uses spectrograms of the voice signals as input, extracts their characteristics, and identifies the speaker. The results are described and analyzed in this article. The classifier can be used for speaker identification in security systems or smart buildings for different types of intelligent devices.
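The pipeline feeds spectrograms of the voice signal to an image classifier. As a self-contained illustration of that front end (not the authors' implementation, which is unspecified), a magnitude spectrogram can be computed with a naive DFT over overlapping frames:

```python
import cmath
import math

def spectrogram(signal, frame, hop):
    """Magnitude spectrogram via a naive DFT over frames of the signal.
    Each row is the half-spectrum of one frame (no windowing applied)."""
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        row = [abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                       for n in range(frame)))
               for k in range(frame // 2)]
        rows.append(row)
    return rows

# Toy input: a pure tone whose frequency falls exactly in DFT bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 16) for n in range(32)]
spec = spectrogram(tone, frame=16, hop=16)
peak_bin = max(range(8), key=lambda k: spec[0][k])
```

The resulting time-frequency matrix can then be treated as a grayscale image, which is what makes an image classifier such as LIRA applicable to voice.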
Abstract: In some applications, such as image recognition or compression, segmentation refers to the process of partitioning a digital image into multiple segments. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. It classifies or clusters an image into several parts (regions) according to image features, for example, pixel values or the frequency response. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics. The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image. Several image segmentation algorithms have been proposed to segment an image before recognition or compression. To date, many image segmentation algorithms exist and are extensively applied in science and daily life. According to their segmentation method, we can roughly categorize them into region-based segmentation, data clustering, and edge-based segmentation. In this paper, we give a study of several popular image segmentation algorithms that are available.
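The definition above ("assigning a label to every pixel") is easy to make concrete. A minimal sketch, using the simplest region-based method (global thresholding), which is one of the families the survey covers:

```python
def threshold_segment(image, t):
    """Simplest region-based segmentation: label each pixel foreground (1)
    or background (0) by comparing its intensity with threshold t."""
    return [[1 if px > t else 0 for px in row] for row in image]

gray = [[12, 40, 210],
        [30, 220, 240],
        [25, 35, 230]]
mask = threshold_segment(gray, 100)
```

Every pixel receives a label, and pixels sharing a label share the visual characteristic (here, brightness) that the feature encodes; real algorithms differ mainly in how the labels are chosen.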
Abstract: This article presents the development of efficient algorithms for comparing tablet copies. Image recognition has specialized uses in digital systems such as medical imaging, computer vision, defense, communication, etc. Comparing two images that look indistinguishable is a formidable task: two images taken from different sources might look identical, but due to different digitizing properties they are not, and small variations in image information such as cropping, rotation, and slight photometric alteration make direct matching techniques unsuitable. In this paper we introduce different matching algorithms designed to help art centers identify real painting images from fake ones. Different vision algorithms for local image features are implemented using MATLAB. In this framework, a Table Comparison Computer Tool (TCCT) is designed to facilitate our research. The TCCT is a Graphical User Interface (GUI) tool used to identify images by their shapes and objects. The parameters of the vision system are fully accessible to the user through this graphical interface. For matching, it then applies different description techniques that can identify exact figures of objects.
Abstract: Pattern recognition and image recognition methods are commonly developed and tested using testbeds, which contain known responses to a query set. Until now, testbeds available for image analysis and content-based image retrieval (CBIR) have been scarce and small-scale. Here we present the one-million-image CEA-List Image Collection (CLIC) testbed that we have produced, and report on our use of this testbed to evaluate image-analysis merging techniques. This testbed will soon be made publicly available through the EU MUSCLE Network of Excellence.
Abstract: This paper compares the Hilditch, Rosenfeld, Zhang-Suen, and Nagendraprasad-Wang-Gupta (NWG) thinning algorithms for Javanese character image recognition. Thinning is an effective process when the focus is not on the size of the pattern, but rather on the relative position of the strokes in the pattern. The research analyzes the thinning of 60 Javanese characters. Time-wise, the Zhang-Suen algorithm gives the best results, with an average processing time of 0.00455188 seconds. But if we look at the percentage of pixels that meet one-pixel thickness, the Rosenfeld algorithm gives the best results, with a 99.98% success rate. In terms of the number of pixels erased, the NWG algorithm gives the best results, with an average of 84.12% of pixels erased. It can be concluded that the Hilditch algorithm performs least successfully compared to the other three algorithms.
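Of the four algorithms compared, Zhang-Suen is the most widely documented; a compact reference implementation (a sketch of the standard two-subiteration formulation, not the paper's code) looks like this:

```python
def neighbours(y, x, img):
    """Eight neighbours of (y, x), clockwise from the pixel above (P2..P9)."""
    return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
            img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

def transitions(nb):
    """A(P1): number of 0 -> 1 transitions in the circular sequence P2..P9."""
    return sum(1 for a, b in zip(nb, nb[1:] + nb[:1]) if a == 0 and b == 1)

def zhang_suen(image):
    """Zhang-Suen thinning on a binary image (nested lists of 0/1);
    border pixels are assumed to be background."""
    img = [row[:] for row in image]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, len(img) - 1):
                for x in range(1, len(img[0]) - 1):
                    if img[y][x] != 1:
                        continue
                    nb = neighbours(y, x, img)
                    p2, p3, p4, p5, p6, p7, p8, p9 = nb
                    if not (2 <= sum(nb) <= 6 and transitions(nb) == 1):
                        continue
                    if step == 0:   # first subiteration: SE boundary
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:           # second subiteration: NW boundary
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((y, x))
            for y, x in to_delete:
                img[y][x] = 0
                changed = True
    return img

# A 3-pixel-thick horizontal bar thins toward a one-pixel stroke.
bar = [[0] * 7 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 6):
        bar[y][x] = 1
thinned = zhang_suen(bar)
```

The two conditions on the neighbour counts preserve connectivity and line endpoints, which is why thinning keeps the relative position of strokes while discarding their thickness.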
Abstract: A complex-valued neural network is a neural network whose inputs, weights, thresholds, and/or activation functions are complex-valued. Complex-valued neural networks have been widening the scope of applications not only in electronics and informatics, but also in social systems. One of the most important applications of complex-valued neural networks is in image and vision processing. In neural networks, radial basis functions are often used for interpolation in multidimensional space. A radial basis function is a function that has built into it a distance criterion with respect to a centre. Radial basis functions have often been applied in neural networks, where they may be used as a replacement for the sigmoid hidden-layer transfer characteristic in multi-layer perceptrons. This paper aims to present exhaustive results of using RBF units in a complex-valued neural network model that uses the back-propagation algorithm (called 'Complex-BP') for learning. Our experimental results demonstrate the effectiveness of radial basis functions in a complex-valued neural network for image recognition over a real-valued neural network. We have studied and reported various observations, such as the effect of learning rates, the ranges of the randomly selected initial weights, the error functions used, and the number of iterations needed for the error to converge in a neural network model with RBF units. Some inherent properties of this complex back-propagation algorithm are also studied and discussed.
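The "distance criterion with respect to a centre" carries over naturally to complex inputs, since the complex modulus is a distance in the plane. A minimal sketch of a Gaussian RBF unit on complex values (the paper's exact unit definition is not given in the abstract, so the Gaussian form and `sigma` are assumptions):

```python
import math

def complex_rbf(z, centre, sigma):
    """Gaussian radial basis unit for a complex input: the real-valued
    activation depends only on the distance |z - centre| in the plane."""
    return math.exp(-abs(z - centre) ** 2 / (2 * sigma ** 2))

def rbf_hidden_layer(z, centres, sigma=1.0):
    """Activations of one complex input against a set of complex centres,
    used as a stand-in for a sigmoid hidden layer."""
    return [complex_rbf(z, c, sigma) for c in centres]

acts = rbf_hidden_layer(1 + 1j, [1 + 1j, 0j, 3 - 1j])
```

The activation peaks at 1 when the input coincides with a centre and decays with distance, which is the locality property that distinguishes RBF units from sigmoids.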
Abstract: Image data holds a large amount of different context information. However, as of today, these resources remain largely untouched. The aim of this paper is therefore to present a basic technical framework which allows quick and easy exploitation of context information from image data, especially by non-expert users. Furthermore, the proposed framework is discussed in detail with respect to important social and ethical issues which impose special requirements on system design. Finally, a first sensor prototype is presented which meets the identified requirements. Additionally, the necessary implications for the software and hardware design of the system are discussed, yielding a sensor system which can be regarded as a good, acceptable, and justifiable technical solution, and thereby enabling the extraction of context information from image data.
Abstract: Traditional object segmentation methods are time-consuming and computationally difficult. In this paper, one-dimensional object detection along secant lines is applied. Statistical features of texture images are computed for the recognition process. Example matrices of these features and formulae for calculating the similarity between two feature patterns are given, and experiments are also carried out using these features.
Abstract: The need for efficient information retrieval has increased more than ever in recent years because of the frequent use of digital information in our lives. We see a lot of work in the area of textual information, but in multimedia information we cannot find much progress. For text-based information, new technologies such as data mining and data marts are now in use, having grown from the basic concept of the database sometime in the 1960s.
In image search, and especially in image identification, computerized systems are at a very early stage. Even in image search we cannot see as much progress as in the case of text-based search techniques. One main reason for this is the widespread roots of image search, where many areas such as artificial intelligence, statistics, image processing, and pattern recognition play their role. Even human psychology, perception, and cultural diversity have their share in the design of a good and efficient image recognition and retrieval system.
A new object-based search technique is presented in this paper, in which objects in the image are identified on the basis of their geometrical shapes and other features such as color and texture, and object correlation augments this search process.
To stay focused on object identification, simple images are selected for this work to reduce the role of segmentation in the overall process; however, the same technique can also be applied to other images.
Abstract: An attractor neural network on a small-world topology is studied. A learning pattern is presented to the network; then a stimulus carrying local information is applied to the neurons, and the retrieval of block-like structures is investigated. Synaptic noise decreases the memory capability. The change of stability from local to global attractors is shown to depend on the long-range character of the network connectivity.
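The abstract does not specify the model's parameters, so the following is only a minimal sketch of the setup it describes: Hebbian couplings storing one pattern on a Watts-Strogatz-style small-world graph, with retrieval driven by local fields from a corrupted stimulus. The network size, rewiring probability, and update schedule below are illustrative assumptions.

```python
import random

def small_world_edges(n, k, p, rng):
    """Watts-Strogatz-style topology: a ring lattice with k neighbours per
    side, each edge rewired to a random node with probability p."""
    edges = set()
    for i in range(n):
        for j in range(1, k + 1):
            a, b = i, (i + j) % n
            if rng.random() < p:   # rewire to a random long-range target
                b = rng.randrange(n)
                while b == a or (min(a, b), max(a, b)) in edges:
                    b = rng.randrange(n)
            edges.add((min(a, b), max(a, b)))
    return edges

def retrieve(pattern, edges, state, sweeps=20):
    """Attractor dynamics with Hebbian couplings storing one pattern:
    each unit repeatedly aligns with the local field from its neighbours."""
    n = len(pattern)
    nbrs = {i: [] for i in range(n)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    s = list(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(pattern[i] * pattern[j] * s[j] for j in nbrs[i])
            if field:
                s[i] = 1 if field > 0 else -1
    return s

rng = random.Random(0)
n = 30
pattern = [1 if i % 2 == 0 else -1 for i in range(n)]
edges = small_world_edges(n, k=3, p=0.1, rng=rng)
stimulus = pattern[:]
for i in (0, 1, 2):            # corrupt three units (local stimulus noise)
    stimulus[i] = -stimulus[i]
recovered = retrieve(pattern, edges, stimulus)
overlap = sum(p * s for p, s in zip(pattern, recovered)) / n
```

The overlap between the recovered state and the stored pattern is the usual order parameter for retrieval; varying `p` changes the long-range character of the connectivity that the paper links to the local-to-global stability transition.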