Abstract: Depth estimation using stereo images is a challenging problem in computer vision. Many different studies have been carried out to solve this problem. With advancing machine learning, tackling this problem is often done with neural network-based solutions. The images used in these studies are mostly in the visible spectrum. However, the need to use the Infrared (IR) spectrum for depth estimation has emerged because it gives better results than visible spectra in some conditions. At this point, we recommend using thermal-thermal (IR) image pairs for depth estimation. In this study, we used two well-known networks (PSMNet, FADNet) with minor modifications to demonstrate the viability of this idea.
Abstract: Estimating the 6D pose of objects is a core step for robot bin-picking tasks. The problem is that various objects are usually randomly stacked with heavy occlusion in real applications. In this work, we propose a method to regress 6D poses by predicting three points for each object in the 3D point cloud through deep learning. To solve the ambiguity of symmetric pose, we propose a labeling method to help the network converge better. Based on the predicted pose, an iterative method is employed for pose optimization. In real-world experiments, our method outperforms the classical approach in both precision and recall.
Abstract: Ancient books are significant culture inheritors and their background textures convey the potential history information. However, multi-style texture recovery of ancient books has received little attention. Restricted by insufficient ancient textures and complex handling process, the generation of ancient textures confronts with new challenges. For instance, training without sufficient data usually brings about overfitting or mode collapse, so some of the outputs are prone to be fake. Recently, image generation and style transfer based on deep learning are widely applied in computer vision. Breakthroughs within the field make it possible to conduct research upon multi-style texture recovery of ancient books. Under the circumstances, we proposed a network of layout analysis and image fusion system. Firstly, we trained models by using Deep Convolution Generative against Networks (DCGAN) to synthesize multi-style ancient textures; then, we analyzed layouts based on the Position Rearrangement (PR) algorithm that we proposed to adjust the layout structure of foreground content; at last, we realized our goal by fusing rearranged foreground texts and generated background. In experiments, diversified samples such as ancient Yi, Jurchen, Seal were selected as our training sets. Then, the performances of different fine-turning models were gradually improved by adjusting DCGAN model in parameters as well as structures. In order to evaluate the results scientifically, cross entropy loss function and Fréchet Inception Distance (FID) are selected to be our assessment criteria. Eventually, we got model M8 with lowest FID score. Compared with DCGAN model proposed by Radford at el., the FID score of M8 improved by 19.26%, enhancing the quality of the synthetic images profoundly.
Abstract: Image segmentation is a method to extract regions of interest from an image. It remains a fundamental problem in computer vision. The increasing diversity and the complexity of segmentation algorithms have led us firstly, to make a review and classify segmentation techniques, secondly to identify the most used measures of segmentation performance and thirdly, discuss deeply on segmentation philosophy in order to help the choice of adequate segmentation techniques for some applications. To justify the relevance of our analysis, recent algorithms of segmentation are presented through the proposed classification.
Abstract: Studies estimate that there will be 266,120 new cases
of invasive breast cancer and 40,920 breast cancer induced deaths
in the year of 2018 alone. Despite the pervasiveness of this
affliction, the current process to obtain an accurate breast cancer
prognosis is tedious and time consuming. It usually requires a
trained pathologist to manually examine histopathological images and
identify the features that characterize various cancer severity levels.
We propose MITOS-RCNN: a region based convolutional neural
network (RCNN) geared for small object detection to accurately
grade one of the three factors that characterize tumor belligerence
described by the Nottingham Grading System: mitotic count. Other
computational approaches to mitotic figure counting and detection
do not demonstrate ample recall or precision to be clinically viable.
Our models outperformed all previous participants in the ICPR 2012
challenge, the AMIDA 2013 challenge and the MITOS-ATYPIA-14
challenge along with recently published works. Our model achieved
an F- measure score of 0.955, a 6.11% improvement in accuracy from
the most accurate of the previously proposed models.
Abstract: Tracking of moving people has gained a matter of great importance due to rapid technological advancements in the field of computer vision. The objective of this study is to design a motion based detection and tracking multiple walking pedestrians randomly in different directions. In our proposed method, Gaussian mixture model (GMM) is used to determine moving persons in image sequences. It reacts to changes that take place in the scene like different illumination; moving objects start and stop often, etc. Background noise in the scene is eliminated through applying morphological operations and the motions of tracked people which is determined by using the Kalman filter. The Kalman filter is applied to predict the tracked location in each frame and to determine the likelihood of each detection. We used a benchmark data set for the evaluation based on a side wall stationary camera. The actual scenes from the data set are taken on a street including up to eight people in front of the camera in different two scenes, the duration is 53 and 35 seconds, respectively. In the case of walking pedestrians in close proximity, the proposed method has achieved the detection ratio of 87%, and the tracking ratio is 77 % successfully. When they are deferred from each other, the detection ratio is increased to 90% and the tracking ratio is also increased to 79%.
Abstract: Matching high dimensional features between images is computationally expensive for exhaustive search approaches in computer vision. Although the dimension of the feature can be degraded by simplifying the prior knowledge of homography, matching accuracy may degrade as a tradeoff. In this paper, we present a feature matching method based on k-means algorithm that reduces the matching cost and matches the features between images instead of using a simplified geometric assumption. Experimental results show that the proposed method outperforms the previous linear exhaustive search approaches in terms of the inlier ratio of matched pairs.
Abstract: Human motion capture has become one of the major
area of interest in the field of computer vision. Some of the major
application areas that have been rapidly evolving include the
advanced human interfaces, virtual reality and security/surveillance
systems. This study provides a brief overview of the techniques and
applications used for the markerless human motion capture, which
deals with analyzing the human motion in the form of mathematical
formulations. The major contribution of this research is that it
classifies the computer vision based techniques of human motion
capture based on the taxonomy, and then breaks its down into four
systematically different categories of tracking, initialization, pose
estimation and recognition. The detailed descriptions and the
relationships descriptions are given for the techniques of tracking and
pose estimation. The subcategories of each process are further
described. Various hypotheses have been used by the researchers in
this domain are surveyed and the evolution of these techniques have
been explained. It has been concluded in the survey that most
researchers have focused on using the mathematical body models for
the markerless motion capture.
Abstract: Motion Tracking and Stereo Vision are complicated,
albeit well-understood problems in computer vision. Existing
softwares that combine the two approaches to perform stereo motion
tracking typically employ complicated and computationally expensive
procedures. The purpose of this study is to create a simple and
effective solution capable of combining the two approaches. The
study aims to explore a strategy to combine the two techniques
of two-dimensional motion tracking using Kalman Filter; and depth
detection of object using Stereo Vision. In conventional approaches
objects in the scene of interest are observed using a single camera.
However for Stereo Motion Tracking; the scene of interest is
observed using video feeds from two calibrated cameras. Using two
simultaneous measurements from the two cameras a calculation for
the depth of the object from the plane containing the cameras is made.
The approach attempts to capture the entire three-dimensional spatial
information of each object at the scene and represent it through a
software estimator object. In discrete intervals, the estimator tracks
object motion in the plane parallel to plane containing cameras and
updates the perpendicular distance value of the object from the plane
containing the cameras as depth. The ability to efficiently track
the motion of objects in three-dimensional space using a simplified
approach could prove to be an indispensable tool in a variety of
surveillance scenarios. The approach may find application from high
security surveillance scenes such as premises of bank vaults, prisons
or other detention facilities; to low cost applications in supermarkets
and car parking lots.
Abstract: Real time image and video processing is a demand in
many computer vision applications, e.g. video surveillance, traffic
management and medical imaging. The processing of those video
applications requires high computational power. Thus, the optimal
solution is the collaboration of CPU and hardware accelerators. In
this paper, a Canny edge detection hardware accelerator is proposed.
Edge detection is one of the basic building blocks of video and image
processing applications. It is a common block in the pre-processing
phase of image and video processing pipeline. Our presented
approach targets offloading the Canny edge detection algorithm from
processing system (PS) to programmable logic (PL) taking the
advantage of High Level Synthesis (HLS) tool flow to accelerate the
implementation on Zynq platform. The resulting implementation
enables up to a 100x performance improvement through hardware
acceleration. The CPU utilization drops down and the frame rate
jumps to 60 fps of 1080p full HD input video stream.
Abstract: Enhancing the quality of two dimensional signals is one of the most important factors in the fields of video surveillance and computer vision. Usually in real-life video surveillance, false detection occurs due to the presence of random noise, illumination
and shadow artifacts. The detection methods based on background subtraction faces several problems in accurately detecting objects in realistic environments: In this paper, we propose a noise removal algorithm using neighborhood comparison method with thresholding. The illumination variations correction is done in the detected foreground objects by using an amalgamation of techniques like homomorphic decomposition, curvelet transformation and gamma adjustment operator. Shadow is removed using chromaticity estimator with local relation estimator. Results are compared with the existing methods and prove as high robustness in the video surveillance.
Abstract: Recent developments in automotive technology are focused on economy, comfort and safety. Vehicle tracking and collision detection systems are attracting attention of many investigators focused on safety of driving in the field of automotive mechatronics. In this paper, a vision-based vehicle detection system is presented. Developed system is intended to be used in collision detection and driver alert. The system uses RGB images captured by a camera in a car driven in the highway. Images captured by the moving camera are used to detect the moving vehicles in the image. A vehicle ahead of the camera is detected in daylight conditions. The proposed method detects moving vehicles by subtracting successive images. Plate height of the vehicle is determined by using a plate recognition algorithm. Distance of the moving object is calculated by using the plate height. After determination of the distance of the moving vehicle relative speed of the vehicle and Time-to-Collision are calculated by using distances measured in successive images. Results obtained in road tests are discussed in order to validate the use of the proposed method.
Abstract: This paper describes a probabilistic method for
three-dimensional object recognition using a shared pool of surface
signatures. This technique uses flatness, orientation, and convexity
signatures that encode the surface of a free-form object into three
discriminative vectors, and then creates a shared pool of data by
clustering the signatures using a distance function. This method
applies the Bayes-s rule for recognition process, and it is extensible
to a large collection of three-dimensional objects.
Abstract: For the communication between human and computer
in an interactive computing environment, the gesture recognition is
studied vigorously. Therefore, a lot of studies have proposed efficient
methods about the recognition algorithm using 2D camera captured
images. However, there is a limitation to these methods, such as the
extracted features cannot fully represent the object in real world.
Although many studies used 3D features instead of 2D features for
more accurate gesture recognition, the problem, such as the processing
time to generate 3D objects, is still unsolved in related researches.
Therefore we propose a method to extract the 3D features combined
with the 3D object reconstruction. This method uses the modified
GPU-based visual hull generation algorithm which disables unnecessary
processes, such as the texture calculation to generate three kinds
of 3D projection maps as the 3D feature: a nearest boundary, a farthest
boundary, and a thickness of the object projected on the base-plane. In
the section of experimental results, we present results of proposed
method on eight human postures: T shape, both hands up, right hand
up, left hand up, hands front, stand, sit and bend, and compare the
computational time of the proposed method with that of the previous
methods.
Abstract: Motion estimation is a key problem in video
processing and computer vision. Optical flow motion estimation can
achieve high estimation accuracy when motion vector is small.
Three-step search algorithm can handle large motion vector but not
very accurate. A joint algorithm was proposed in this paper to
achieve high estimation accuracy disregarding whether the motion
vector is small or large, and keep the computation cost much lower
than full search.