The Role of Synthetic Data in Aerial Object Detection

The purpose of this study is to explore the characteristics of developing a machine learning application using synthetic data. The study is structured to develop the application for the purpose of deploying the computer vision model. The findings discuss the realities of attempting to develop a computer vision model for practical purpose, and detail the processes, tools and techniques that were used to meet accuracy requirements. The research reveals that synthetic data represent another variable that can be adjusted to improve the performance of a computer vision model. Further, a suite of tools and tuning recommendations are provided.

A Comparison of YOLO Family for Apple Detection and Counting in Orchards

In agricultural production and breeding, implementing automatic picking robot in orchard farming to reduce human labour and error is challenging. The core function of it is automatic identification based on machine vision. This paper focuses on apple detection and counting in orchards and implements several deep learning methods. Extensive datasets are used and a semi-automatic annotation method is proposed. The proposed deep learning models are in state-of-the-art YOLO family. In view of the essence of the models with various backbones, a multi-dimensional comparison in details is made in terms of counting accuracy, mAP and model memory, laying the foundation for realising automatic precision agriculture.

Facial Emotion Recognition with Convolutional Neural Network Based Architecture

Neural networks are appealing for many applications since they are able to learn complex non-linear relationships between input and output data. As the number of neurons and layers in a neural network increase, it is possible to represent more complex relationships with automatically extracted features. Nowadays Deep Neural Networks (DNNs) are widely used in Computer Vision problems such as; classification, object detection, segmentation image editing etc. In this work, Facial Emotion Recognition task is performed by proposed Convolutional Neural Network (CNN)-based DNN architecture using FER2013 Dataset. Moreover, the effects of different hyperparameters (activation function, kernel size, initializer, batch size and network size) are investigated and ablation study results for Pooling Layer, Dropout and Batch Normalization are presented.

The Layout Analysis of Handwriting Characters and the Fusion of Multi-style Ancient Books’ Background

Ancient books are significant culture inheritors and their background textures convey the potential history information. However, multi-style texture recovery of ancient books has received little attention. Restricted by insufficient ancient textures and complex handling process, the generation of ancient textures confronts with new challenges. For instance, training without sufficient data usually brings about overfitting or mode collapse, so some of the outputs are prone to be fake. Recently, image generation and style transfer based on deep learning are widely applied in computer vision. Breakthroughs within the field make it possible to conduct research upon multi-style texture recovery of ancient books. Under the circumstances, we proposed a network of layout analysis and image fusion system. Firstly, we trained models by using Deep Convolution Generative against Networks (DCGAN) to synthesize multi-style ancient textures; then, we analyzed layouts based on the Position Rearrangement (PR) algorithm that we proposed to adjust the layout structure of foreground content; at last, we realized our goal by fusing rearranged foreground texts and generated background. In experiments, diversified samples such as ancient Yi, Jurchen, Seal were selected as our training sets. Then, the performances of different fine-turning models were gradually improved by adjusting DCGAN model in parameters as well as structures. In order to evaluate the results scientifically, cross entropy loss function and Fréchet Inception Distance (FID) are selected to be our assessment criteria. Eventually, we got model M8 with lowest FID score. Compared with DCGAN model proposed by Radford at el., the FID score of M8 improved by 19.26%, enhancing the quality of the synthetic images profoundly.

Make Up Flash: Web Application for the Improvement of Physical Appearance in Images Based on Recognition Methods

This paper presents a web application for the improvement of images through recognition. The web application is based on the analysis of picture-based recognition methods that allow an improvement on the physical appearance of people posting in social networks. The basis relies on the study of tools that can correct or improve some features of the face, with the help of a wide collection of user images taken as reference to build a facial profile. Automatic facial profiling can be achieved with a deeper study of the Object Detection Library. It was possible to improve the initial images with the help of MATLAB and its filtering functions. The user can have a direct interaction with the program and manually adjust his preferences.

Vision-Based Collision Avoidance for Unmanned Aerial Vehicles by Recurrent Neural Networks

Due to the sensor technology, video surveillance has become the main way for security control in every big city in the world. Surveillance is usually used by governments for intelligence gathering, the prevention of crime, the protection of a process, person, group or object, or the investigation of crime. Many surveillance systems based on computer vision technology have been developed in recent years. Moving target tracking is the most common task for Unmanned Aerial Vehicle (UAV) to find and track objects of interest in mobile aerial surveillance for civilian applications. The paper is focused on vision-based collision avoidance for UAVs by recurrent neural networks. First, images from cameras on UAV were fused based on deep convolutional neural network. Then, a recurrent neural network was constructed to obtain high-level image features for object tracking and extracting low-level image features for noise reducing. The system distributed the calculation of the whole system to local and cloud platform to efficiently perform object detection, tracking and collision avoidance based on multiple UAVs. The experiments on several challenging datasets showed that the proposed algorithm outperforms the state-of-the-art methods.

MITOS-RCNN: Mitotic Figure Detection in Breast Cancer Histopathology Images Using Region Based Convolutional Neural Networks

Studies estimate that there will be 266,120 new cases of invasive breast cancer and 40,920 breast cancer induced deaths in the year of 2018 alone. Despite the pervasiveness of this affliction, the current process to obtain an accurate breast cancer prognosis is tedious and time consuming. It usually requires a trained pathologist to manually examine histopathological images and identify the features that characterize various cancer severity levels. We propose MITOS-RCNN: a region based convolutional neural network (RCNN) geared for small object detection to accurately grade one of the three factors that characterize tumor belligerence described by the Nottingham Grading System: mitotic count. Other computational approaches to mitotic figure counting and detection do not demonstrate ample recall or precision to be clinically viable. Our models outperformed all previous participants in the ICPR 2012 challenge, the AMIDA 2013 challenge and the MITOS-ATYPIA-14 challenge along with recently published works. Our model achieved an F- measure score of 0.955, a 6.11% improvement in accuracy from the most accurate of the previously proposed models.

Object Detection in Digital Images under Non-Standardized Conditions Using Illumination and Shadow Filtering

In recent years, object detection has gained much attention and very encouraging research area in the field of computer vision. The robust object boundaries detection in an image is demanded in numerous applications of human computer interaction and automated surveillance systems. Many methods and approaches have been developed for automatic object detection in various fields, such as automotive, quality control management and environmental services. Inappropriately, to the best of our knowledge, object detection under illumination with shadow consideration has not been well solved yet. Furthermore, this problem is also one of the major hurdles to keeping an object detection method from the practical applications. This paper presents an approach to automatic object detection in images under non-standardized environmental conditions. A key challenge is how to detect the object, particularly under uneven illumination conditions. Image capturing conditions the algorithms need to consider a variety of possible environmental factors as the colour information, lightening and shadows varies from image to image. Existing methods mostly failed to produce the appropriate result due to variation in colour information, lightening effects, threshold specifications, histogram dependencies and colour ranges. To overcome these limitations we propose an object detection algorithm, with pre-processing methods, to reduce the interference caused by shadow and illumination effects without fixed parameters. We use the Y CrCb colour model without any specific colour ranges and predefined threshold values. The segmented object regions are further classified using morphological operations (Erosion and Dilation) and contours. Proposed approach applied on a large image data set acquired under various environmental conditions for wood stack detection. Experiments show the promising result of the proposed approach in comparison with existing methods.

Automated Video Surveillance System for Detection of Suspicious Activities during Academic Offline Examination

This research work aims to develop a system that will analyze and identify students who indulge in malpractices/suspicious activities during the course of an academic offline examination. Automated Video Surveillance provides an optimal solution which helps in monitoring the students and identifying the malpractice event immediately. This work is organized into three modules. The first module deals with performing an impersonation check using a PCA-based face recognition method which is done by cross checking his profile with the database. The presence or absence of the student is even determined in this module by implementing an image registration technique wherein a grid is formed by considering all the images registered using the frontal camera at the determined positions. Second, detecting such facial malpractices in which a student gets involved in conversation with another, trying to obtain unauthorized information etc., based on the threshold range evaluated by considering his/her mouth state whether open or closed. The third module deals with identification of unauthorized material or gadgets used in the examination hall by training the positive samples of the object through various stages. Here, a top view camera feed is analyzed to detect the suspicious activities. The system automatically alerts the administration when any suspicious activities are identified, thereby reducing the error rate caused due to manual monitoring. This work is an improvement over our previous work published in identifying suspicious activities done by examinees in an offline examination.

An Efficient Fundamental Matrix Estimation for Moving Object Detection

In this paper, an improved method for estimating fundamental matrix is proposed. The method is applied effectively to monocular camera based moving object detection. The method consists of corner points detection, moving object’s motion estimation and fundamental matrix calculation. The corner points are obtained by using Harris corner detector, motions of moving objects is calculated from pyramidal Lucas-Kanade optical flow algorithm. Through epipolar geometry analysis using RANSAC, the fundamental matrix is calculated. In this method, we have improved the performances of moving object detection by using two threshold values that determine inlier or outlier. Through the simulations, we compare the performances with varying the two threshold values.

Accurate Position Electromagnetic Sensor Using Data Acquisition System

This paper presents a high position electromagnetic sensor system (HPESS) that is applicable for moving object detection. The authors have developed a high-performance position sensor prototype dedicated to students’ laboratory. The challenge was to obtain a highly accurate and real-time sensor that is able to calculate position, length or displacement. An electromagnetic solution based on a two coil induction principal was adopted. The HPESS converts mechanical motion to electric energy with direct contact. The output signal can then be fed to an electronic circuit. The voltage output change from the sensor is captured by data acquisition system using LabVIEW software. The displacement of the moving object is determined. The measured data are transmitted to a PC in real-time via a DAQ (NI USB -6281). This paper also describes the data acquisition analysis and the conditioning card developed specially for sensor signal monitoring. The data is then recorded and viewed using a user interface written using National Instrument LabVIEW software. On-line displays of time and voltage of the sensor signal provide a user-friendly data acquisition interface. The sensor provides an uncomplicated, accurate, reliable, inexpensive transducer for highly sophisticated control systems.

Moving Object Detection Using Histogram of Uniformly Oriented Gradient

Moving object detection (MOD) is an important issue in advanced driver assistance systems (ADAS). There are two important moving objects, pedestrians and scooters in ADAS. In real-world systems, there exist two important challenges for MOD, including the computational complexity and the detection accuracy. The histogram of oriented gradient (HOG) features can easily detect the edge of object without invariance to changes in illumination and shadowing. However, to reduce the execution time for real-time systems, the image size should be down sampled which would lead the outlier influence to increase. For this reason, we propose the histogram of uniformly-oriented gradient (HUG) features to get better accurate description of the contour of human body. In the testing phase, the support vector machine (SVM) with linear kernel function is involved. Experimental results show the correctness and effectiveness of the proposed method. With SVM classifiers, the real testing results show the proposed HUG features achieve better than classification performance than the HOG ones.

Challenges in Video Based Object Detection in Maritime Scenario Using Computer Vision

This paper discusses the technical challenges in maritime image processing and machine vision problems for video streams generated by cameras. Even well documented problems of horizon detection and registration of frames in a video are very challenging in maritime scenarios. More advanced problems of background subtraction and object detection in video streams are very challenging. Challenges arising from the dynamic nature of the background, unavailability of static cues, presence of small objects at distant backgrounds, illumination effects, all contribute to the challenges as discussed here.

Reduction of False Positives in Head-Shoulder Detection Based on Multi-Part Color Segmentation

The paper presents a method that utilizes figure-ground color segmentation to extract effective global feature in terms of false positive reduction in the head-shoulder detection. Conventional detectors that rely on local features such as HOG due to real-time operation suffer from false positives. Color cue in an input image provides salient information on a global characteristic which is necessary to alleviate the false positives of the local feature based detectors. An effective approach that uses figure-ground color segmentation has been presented in an effort to reduce the false positives in object detection. In this paper, an extended version of the approach is presented that adopts separate multipart foregrounds instead of a single prior foreground and performs the figure-ground color segmentation with each of the foregrounds. The multipart foregrounds include the parts of the head-shoulder shape and additional auxiliary foregrounds being optimized by a search algorithm. A classifier is constructed with the feature that consists of a set of the multiple resulting segmentations. Experimental results show that the presented method can discriminate more false positive than the single prior shape-based classifier as well as detectors with the local features. The improvement is possible because the presented approach can reduce the false positives that have the same colors in the head and shoulder foregrounds.

Design and Implementation of a Counting and Differentiation System for Vehicles through Video Processing

This paper presents a self-sustaining mobile system for counting and classification of vehicles through processing video. It proposes a counting and classification algorithm divided in four steps that can be executed multiple times in parallel in a SBC (Single Board Computer), like the Raspberry Pi 2, in such a way that it can be implemented in real time. The first step of the proposed algorithm limits the zone of the image that it will be processed. The second step performs the detection of the mobile objects using a BGS (Background Subtraction) algorithm based on the GMM (Gaussian Mixture Model), as well as a shadow removal algorithm using physical-based features, followed by morphological operations. In the first step the vehicle detection will be performed by using edge detection algorithms and the vehicle following through Kalman filters. The last step of the proposed algorithm registers the vehicle passing and performs their classification according to their areas. An auto-sustainable system is proposed, powered by batteries and photovoltaic solar panels, and the data transmission is done through GPRS (General Packet Radio Service)eliminating the need of using external cable, which will facilitate it deployment and translation to any location where it could operate. The self-sustaining trailer will allow the counting and classification of vehicles in specific zones with difficult access.

Object Detection Based on Plane Segmentation and Features Matching for a Service Robot

With the aging of the world population and the continuous growth in technology, service robots are more and more explored nowadays as alternatives to healthcare givers or personal assistants for the elderly or disabled people. Any service robot should be capable of interacting with the human companion, receive commands, navigate through the environment, either known or unknown, and recognize objects. This paper proposes an approach for object recognition based on the use of depth information and color images for a service robot. We present a study on two of the most used methods for object detection, where 3D data is used to detect the position of objects to classify that are found on horizontal surfaces. Since most of the objects of interest accessible for service robots are on these surfaces, the proposed 3D segmentation reduces the processing time and simplifies the scene for object recognition. The first approach for object recognition is based on color histograms, while the second is based on the use of the SIFT and SURF feature descriptors. We present comparative experimental results obtained with a real service robot.

Dynamic Background Updating for Lightweight Moving Object Detection

Background subtraction and temporal difference are often used for moving object detection in video. Both approaches are computationally simple and easy to be deployed in real-time image processing. However, while the background subtraction is highly sensitive to dynamic background and illumination changes, the temporal difference approach is poor at extracting relevant pixels of the moving object and at detecting the stopped or slowly moving objects in the scene. In this paper, we propose a simple moving object detection scheme based on adaptive background subtraction and temporal difference exploiting dynamic background updates. The proposed technique consists of histogram equalization, a linear combination of background and temporal difference, followed by the novel frame-based and pixel-based background updating techniques. Finally, morphological operations are applied to the output images. Experimental results show that the proposed algorithm can solve the drawbacks of both background subtraction and temporal difference methods and can provide better performance than that of each method.

A Background Subtraction Based Moving Object Detection around the Host Vehicle

In this paper, we propose moving object detection method which is helpful for driver to safely take his/her car out of parking lot. When moving objects such as motorbikes, pedestrians, the other cars and some obstacles are detected at the rear-side of host vehicle, the proposed algorithm can provide to driver warning. We assume that the host vehicle is just before departure. Gaussian Mixture Model (GMM) based background subtraction is basically applied. Pre-processing such as smoothing and post-processing as morphological filtering are added. We examine “which color space has better performance for detection of moving objects?” Three color spaces including RGB, YCbCr, and Y are applied and compared, in terms of detection rate. Through simulation, we prove that RGB space is more suitable for moving object detection based on background subtraction.

Automatic Motion Trajectory Analysis for Dual Human Interaction Using Video Sequences

Advance in techniques of image and video processing has enabled the development of intelligent video surveillance systems. This study was aimed to automatically detect moving human objects and to analyze events of dual human interaction in a surveillance scene. Our system was developed in four major steps: image preprocessing, human object detection, human object tracking, and motion trajectory analysis. The adaptive background subtraction and image processing techniques were used to detect and track moving human objects. To solve the occlusion problem during the interaction, the Kalman filter was used to retain a complete trajectory for each human object. Finally, the motion trajectory analysis was developed to distinguish between the interaction and non-interaction events based on derivatives of trajectories related to the speed of the moving objects. Using a database of 60 video sequences, our system could achieve the classification accuracy of 80% in interaction events and 95% in non-interaction events, respectively. In summary, we have explored the idea to investigate a system for the automatic classification of events for interaction and non-interaction events using surveillance cameras. Ultimately, this system could be incorporated in an intelligent surveillance system for the detection and/or classification of abnormal or criminal events (e.g., theft, snatch, fighting, etc.). 

Toward Indoor and Outdoor Surveillance Using an Improved Fast Background Subtraction Algorithm

The detection of moving objects from a video image sequences is very important for object tracking, activity recognition, and behavior understanding in video surveillance. The most used approach for moving objects detection / tracking is background subtraction algorithms. Many approaches have been suggested for background subtraction. But, these are illumination change sensitive and the solutions proposed to bypass this problem are time consuming. In this paper, we propose a robust yet computationally efficient background subtraction approach and, mainly, focus on the ability to detect moving objects on dynamic scenes, for possible applications in complex and restricted access areas monitoring, where moving and motionless persons must be reliably detected. It consists of three main phases, establishing illumination changes invariance, background/foreground modeling and morphological analysis for noise removing. We handle illumination changes using Contrast Limited Histogram Equalization (CLAHE), which limits the intensity of each pixel to user determined maximum. Thus, it mitigates the degradation due to scene illumination changes and improves the visibility of the video signal. Initially, the background and foreground images are extracted from the video sequence. Then, the background and foreground images are separately enhanced by applying CLAHE. In order to form multi-modal backgrounds we model each channel of a pixel as a mixture of K Gaussians (K=5) using Gaussian Mixture Model (GMM). Finally, we post process the resulting binary foreground mask using morphological erosion and dilation transformations to remove possible noise. For experimental test, we used a standard dataset to challenge the efficiency and accuracy of the proposed method on a diverse set of dynamic scenes.