Multi-Sensor Image Fusion for Visible and Infrared Thermal Images

This paper is motivated by the importance of multi-sensor image fusion, with a specific focus on infrared (IR) and visible image (VI) fusion for applications including military reconnaissance. Image fusion can be defined as the process of combining two or more source images into a single composite image with extended information content that improves visual perception or feature extraction. These images can come from different modalities, such as a visible camera and an IR thermal imager. While visible images are captured from reflected radiation in the visible spectrum, thermal images are formed from thermal (IR) radiation that may be reflected or self-emitted. A digital color camera captures the visible source image and a thermal IR camera acquires the thermal source image. In this paper, image fusion algorithms based on the Multi-Scale Transform (MST) and a region-based selection rule with consistency verification are proposed and presented. The research includes an implementation of the proposed image fusion algorithm in MATLAB, along with a comparative analysis to decide the optimum number of MST levels and the coefficient fusion rule. The results are presented, and several commonly used evaluation metrics are used to assess the validity of the suggested method. Experiments show that the proposed approach is capable of producing good fusion results. While deploying our image fusion approaches, we observed several challenges with popular image fusion methods. Although the high computational cost and complex processing steps of such algorithms yield accurate fused results, they also make the algorithms hard to deploy in systems and applications that require real-time operation, high flexibility, and low computational capability. The methods presented in this paper therefore offer good results with minimal time complexity.
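
The general idea of transform-domain fusion with a region-based selection rule and consistency verification can be sketched as follows. This is a simplified illustration only, not the authors' algorithm: it uses a single-level base/detail split with a box filter in place of a full multi-scale transform, it is written in Python rather than MATLAB, and the function names, the window size k, and the averaging rule for the base band are all assumptions.

```python
import numpy as np

def box_blur(img, k=3):
    """Mean filter with edge padding (k is assumed to be odd)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def fuse(visible, thermal, k=3):
    """Fuse two registered grayscale images of equal size."""
    visible = np.asarray(visible, dtype=float)
    thermal = np.asarray(thermal, dtype=float)
    # Split each source into a smooth base band and a detail band.
    base_v, base_t = box_blur(visible, k), box_blur(thermal, k)
    detail_v, detail_t = visible - base_v, thermal - base_t
    # Region-based activity: local mean of the absolute detail coefficients.
    act_v = box_blur(np.abs(detail_v), k)
    act_t = box_blur(np.abs(detail_t), k)
    choose_visible = (act_v >= act_t).astype(float)
    # Consistency verification: majority vote over the neighbourhood,
    # so an isolated decision is overruled by its neighbours.
    choose_visible = box_blur(choose_visible, k) > 0.5
    detail = np.where(choose_visible, detail_v, detail_t)
    base = 0.5 * (base_v + base_t)  # assumed rule: average the base bands
    return base + detail

# Hypothetical 8x8 test image; fusing an image with itself returns it unchanged.
sample = np.arange(64, dtype=float).reshape(8, 8)
fused_same = fuse(sample, sample)
```

A multi-level variant would repeat the base/detail split on the base band; the number of levels is exactly the parameter the abstract says was tuned.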

Preserved Relative Differences between Regions of Different Thermal Scans

Rheumatoid Arthritis patients have swelling and pain in the joints of the hand. The regions where the patient feels pain also show increased body temperature. Thermal cameras can be used to detect the rise in temperature of the affected regions. To monitor the progression of Rheumatoid Arthritis, patients must visit the clinic regularly for scanning and examination. After scanning and evaluation, the dosage of the medicine is adjusted accordingly. To monitor disease progression over time, the images from different visits must be correlated. It has been observed that thermal measurements do not remain the same over time, even within a single scanning session, when low-cost thermal cameras are used. In some situations, temperatures can vary by as much as 2 °C within the same scanning sequence. This paper shows that although the absolute temperature varies over time, the relative difference between different regions remains similar. Results computed over four scanning sequences are presented.
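
The distinction between absolute and relative temperatures can be illustrated with a short sketch. The region temperatures, the 2 °C offset applied to the second scan, and the helper relative_differences are hypothetical and only show why a uniform camera drift cancels out in pairwise differences; they are not data or code from the paper.

```python
import numpy as np

def relative_differences(region_means):
    """Pairwise differences between mean region temperatures (in °C)."""
    t = np.asarray(region_means, dtype=float)
    return t[:, None] - t[None, :]

# Hypothetical mean temperatures of three hand regions in two scans.
scan_a = np.array([33.0, 34.5, 32.0])
scan_b = scan_a + 2.0  # same hand, camera reading drifted by 2 °C

diff_a = relative_differences(scan_a)
diff_b = relative_differences(scan_b)
offset_removed = np.allclose(diff_a, diff_b)  # drift cancels in the differences
```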

Possibilities for Testing User Experience and User Interface Design on Mobile Devices

In an era when everything is increasingly digital, consumers are always looking for new solutions to their everyday needs. In this context, mobile apps are developing at an exponential pace. One of the fastest growing segments of mobile technologies is e-commerce. Mobile commerce is predicted to record nearly three times the global growth of e-commerce across all platforms, which indicates its importance in this segment. The current coronavirus pandemic is also changing many existing paradigms socially, economically, and technologically, which has a major impact on consumer behavior and on the emphasis placed on the simplicity and clarity of mobile solutions. This is the area that User Experience (UX) and User Interface (UI) designers deal with. Their task is to design a sufficiently attractive and interesting solution that is available on all mobile devices and, at the same time, makes it easy for the customer or visitor to reach the destination or obtain the necessary information in a few clicks. The basis for changes in UX design can now be obtained not only through online analytical tools, but also through neuromarketing, especially in the case of mobile devices. The paper highlights possibilities for testing the UX design of applications on mobile devices using a special platform that combines a stationary eye camera (eye tracking) and facial analysis (facial coding).

Automated Driving Deep Neural Network Model Accuracy and Performance Assessment in a Simulated Environment

The evolution and integration of automated vehicles have become increasingly tangible in recent years. State-of-the-art advances in camera-based Artificial Intelligence (AI) and computer vision greatly favor the performance and reliability of Advanced Driver Assistance Systems (ADAS), leading to a better understanding of vehicular operation and to behaviour that resembles that of a human driver. However, the exclusive use of this technology still seems insufficient to fully control vehicular operation. To reveal the degree of accuracy of current camera-based automated driving AI modules, this paper studies the structure and behavior of one of the main solutions in a controlled testing environment. The results clearly show a lack of reliability when the AI model is used exclusively in the perception stage, which calls for additional complementary sensors to improve safety and performance.

Depth Camera Aided Dead-Reckoning Localization of Autonomous Mobile Robots in Unstructured Global Navigation Satellite System Denied Environments

In global navigation satellite system (GNSS) denied settings, such as indoor environments, autonomous mobile robots are often limited to dead-reckoning navigation techniques to determine their position, velocity, and attitude (PVA). Localization is typically accomplished by employing an inertial measurement unit (IMU), which, while precise in nature, accumulates errors rapidly and severely degrades the localization solution. Standard sensor fusion methods, such as Kalman filtering, aim to fuse precise IMU measurements with accurate aiding sensors to establish a precise and accurate solution. In indoor environments, where GNSS is unavailable and no a priori information about the environment is known, effective sensor fusion is difficult to achieve, as accurate aiding sensor choices are sparse. However, an opportunity arises by employing a depth camera in the indoor environment. A depth camera can capture point clouds of the surrounding floors and walls. Extracting attitude from these surfaces can serve as an accurate aiding source, which directly combats errors that arise due to gyroscope imperfections. This configuration for sensor fusion leads to a dramatic reduction of PVA error compared to traditional aiding sensor configurations. This paper provides the theoretical basis for the depth camera aiding sensor method, initial expectations of performance benefit via simulation, and a hardware implementation that verifies the method. Hardware implementation is performed on the Quanser Qbot 2™ mobile robot, with a Vector-Nav VN-200™ IMU and Kinect™ camera from Microsoft.
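
How an attitude aiding source counteracts gyroscope error can be sketched with a deliberately reduced example: a two-state Kalman filter that estimates heading and gyroscope bias, corrected by a heading measurement standing in for the attitude extracted from depth-camera surfaces. This is not the paper's full PVA filter. The noise levels, the bias value, the constant turn rate, and the function name run are assumptions chosen for illustration, and heading is left unwrapped for simplicity.

```python
import numpy as np

def run(steps=600, dt=0.1, gyro_bias=0.05, meas_std=0.02, seed=0):
    """Compare pure gyro integration with a heading-aided Kalman filter."""
    rng = np.random.default_rng(seed)
    true_heading, true_rate = 0.0, 0.1      # rad, rad/s (assumed motion)
    dead_reckoned = 0.0                      # gyro-only heading
    x = np.zeros(2)                          # state: [heading, gyro bias]
    P = np.diag([0.1, 0.1])                  # initial covariance
    Q = np.diag([1e-6, 1e-8])                # process noise
    R = np.array([[meas_std ** 2]])          # aiding measurement noise
    H = np.array([[1.0, 0.0]])               # the aiding source observes heading
    F = np.array([[1.0, -dt], [0.0, 1.0]])   # error dynamics of the model
    for _ in range(steps):
        true_heading += true_rate * dt
        gyro = true_rate + gyro_bias + rng.normal(0.0, 0.001)
        dead_reckoned += gyro * dt
        # Predict: integrate the bias-corrected gyro reading.
        x = np.array([x[0] + (gyro - x[1]) * dt, x[1]])
        P = F @ P @ F.T + Q
        # Update: heading from the (simulated) depth-camera attitude.
        z = true_heading + rng.normal(0.0, meas_std)
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
    return abs(dead_reckoned - true_heading), abs(x[0] - true_heading)

dr_err, kf_err = run()
```

With these assumed values the gyro-only heading drifts by roughly bias times elapsed time, while the aided estimate stays close to the true heading because the bias becomes observable through the heading measurements.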

Development of a Basic Robot System for Medical and Nursing Care for Patients with Glaucoma

Medical methods to completely cure glaucoma are yet to be developed. Therefore, ophthalmologists manage patients mainly to delay disease progression. Patients with glaucoma are mainly elderly individuals. In elderly people's homes, equipment that can provide medical treatment and care can relieve family members of the burden of care. To help elderly people with glaucoma live independently as much as possible, we developed a support robot with five functions: care for elderly people, ophthalmological examination, trip assistance in the neighborhood, medical treatment, and data referral to a hospital. The medical and nursing care robot should approach within the visual field that the patient can see, at a speed suitable for their eyesight, because the robot is dangerous if it approaches from a part of the visual field that the patient cannot see. We experimentally developed a robot that brings a white cane to elderly people with glaucoma. The base of the robot is a carriage, a Megarover 1.1, with two infrared sensors. The robot follows a white line on the floor using the infrared sensors and has a special arm that does not use electricity. The arm can scoop up a block attached to the white cane. Next, we developed a direction detector consisting of a charge-coupled device camera (SVR41ResucueHD; Sun Mechatronics), goggles (MG-277MLF; Midori Anzen Co. Ltd.), and biconvex lenses with a focal length of 25 mm (Edmund Co.). Several young people were photographed wearing the direction detector on their faces. Image processing was performed using Scilab 6.1.0 and Image Processing and Computer Vision Toolbox 4.1.2. To measure a person's line of vision, we calculated the center of gravity of the iris using five processes: reduction, trimming, binarization or gray scale, edge extraction, and Hough transform. We compared the binarization and gray scale processes in image processing.
The binarization process was better than the gray scale process. For edge extraction, we compared five methods: Sobel, Prewitt, Laplacian of Gaussian, fast Fourier transform, and Canny. The Canny method was the optimal extraction method. We performed the Hough transform to search for the main coordinates from the iris's edge, and we found that the Hough transform could calculate the center point of the iris.
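
The last step of the pipeline, locating the iris center from edge pixels with a Hough transform, can be sketched as below. This is an illustrative Python sketch, not the Scilab code used in the study: it assumes the iris radius is already known, it uses a synthetic circular edge instead of a Canny edge map, and the function name hough_circle_center and all numeric values are hypothetical.

```python
import numpy as np

def hough_circle_center(edge_points, radius, shape, n_angles=360):
    """Vote for circle centers of a known radius; return the best (row, col)."""
    acc = np.zeros(shape, dtype=int)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for row, col in edge_points:
        # Every edge pixel votes for all centers lying `radius` away from it.
        rows = np.rint(row - radius * np.sin(thetas)).astype(int)
        cols = np.rint(col - radius * np.cos(thetas)).astype(int)
        inside = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
        np.add.at(acc, (rows[inside], cols[inside]), 1)
    best_row, best_col = np.unravel_index(np.argmax(acc), shape)
    return int(best_row), int(best_col)

# Synthetic "iris" edge: a circle of radius 10 centered at row 30, column 40.
angles = np.linspace(0.0, 2.0 * np.pi, 90, endpoint=False)
edge = [(int(round(30 + 10 * np.sin(a))), int(round(40 + 10 * np.cos(a))))
        for a in angles]
center = hough_circle_center(edge, radius=10, shape=(64, 80))
```

Votes from all edge pixels coincide only near the true center, so the accumulator maximum approximates the iris center even when the edge coordinates are quantized to whole pixels.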

Affective Adaptation Design for Better Gaming Experiences

Affective adaptation is a creative way for game designers to add an extra layer of engagement to their productions. When players' emotions are an explicit factor in mechanics design, endless possibilities for imaginative gameplay emerge. Whilst gaining popularity, existing affective game research mostly runs controlled experiments in restrictive settings and relies on one or more specialist devices for measuring the player's emotional state. These conditions, albeit effective, are not necessarily realistic. Moreover, the simplified narrative and intrusive wearables may not be suitable for players. This exploratory study investigates delivering an immersive affective experience in the wild with minimal requirements, in an attempt to let the average developer reach the average player. A puzzle game is created with rich narrative and creative mechanics. It employs both explicit and implicit adaptation and only requires a web camera. Participants played the game on their own machines in various settings. Whilst it was rated feasible, very engaging and enjoyable, it remains questionable whether a fully immersive experience was delivered, due to the limited sample size.

Gait Biometric for Person Re-Identification

Biometric identification relies on features that are unique to a person, such as fingerprints, iris, ear, and voice, which require the subject's permission and physical contact. Gait biometrics identify a person's unique gait by extracting moving features. The main advantage of gait biometrics is that a person's gait can be identified at a distance, without any physical contact. In this work, gait biometrics are used for person re-identification. A person walking naturally is compared with the same person walking with a bag, coat, and case, recorded using long-wave infrared, short-wave infrared, medium-wave infrared, and visible cameras. The videos are recorded in rural and urban environments. Pre-processing includes human detection using You Only Look Once, background subtraction, silhouette extraction, and synthesis of the Gait Entropy Image by averaging the silhouettes. The moving features are extracted from the Gait Entropy Energy Image. The extracted features are reduced in dimensionality by Principal Component Analysis and recognized using different classifiers. The comparative results show that Linear Discriminant Analysis outperforms the other classifiers, with 95.8% for visible in the rural dataset and 94.8% for long-wave infrared in the urban dataset.
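
The silhouette-averaging step can be sketched as follows, assuming the commonly used definition in which the per-pixel mean of aligned binary silhouettes is treated as a probability and converted to Shannon entropy. The abstract does not give the exact formulation, so the function gait_entropy_image, the clipping constant eps, and the toy silhouettes are assumptions for illustration.

```python
import numpy as np

def gait_entropy_image(silhouettes, eps=1e-12):
    """Per-pixel Shannon entropy of aligned binary silhouettes over one gait cycle."""
    frames = np.asarray(silhouettes, dtype=float)
    p = frames.mean(axis=0)           # averaged silhouettes (foreground probability)
    p = np.clip(p, eps, 1.0 - eps)    # avoid log(0) for static pixels
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

# Toy sequence of four 1x3 silhouettes:
# pixel 0 is always foreground, pixel 1 moves (on in half the frames),
# pixel 2 is always background.
frames = [[[1, 1, 0]], [[1, 0, 0]], [[1, 1, 0]], [[1, 0, 0]]]
entropy = gait_entropy_image(frames)
```

Static pixels (always body or always background) get entropy near zero, while pixels covered by moving limbs get values up to 1, which is why the result emphasizes the moving features.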

6D Posture Estimation of Road Vehicles from Color Images

Current research on object posture estimation typically estimates the position and angle of an object by storing a 3D model of the object in a computer in advance and matching the observation against that model. In this research, however, we have succeeded in creating a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two networks: a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy for each camera position, i.e., the angle from which the object was captured. The highest accuracy was recorded at a camera position of 75°, where the classification accuracy was about 87.3% and the regression accuracy about 98.9%.

A Real-Time Bayesian Decision-Support System for Predicting Suspect Vehicle’s Intended Target Using a Sparse Camera Network

We present a decision-support tool to assist an operator in the detection and tracking of a suspect vehicle traveling to an unknown target destination. Multiple data sources, such as traffic cameras, traffic information, weather, etc., are integrated and processed in real-time to infer a suspect’s intended destination chosen from a list of pre-determined high-value targets. Previously, we presented our work in the detection and tracking of vehicles using traffic and airborne cameras. Here, we focus on the fusion and processing of that information to predict a suspect’s behavior. The network of cameras is represented by a directional graph, where the edges correspond to direct road connections between the nodes and the edge weights are proportional to the average time it takes to travel from one node to another. For our experiments, we construct our graph based on the greater Los Angeles subset of the Caltrans’s “Performance Measurement System” (PeMS) dataset. We propose a Bayesian approach where a posterior probability for each target is continuously updated based on detections of the suspect in the live video feeds. Additionally, we introduce the concept of ‘soft interventions’, inspired by the field of Causal Inference. Soft interventions are herein defined as interventions that do not immediately interfere with the suspect’s movements; rather, a soft intervention may induce the suspect into making a new decision, ultimately making their intent more transparent. For example, a soft intervention could be temporarily closing a road a few blocks from the suspect’s current location, which may require the suspect to change their current course. The objective of these interventions is to gain the maximum amount of information about the suspect’s intent in the shortest possible time. Our system currently operates in a human-on-the-loop mode where at each step, a set of recommendations are presented to the operator to aid in decision-making. 
In principle, the system could operate autonomously, only prompting the operator for critical decisions, allowing the system to significantly scale up to larger areas and multiple suspects. Once the intended target is identified with sufficient confidence, the vehicle is reported to the authorities to take further action. Other recommendations include a selection of road closures, i.e., soft interventions, or to continue monitoring. We evaluate the performance of the proposed system using simulated scenarios where the suspect, starting at random locations, takes a noisy shortest path to their intended target. In all scenarios, the suspect’s intended target is unknown to our system. The decision thresholds are selected to maximize the chances of determining the suspect’s intended target in the minimum amount of time and with the smallest number of interventions. We conclude by discussing the limitations of our current approach to motivate a machine learning approach, based on reinforcement learning in order to relax some of the current limiting assumptions.
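
The posterior update over candidate targets can be sketched on a toy camera graph. This is an illustrative sketch, not the authors' model: the softmax likelihood over outgoing edges (a simple way to encode a "noisy shortest path"), the parameter beta, the five-node graph, and the function names are all assumptions.

```python
import heapq
import math

def shortest_times(graph, source):
    """Dijkstra travel times from source over a directed, weighted graph."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def update_posterior(posterior, graph, prev_node, new_node, beta=1.0):
    """One Bayesian update after the suspect is seen moving prev_node -> new_node."""
    unnormalized = {}
    for target, prior in posterior.items():
        # Assumed likelihood: moves leaving less travel time to the target are likelier.
        costs = {v: w + shortest_times(graph, v)[target]
                 for v, w in graph[prev_node].items()}
        weights = {v: math.exp(-beta * c) for v, c in costs.items()}
        likelihood = weights[new_node] / sum(weights.values())
        unnormalized[target] = prior * likelihood
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}

# Hypothetical graph: edge weights are average travel times between cameras.
graph = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"T1": 1.0, "C": 2.0},
    "C": {"T2": 1.0, "B": 2.0},
    "T1": {},
    "T2": {},
}
prior = {"T1": 0.5, "T2": 0.5}
posterior = update_posterior(prior, graph, "A", "B")
```

In this toy case, a detection at B after A shifts most of the probability mass to T1, because B lies on the short route to T1 and on a detour to T2. A road closure (soft intervention) could be modeled by removing an edge and observing which alternative the suspect takes.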

Adjustable Counter-Weight for Full Turn Rotary Systems

Optical devices such as cameras and night vision devices must be tested to verify that they work properly. A precision biaxial rotary system (gimbal) is therefore required for mounting the Unit Under Test (UUT). Gimbal systems allow precise positioning of the UUT, so optical tests can be performed with high accuracy. The weight of the UUT, which is placed off the axis of rotation, applies an off-axis moment to the mounting armature. In some orientations this moment acts against the direction of movement, so the electric motor that rotates the gimbal axis has to apply a higher torque to guide and stabilize the system. Moreover, the UUT and its mounting fixture can be changed, which changes the resisting moment applied to the gimbal's electric motor. In this study, a preloaded spring is added to the gimbal system, by means of a four-bar mechanism, to minimize the applied off-axis moment. Two possible methods for preloading the spring are introduced, and system optimization is performed to eliminate the entire moment created by the off-axis weight.

Geometric Contrast of a 3D Model Obtained by Means of Digital Photogrammetry with a Quasimetric Camera on UAV Classical Methods

Nowadays, the use of drones has extended to practically every human activity. One of the main applications is in surveying. Software programs that process the images captured by the drone's sensor almost automatically have been developed and commercialized, but they only allow the results to be contrasted through control points. This work proposes contrasting a 3D model obtained from a drone flight with a non-metric camera (chosen for its low cost) against a second model obtained by means of the historically endorsed classical methods. The contrast is carried out over terrain with significant unevenness, so as to test the model generated with photogrammetry, given that drone photogrammetry has more difficulty achieving accuracy in this kind of situation. Distances, heights, surfaces, and volumes are measured on the 3D models generated, and the results are contrasted. The differences are about 0.2% for distances and heights, 0.3% for surfaces, and 0.6% for volumes. Although these differences are small, they do not match the order of magnitude presented by salespeople.

Modal Analysis of a Cantilever Beam Using an Inexpensive Smartphone Camera: Motion Magnification Technique

This paper aims to demonstrate the accuracy of an inexpensive smartphone camera as a non-contact vibration sensor for recovering the vibration modes of a vibrating structure such as a cantilever beam. A video of a vibrating beam is filmed using a smartphone camera and then processed by the motion magnification technique. Based on this method, the first two natural frequencies and their associated mode shapes are estimated experimentally and compared to the analytical ones. Results show a relative error of less than 4% between the experimental and analytical approaches for the first two natural frequencies of the beam. Also, for the first two mode shapes, a Modal Assurance Criterion (MAC) value above 0.9 between the two approaches is obtained. This small error between the two approaches supports the viability of a cheap smartphone camera as a non-contact vibration sensor, particularly for structures vibrating at relatively low natural frequencies.
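
The two comparison measures mentioned above can be written down compactly. The sketch uses the standard MAC definition for real-valued mode shapes, MAC = (a·e)² / ((a·a)(e·e)); the sample vectors and frequencies are hypothetical and are not the paper's measurements.

```python
import numpy as np

def mac(phi_a, phi_e):
    """Modal Assurance Criterion between two real-valued mode shape vectors."""
    phi_a = np.asarray(phi_a, dtype=float)
    phi_e = np.asarray(phi_e, dtype=float)
    numerator = np.dot(phi_a, phi_e) ** 2
    denominator = np.dot(phi_a, phi_a) * np.dot(phi_e, phi_e)
    return numerator / denominator

def relative_error(experimental, analytical):
    """Relative error of an experimental value with respect to the analytical one."""
    return abs(experimental - analytical) / abs(analytical)

# Hypothetical analytical mode shape and a scaled, slightly perturbed measurement.
analytical_shape = np.array([0.0, 0.1, 0.35, 0.7, 1.0])
measured_shape = 2.5 * analytical_shape + np.array([0.0, 0.02, -0.01, 0.03, -0.02])
mac_value = mac(analytical_shape, measured_shape)
freq_error = relative_error(10.3, 10.0)  # hypothetical frequencies in Hz
```

MAC is insensitive to the scaling of either vector, which matters here because mode shapes extracted from video have arbitrary amplitude; a value of 1 means identical shapes and 0 means orthogonal shapes.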

Improved Rare Species Identification Using Focal Loss Based Deep Learning Models

The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.
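
The difference between the two loss functions can be sketched per sample. The focal loss form used here, FL(p) = -alpha (1 - p)^gamma log(p) with p the predicted probability of the true class, is the standard definition; gamma = 2 and alpha = 1 are assumed values, since the abstract does not state the settings used.

```python
import numpy as np

def cross_entropy(p_true):
    """Cross entropy for the predicted probability of the true class."""
    return -np.log(np.asarray(p_true, dtype=float))

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross entropy down-weighted for well-classified samples."""
    p = np.asarray(p_true, dtype=float)
    return -alpha * (1.0 - p) ** gamma * np.log(p)

easy, hard = 0.9, 0.1  # confident (majority-like) vs. poorly classified sample
easy_ratio = focal_loss(easy) / cross_entropy(easy)   # (1 - 0.9)^2
hard_ratio = focal_loss(hard) / cross_entropy(hard)   # (1 - 0.1)^2
```

With gamma = 2 the loss of a sample predicted at 0.9 is scaled by about 0.01, while a sample predicted at 0.1 keeps about 0.81 of its cross-entropy loss, so training gradients concentrate on hard, often minority-class, examples.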

Intraoperative ICG-NIR Fluorescence Angiography Visualization of Intestinal Perfusion in Primary Pull-Through for Hirschsprung Disease

Purpose: Assessment of anastomotic perfusion in Hirschsprung disease using Indocyanine Green (ICG)-near-infrared (NIR) fluorescence angiography. Introduction: Anastomotic stricture and leak are well-known complications of Hirschsprung pull-through procedures. Complications are due to tension, infection, and/or poor perfusion. While a surgeon can visually determine and control the amount of tension and contamination, assessment of perfusion is subject to surgeon determination. Intraoperative use of ICG-NIR enhances this decision-making process by illustrating perfusion intensity and adequacy in the pulled-through bowel segment. This technique, proven to reduce anastomotic stricture and leak in adults, has not been studied in children to our knowledge. ICG, an FDA approved, nontoxic, non-immunogenic, intravascular (IV) dye, has been used in adults and children for over 60 years, with few side effects. ICG-NIR was used in this report to demonstrate the adequacy of perfusion during transanal pullthrough for Hirschsprung’s disease. Method: 8 patients with Hirschsprung disease were evaluated with ICG-NIR technology. Levels of affected area ranged from sigmoid to total colonic Hirschsprung disease. After leveling, but prior to anastomosis, ICG was administered at 1.25 mg (< 2 mg/kg) and perfusion visualized using an NIR camera, before and during anastomosis. Video and photo imaging was performed and perfusion of the bowel was compared to surrounding tissues. This showed the degree of perfusion and demarcation of perfused and non-perfused bowel. The anastomosis was completed uneventfully and the patients all did well. Results: There were no complications of stricture or leak. 5 of 8 patients (62.5%) had modification of the plan based on ICG-NIR imaging. Conclusion: Technologies that enhance surgeons’ ability to visualize bowel perfusion prior to anastomosis in Hirschsprung’s patients may help reduce post-operative complications. 
Further studies are needed to assess the potential benefits.

Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perception performance in driving environments that vary with time and season. Image segmentation using deep learning, which has evolved rapidly in recent years, provides stable, high recognition performance in various road environments. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, so their performance degrades on embedded processors equipped with simple hardware accelerators. In this paper, a semantic segmentation network, the matrix multiplication accelerator network (MMANet), optimized for the matrix multiplication accelerator (MMA) on Texas Instruments digital signal processors (TI DSP), is proposed to improve the recognition performance of autonomous driving systems. The proposed method is designed to maximize the number of layers that can be executed in a limited time, to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of the MMA. The lack of information caused by fixing the number of channels is resolved by increasing the number of parallel branches. Second, an efficient convolution is selected depending on the size of the activation. Since the MMA is fixed, normal convolution may be more efficient than depthwise separable convolution, depending on memory access overhead. Thus, the convolution type is decided according to the output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP).
The suggested method obtains stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high-quality contexts using the restored shape without global average pooling paths, since the layer uses the MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation set. The proposed network can process a 640 × 480 image in 6.67 ms, so six cameras can be used to perceive the surroundings of the vehicle at 20 frames per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU), which is the highest recognition rate among embedded networks on the Cityscapes validation set.
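
The link between the 6.67 ms latency and the six-camera, 20 FPS figure is a simple timing budget, sketched below. The sketch assumes that the camera images are processed one after another on a single accelerator, which the abstract does not state explicitly; the function name max_cameras is hypothetical.

```python
def max_cameras(latency_ms, fps):
    """Number of images that fit in one frame period when processed sequentially."""
    frame_period_ms = 1000.0 / fps
    return int(frame_period_ms // latency_ms)

latency_ms = 6.67                  # per-image latency reported in the abstract
total_for_six = 6 * latency_ms     # about 40 ms for six cameras
fits = total_for_six <= 1000.0 / 20
limit = max_cameras(latency_ms, 20)
```

At 20 FPS each frame period is 50 ms, so six images at 6.67 ms each (about 40 ms) fit with roughly 10 ms to spare under this assumption.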

Low-Cost Mechatronic Design of an Omnidirectional Mobile Robot

This paper presents the results of a mechatronic design based on a 4-wheel omnidirectional mobile robot that can be used in indoor logistics applications. The low-level control uses two open-source hardware platforms (Raspberry Pi 3 Model B+ and Arduino Mega 2560) that control four industrial motors, four ultrasound sensors, four optical encoders, a vision system of two cameras, and a Hokuyo URG-04LX-UG01 laser scanner. The system is powered by a lithium battery that supplies 24 V DC with a maximum capacity of 20 Ah. The Robot Operating System (ROS) has been implemented on the Raspberry Pi, and its performance is evaluated with the selected sensors and hardware. The mechatronic system is evaluated, and safe modes of power distribution for controlling all the electronic devices are proposed based on different tests. Based on the performance results, recommendations are given for using the Raspberry Pi and Arduino in terms of power, communication, and distribution of control among the devices. According to these recommendations, the sensors are distributed between both real-time controllers (Arduino and Raspberry Pi). The camera drivers have been implemented in Linux, and a Python program has been written to access the cameras. These cameras will be used to implement a deep learning algorithm to recognize people and objects. In this way, the level of intelligence can be increased in combination with the maps obtained from the laser scanner.

Deep Learning Application for Object Image Recognition and Robot Automatic Grasping

Because vision systems are in strong demand for autonomous operation in industrial environments, image recognition has become an important research topic. Here, a deep learning algorithm is employed in the vision system to recognize industrial objects and is integrated with a 7A6 Series Manipulator for automatic object gripping. A PC and a Graphics Processing Unit (GPU) are chosen to construct the 3D Vision Recognition System. A depth camera (Intel RealSense SR300) is employed to capture the image for object recognition and coordinate derivation. The YOLOv2 scheme is adopted in the convolutional neural network (CNN) structure for object classification and center point prediction. Additionally, an image processing strategy is used to find the object contour for calculating the object orientation angle. Then, the specified object location and orientation information are sent to the robotic controller. Finally, a six-axis manipulator can grasp the specified object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 was successfully employed to detect the object location and category with a confidence near 0.9 and a 3D position error of less than 0.4 mm. This is useful for future intelligent robotic applications in Industry 4.0 environments.

Authentication of Physical Objects with Dot-Based 2D Code

Counterfeit goods and documents are a global problem that requires increasingly sophisticated countermeasures. Existing techniques that use watermarking or embed symbols on objects are not suitable for all use cases. To address those special needs, we created a complete system for authenticating paper documents and physical objects with a flat surface. Objects are marked using 2D graphic codes, named DotAuth, that are orientation independent and resistant to camera noise. Based on the identifier stored in the 2D code, the system can perform basic authentication and supports more sophisticated analysis methods, e.g., relying on augmented reality and physical properties of the object. In this paper, we present the complete architecture, algorithms, and applications of the proposed system. A feature comparison of the proposed solution with other products is presented as well, pointing to many advantages that increase usability and efficiency in protecting physical objects.

Infrared Lightbox and iPhone App for Improving Detection Limit of Phosphate Detecting Dip Strips

In this paper, we report the development of a portable and inexpensive infrared lightbox for improving the detection limits of paper-based phosphate devices. Commercial paper-based devices utilize the molybdenum blue protocol to detect phosphate in the environment. Although these devices are easy to use and have a long shelf life, their main deficiency is their low sensitivity based on the qualitative results obtained via a color chart. To improve the results, we constructed a compact infrared lightbox that communicates wirelessly with a smartphone. The system measures the absorbance of radiation for the molybdenum blue reaction in the infrared region of the spectrum. It consists of a lightbox illuminated by four infrared light-emitting diodes, an infrared digital camera, a Raspberry Pi microcontroller, a mini-router, and an iPhone to control the microcontroller. An iPhone application was also developed to analyze images captured by the infrared camera in order to quantify phosphate concentrations. Additionally, the app connects to an online data center to present a highly scalable worldwide system for tracking and analyzing field measurements. In this study, the detection limits for two popular commercial devices were improved by a factor of 4 for the Quantofix devices (from 1.3 ppm using visible light to 300 ppb using infrared illumination) and a factor of 6 for the Indigo units (from 9.2 ppm to 1.4 ppm) with repeatability of less than or equal to 1.2% relative standard deviation (RSD). The system also provides more granular concentration information compared to the discrete color chart used by commercial devices and it can be easily adapted for use in other applications.
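
The reported improvement factors follow directly from the detection limits once both are expressed in the same unit, as the short check below shows. The helper improvement_factor is hypothetical; the four detection limits are the ones quoted in the abstract.

```python
def improvement_factor(old_limit_ppm, new_limit_ppm):
    """Ratio of the old detection limit to the new one (same units)."""
    return old_limit_ppm / new_limit_ppm

# 300 ppb is 0.300 ppm, so both limits are expressed in ppm.
quantofix = improvement_factor(1.3, 0.300)   # visible light vs. infrared
indigo = improvement_factor(9.2, 1.4)
```

The ratios come out slightly above 4 and 6, consistent with the rounded factors stated in the abstract.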