High Level Synthesis of Canny Edge Detection Algorithm on Zynq Platform

Real time image and video processing is a demand in many computer vision applications, e.g. video surveillance, traffic management and medical imaging. The processing of those video applications requires high computational power. Thus, the optimal solution is the collaboration of CPU and hardware accelerators. In this paper, a Canny edge detection hardware accelerator is proposed. Edge detection is one of the basic building blocks of video and image processing applications. It is a common block in the pre-processing phase of image and video processing pipeline. Our presented approach targets offloading the Canny edge detection algorithm from processing system (PS) to programmable logic (PL) taking the advantage of High Level Synthesis (HLS) tool flow to accelerate the implementation on Zynq platform. The resulting implementation enables up to a 100x performance improvement through hardware acceleration. The CPU utilization drops down and the frame rate jumps to 60 fps of 1080p full HD input video stream.

Unsupervised Feature Learning by Pre-Route Simulation of Auto-Encoder Behavior Model

This paper describes a cycle accurate simulation results of weight values learned by an auto-encoder behavior model in terms of pre-route simulation. Given the results we visualized the first layer representations with natural images. Many common deep learning threads have focused on learning high-level abstraction of unlabeled raw data by unsupervised feature learning. However, in the process of handling such a huge amount of data, the learning method’s computation complexity and time limited advanced research. These limitations came from the fact these algorithms were computed by using only single core CPUs. For this reason, parallel-based hardware, FPGAs, was seen as a possible solution to overcome these limitations. We adopted and simulated the ready-made auto-encoder to design a behavior model in VerilogHDL before designing hardware. With the auto-encoder behavior model pre-route simulation, we obtained the cycle accurate results of the parameter of each hidden layer by using MODELSIM. The cycle accurate results are very important factor in designing a parallel-based digital hardware. Finally this paper shows an appropriate operation of behavior model based pre-route simulation. Moreover, we visualized learning latent representations of the first hidden layer with Kyoto natural image dataset.

On the Design of Electronic Control Unitsfor the Safety-Critical Vehicle Applications

This paper suggests a design methodology for the hardware and software of the electronic control unit (ECU) of safety-critical vehicle applications such as braking and steering. The architecture of the hardware is a high integrity system such thatit incorporates a high performance 32-bit CPU and a separate peripheral controlprocessor (PCP) together with an external watchdog CPU. Communication between the main CPU and the PCP is executed via a common area of RAM and events on either processor which are invoked by interrupts. Safety-related software is also implemented to provide a reliable, self-testing computing environment for safety critical and high integrity applications. The validity of the design approach is shown by using the hardware-in-the-loop simulation (HILS)for electric power steering(EPS) systemswhich consists of the EPS mechanism, the designed ECU, and monitoring tools.

GPU Based High Speed Error Protection for Watermarked Medical Image Transmission

Medical image is an integral part of e-health care and e-diagnosis system. Medical image watermarking is widely used to protect patients’ information from malicious alteration and manipulation. The watermarked medical images are transmitted over the internet among patients, primary and referred physicians. The images are highly prone to corruption in the wireless transmission medium due to various noises, deflection, and refractions. Distortion in the received images leads to faulty watermark detection and inappropriate disease diagnosis. To address the issue, this paper utilizes error correction code (ECC) with (8, 4) Hamming code in an existing watermarking system. In addition, we implement the high complex ECC on a graphics processing units (GPU) to accelerate and support real-time requirement. Experimental results show that GPU achieves considerable speedup over the sequential CPU implementation, while maintaining 100% ECC efficiency.

Mobile Cloud Middleware: A New Service for Mobile Users

Cloud computing (CC) and mobile cloud computing (MCC) have advanced rapidly the last few years. Today, MCC undergoes fast improvement and progress in terms of hardware (memory, embedded sensors, power consumption, touch screen, etc.) software (more and more sophisticated mobile applications) and transmission (higher data transmission rates achieved with different technologies such as 3Gs). This paper presents a review on the concept of CC and MCC. Then, it discusses what has been done regarding middleware in cloud and mobile cloud computing. Later, it shows the architecture of our proposed middleware along with its functionalities which will be provided to mobile clients in order to overcome the well known problems (such as low battery power, slow CPU speed and little memory…).

A New Modification of Nonlinear Conjugate Gradient Coefficients with Global Convergence Properties

Conjugate gradient method has been enormously used to solve large scale unconstrained optimization problems due to the number of iteration, memory, CPU time, and convergence property, in this paper we find a new class of nonlinear conjugate gradient coefficient with global convergence properties proved by exact line search. The numerical results for our new βK give a good result when it compared with well known formulas.

Model Order Reduction of Linear Time Variant High Speed VLSI Interconnects using Frequency Shift Technique

Accurate modeling of high speed RLC interconnects has become a necessity to address signal integrity issues in current VLSI design. To accurately model a dispersive system of interconnects at higher frequencies; a full-wave analysis is required. However, conventional circuit simulation of interconnects with full wave models is extremely CPU expensive. We present an algorithm for reducing large VLSI circuits to much smaller ones with similar input-output behavior. A key feature of our method, called Frequency Shift Technique, is that it is capable of reducing linear time-varying systems. This enables it to capture frequency-translation and sampling behavior, important in communication subsystems such as mixers, RF components and switched-capacitor filters. Reduction is obtained by projecting the original system described by linear differential equations into a lower dimension. Experiments have been carried out using Cadence Design Simulator cwhich indicates that the proposed technique achieves more % reduction with less CPU time than the other model order reduction techniques existing in literature. We also present applications to RF circuit subsystems, obtaining size reductions and evaluation speedups of orders of magnitude with insignificant loss of accuracy.

High Level Characterization and Optimization of Switched-Current Sigma-Delta Modulators with VHDL-AMS

Today, design requirements are extending more and more from electronic (analogue and digital) to multidiscipline design. These current needs imply implementation of methodologies to make the CAD product reliable in order to improve time to market, study costs, reusability and reliability of the design process. This paper proposes a high level design approach applied for the characterization and the optimization of Switched-Current Sigma- Delta Modulators. It uses the new hardware description language VHDL-AMS to help the designers to optimize the characteristics of the modulator at a high level with a considerably reduced CPU time before passing to a transistor level characterization.

A New Fast Skin Color Detection Technique

Skin color can provide a useful and robust cue for human-related image analysis, such as face detection, pornographic image filtering, hand detection and tracking, people retrieval in databases and Internet, etc. The major problem of such kinds of skin color detection algorithms is that it is time consuming and hence cannot be applied to a real time system. To overcome this problem, we introduce a new fast technique for skin detection which can be applied in a real time system. In this technique, instead of testing each image pixel to label it as skin or non-skin (as in classic techniques), we skip a set of pixels. The reason of the skipping process is the high probability that neighbors of the skin color pixels are also skin pixels, especially in adult images and vise versa. The proposed method can rapidly detect skin and non-skin color pixels, which in turn dramatically reduce the CPU time required for the protection process. Since many fast detection techniques are based on image resizing, we apply our proposed pixel skipping technique with image resizing to obtain better results. The performance evaluation of the proposed skipping and hybrid techniques in terms of the measured CPU time is presented. Experimental results demonstrate that the proposed methods achieve better result than the relevant classic method.

Approaches and Schemes for Storing DTDIndependent XML Data in Relational Databases

The volume of XML data exchange is explosively increasing, and the need for efficient mechanisms of XML data management is vital. Many XML storage models have been proposed for storing XML DTD-independent documents in relational database systems. Benchmarking is the best way to highlight pros and cons of different approaches. In this study, we use a common benchmarking scheme, known as XMark to compare the most cited and newly proposed DTD-independent methods in terms of logical reads, physical I/O, CPU time and duration. We show the effect of Label Path, extracting values and storing in another table and type of join needed for each method-s query answering.

Minimizing the Broadcast Traffic in the Jordanian Discovery Schools Network using PPPoE

Discovery schools in Jordan are connected in one flat ATM bridge network. All Schools connected to the network will hear broadcast traffic. High percentage of unwanted traffic such as broadcast, consumes the bandwidth between schools and QRC. Routers in QRC have high CPU utilization. The number of connections on the router is very high, and may exceed recommend manufacturing specifications. One way to minimize number of connections to the routers in QRC, and minimize broadcast traffic is to use PPPoE. In this study, a PPPoE solution has been presented which shows high performance for the clients when accessing the school server resources. Despite the large number of the discovery schools at MoE, the experimental results show that the PPPoE solution is able to yield a satisfactory performance for each client at the school and noticeably reduce the traffic broadcast to the QRC.

An Optimized Multi-block Method for Turbulent Flows

A major part of the flow field involves no complicated turbulent behavior in many turbulent flows. In this research work, in order to reduce required memory and CPU time, the flow field was decomposed into several blocks, each block including its special turbulence. A two dimensional backward facing step was considered here. Four combinations of the Prandtl mixing length and standard k- E models were implemented as well. Computer memory and CPU time consumption in addition to numerical convergence and accuracy of the obtained results were mainly investigated. Observations showed that, a suitable combination of turbulence models in different blocks led to the results with the same accuracy as the high order turbulence model for all of the blocks, in addition to the reductions in memory and CPU time consumption.

Increase of Organization in Complex Systems

Measures of complexity and entropy have not converged to a single quantitative description of levels of organization of complex systems. The need for such a measure is increasingly necessary in all disciplines studying complex systems. To address this problem, starting from the most fundamental principle in Physics, here a new measure for quantity of organization and rate of self-organization in complex systems based on the principle of least (stationary) action is applied to a model system - the central processing unit (CPU) of computers. The quantity of organization for several generations of CPUs shows a double exponential rate of change of organization with time. The exact functional dependence has a fine, S-shaped structure, revealing some of the mechanisms of self-organization. The principle of least action helps to explain the mechanism of increase of organization through quantity accumulation and constraint and curvature minimization with an attractor, the least average sum of actions of all elements and for all motions. This approach can help describe, quantify, measure, manage, design and predict future behavior of complex systems to achieve the highest rates of self organization to improve their quality. It can be applied to other complex systems from Physics, Chemistry, Biology, Ecology, Economics, Cities, network theory and others where complex systems are present.

Semi-Lagrangian Method for Advection Equation on GPU in Unstructured R3 Mesh for Fluid Dynamics Application

Numerical integration of initial boundary problem for advection equation in 3 ℜ is considered. The method used is  conditionally stable semi-Lagrangian advection scheme with high order interpolation on unstructured mesh. In order to increase time step integration the BFECC method with limiter TVD correction is used. The method is adopted on parallel graphic processor unit environment using NVIDIA CUDA and applied in Navier-Stokes solver. It is shown that the calculation on NVIDIA GeForce 8800  GPU is 184 times faster than on one processor AMDX2 4800+ CPU. The method is extended to the incompressible fluid dynamics solver. Flow over a Cylinder for 3D case is compared to the experimental data.

MMU Simulation in Hardware Simulator Based-on State Transition Models

Embedded hardware simulator is a valuable computeraided tool for embedded application development. This paper focuses on the ARM926EJ-S MMU, builds state transition models and formally verifies critical properties for the models. The state transition models include loading instruction model, reading data model, and writing data model. The properties of the models are described by CTL specification language, and they are verified in VIS. The results obtained in VIS demonstrate that the critical properties of MMU are satisfied in the state transition models. The correct models can be used to implement the MMU component in our simulator. In the end of this paper, the experimental results show that the MMU can successfully accomplish memory access requests from CPU.

Performance Comparison of Real Time EDAC Systems for Applications On-Board Small Satellites

On-board Error Detection and Correction (EDAC) devices aim to secure data transmitted between the central processing unit (CPU) of a satellite onboard computer and its local memory. This paper presents a comparison of the performance of four low complexity EDAC techniques for application in Random Access Memories (RAMs) on-board small satellites. The performance of a newly proposed EDAC architecture is measured and compared with three different EDAC strategies, using the same FPGA technology. A statistical analysis of single-event upset (SEU) and multiple-bit upset (MBU) activity in commercial memories onboard Alsat-1 is given for a period of 8 years

Night-Time Traffic Light Detection Based On SVM with Geometric Moment Features

This paper presents an effective traffic lights detection method at the night-time. First, candidate blobs of traffic lights are extracted from RGB color image. Input image is represented on the dominant color domain by using color transform proposed by Ruta, then red and green color dominant regions are selected as candidates. After candidate blob selection, we carry out shape filter for noise reduction using information of blobs such as length, area, area of boundary box, etc. A multi-class classifier based on SVM (Support Vector Machine) applies into the candidates. Three kinds of features are used. We use basic features such as blob width, height, center coordinate, area, area of blob. Bright based stochastic features are also used. In particular, geometric based moment-s values between candidate region and adjacent region are proposed and used to improve the detection performance. The proposed system is implemented on Intel Core CPU with 2.80 GHz and 4 GB RAM and tested with the urban and rural road videos. Through the test, we show that the proposed method using PF, BMF, and GMF reaches up to 93 % of detection rate with computation time of in average 15 ms/frame.

Application of Formal Methods for Designing a Separation Kernel for Embedded Systems

A separation-kernel-based operating system (OS) has been designed for use in secure embedded systems by applying formal methods to the design of the separation-kernel part. The separation kernel is a small OS kernel that provides an abstract distributed environment on a single CPU. The design of the separation kernel was verified using two formal methods, the B method and the Spin model checker. A newly designed semi-formal method, the extended state transition method, was also applied. An OS comprising the separation-kernel part and additional OS services on top of the separation kernel was prototyped on the Intel IA-32 architecture. Developing and testing of a prototype embedded application, a point-of-sale application, on the prototype OS demonstrated that the proposed architecture and the use of formal methods to design its kernel part are effective for achieving a secure embedded system having a high-assurance separation kernel.

Approaches and Schemes for Storing DTD-Independent XML Data in Relational Databases

The volume of XML data exchange is explosively increasing, and the need for efficient mechanisms of XML data management is vital. Many XML storage models have been proposed for storing XML DTD-independent documents in relational database systems. Benchmarking is the best way to highlight pros and cons of different approaches. In this study, we use a common benchmarking scheme, known as XMark to compare the most cited and newly proposed DTD-independent methods in terms of logical reads, physical I/O, CPU time and duration. We show the effect of Label Path, extracting values and storing in another table and type of join needed for each method's query answering.

Real-time Interactive Ocean Wave Simulation using Multithread

This research simulates one of the natural phenomena, the ocean wave. Our goal is to be able to simulate the ocean wave at real-time rate with the water surface interacting with objects. The wave in this research is calm and smooth caused by the force of the wind above the ocean surface. In order to make the simulation of the wave real-time, the implementation of the GPU and the multithreading techniques are used here. Based on the fact that the new generation CPUs, for personal computers, have multi cores, they are useful for the multithread. This technique utilizes more than one core at a time. This simulation is programmed by C language with OpenGL. To make the simulation of the wave look more realistic, we applied an OpenGL technique called cube mapping (environmental mapping) to make water surface reflective and more realistic.