Abstract: Real-time image and video processing is required in
many computer vision applications, e.g. video surveillance, traffic
management and medical imaging. Processing these video streams
requires high computational power. Thus, a practical solution is
to have the CPU collaborate with hardware accelerators. In
this paper, a Canny edge detection hardware accelerator is proposed.
Edge detection is one of the basic building blocks of video and image
processing applications. It is a common block in the pre-processing
phase of image and video processing pipelines. Our approach
offloads the Canny edge detection algorithm from the
processing system (PS) to the programmable logic (PL), taking
advantage of the High Level Synthesis (HLS) tool flow to accelerate the
implementation on the Zynq platform. The resulting implementation
enables up to a 100x performance improvement through hardware
acceleration. CPU utilization drops and the frame rate
rises to 60 fps for a 1080p full HD input video stream.
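To put the reported 60 fps at 1080p in perspective, the short calculation below estimates the pixel rate such a pipeline has to sustain. It is a back-of-the-envelope sketch, not a figure from the paper: it assumes a 1920x1080 active frame and ignores blanking intervals, and the function name `active_pixel_rate` is purely illustrative.

```python
# Illustrative arithmetic only: active-pixel rate of a 1080p stream at 60 fps.
# Assumption: 1920x1080 active pixels per frame, blanking intervals ignored.
WIDTH, HEIGHT, FPS = 1920, 1080, 60


def active_pixel_rate(width, height, fps):
    """Return the number of active pixels per second for the given video format."""
    return width * height * fps


rate = active_pixel_rate(WIDTH, HEIGHT, FPS)
print(f"{rate:,} pixels/s")              # 124,416,000 pixels/s
print(f"{1e9 / rate:.2f} ns per pixel")  # time budget per pixel
```

Under these assumptions the accelerator has roughly 8 ns per pixel, which suggests why a fully pipelined hardware datapath is attractive for this workload.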
Abstract: High level synthesis (HLS) is a process that
generates a register-transfer level design for a digital system from
a behavioral description. There are many HLS algorithms and
commercial tools. However, most of these algorithms consider a
behavioral description of the system in which a single token is
presented to the system at a time. This approach does not exploit extra
hardware efficiently, especially in the design of digital filters where
common operations may exist between successive tokens. In this
paper, we modify the behavioral description to process multiple
tokens in parallel. Unlike full parallel processing, however, this
approach does not require full hardware replication. It exploits the
presence of common operations between successive tokens. The
performance of the proposed approach is better than sequential
processing and approaches that of full parallel processing as the
hardware resources are increased.
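The idea of sharing operations between successive tokens can be illustrated with a toy filter. The sketch below is not the paper's algorithm; it assumes a hypothetical 4-tap moving-sum filter and shows how two consecutive outputs can reuse one partial sum, so that a pair of outputs costs 4 additions instead of the 6 needed when each is computed independently. The function names are illustrative.

```python
def moving_sum_sequential(x, taps=4):
    """Reference version: each output token is computed on its own."""
    padded = [0] * (taps - 1) + list(x)
    return [sum(padded[n:n + taps]) for n in range(len(x))]


def moving_sum_two_tokens(x):
    """4-tap moving sum that produces two output tokens per step.

    The windows of y[n] and y[n+1] overlap in three samples, so their
    partial sum is computed once and shared between both outputs.
    """
    padded = [0, 0, 0] + list(x)
    y = []
    n = 0
    while n + 1 < len(x):
        shared = padded[n + 1] + padded[n + 2] + padded[n + 3]  # 2 additions
        y.append(padded[n] + shared)       # y[n]:   1 more addition
        y.append(shared + padded[n + 4])   # y[n+1]: 1 more addition
        n += 2
    if n < len(x):  # odd-length input: one token left over
        y.append(sum(padded[n:n + 4]))
    return y


samples = [1, 2, 3, 4, 5, 6]
print(moving_sum_two_tokens(samples))
```

In hardware terms, the two-token version needs fewer adders than two full copies of the filter, which is the kind of saving over full replication that the abstract describes.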
Abstract: Streaming applications usually consist of stages that run
in parallel or in series and incrementally transform a stream of input
data. It is a design challenge to break such an application into
distinguishable blocks and then map them onto independent hardware
processing elements. This requires a generic controller that
automatically maps such a stream of data onto independent processing
elements without any dependencies or manual considerations. In
this paper, a Kahn Process Network (KPN) for such streaming
applications is designed and developed to be mapped onto an
MPSoC. The design provides a generic C-based compiler that takes
the mapping specifications as input from the user, automates these
design constraints, and automatically generates synthesized,
optimized RTL code for the specified application.
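For readers unfamiliar with the model, a Kahn Process Network can be sketched in software as independent processes connected by FIFO channels with blocking reads. The example below is a minimal illustration using Python threads and `queue.Queue`; it is not the compiler or the MPSoC mapping described in the paper, and the process names, the `END` marker and the two transform functions are assumptions made for the example.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of the KPN model


def source(out_ch, data):
    """Write every input token to the output channel, then signal the end."""
    for token in data:
        out_ch.put(token)
    out_ch.put(END)


def transform(in_ch, out_ch, fn):
    """Read tokens with a blocking get, apply fn, and forward the result."""
    while True:
        token = in_ch.get()  # blocks until a token is available
        if token is END:
            out_ch.put(END)
            return
        out_ch.put(fn(token))


def sink(in_ch, results):
    """Collect tokens until the end-of-stream marker arrives."""
    while True:
        token = in_ch.get()
        if token is END:
            return
        results.append(token)


def run_pipeline(data):
    """Build a four-process chain: source -> double -> add one -> sink."""
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()  # unbounded FIFOs
    results = []
    processes = [
        threading.Thread(target=source, args=(a, data)),
        threading.Thread(target=transform, args=(a, b, lambda t: t * 2)),
        threading.Thread(target=transform, args=(b, c, lambda t: t + 1)),
        threading.Thread(target=sink, args=(c, results)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return results


print(run_pipeline([1, 2, 3]))  # [3, 5, 7]
```

Because each process communicates only through its channels and blocks on reads, the output does not depend on thread scheduling. This determinism is the property that makes such processes candidates for mapping onto independent processing elements.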