Abstract: The Versatile Video Coding standard (VVC) is actually under development by the Joint Video Exploration Team (or JVET). An Adaptive Multiple Transforms (AMT) approach was announced. It is based on different transform modules that provided an efficient coding. However, the AMT solution raises several issues especially regarding the complexity of the selected set of transforms. This can be an important issue, particularly for a future industrial adoption. This paper proposed an efficient hardware implementation of the most used transform in AMT approach: the DCT II. The developed circuit is adapted to different block sizes and can reach a minimum frequency of 192 MHz allowing an optimized execution time.
Abstract: Nowadays, demand for using real-time video transmission capable devices is ever-increasing. So, high resolution videos have made efficient video compression techniques an essential component for capturing and transmitting video data. Motion estimation has a critical role in encoding raw video. Hence, various motion estimation methods are introduced to efficiently compress the video. Low bit‑depth representation based motion estimation methods facilitate computation of matching criteria and thus, provide small hardware footprint. In this paper, a hardware implementation of a two-bit transformation based low-complexity motion estimation method using local binary pattern approach is proposed. Image frames are represented in two-bit depth instead of full-depth by making use of the local binary pattern as a binarization approach and the binarization part of the hardware architecture is explained in detail. Experimental results demonstrate the difference between the proposed hardware architecture and the architectures of well-known low-complexity motion estimation methods in terms of important aspects such as resource utilization, energy and power consumption.
Abstract: In this paper, an encryption algorithm is proposed for real-time image encryption. The scheme employs a dual chaotic generator based on a three dimensional (3D) discrete Lorenz attractor. Encryption is achieved using non-autonomous modulation where the data is injected into the dynamics of the master chaotic generator. The second generator is used to permute the dynamics of the master generator using the same approach. Since the data stream can be regarded as a random source, the resulting permutations of the generator dynamics greatly increase the security of the transmitted signal. In addition, a technique is proposed to mitigate the error propagation due to the finite precision arithmetic of digital hardware. In particular, truncation and rounding errors are eliminated by employing an integer representation of the data which can easily be implemented. The simple hardware architecture of the algorithm makes it suitable for secure real-time applications.
Abstract: The Simulation based VLSI Implementation of
FELICS (Fast Efficient Lossless Image Compression System)
Algorithm is proposed to provide the lossless image compression and
is implemented in simulation oriented VLSI (Very Large Scale
Integrated). To analysis the performance of Lossless image
compression and to reduce the image without losing image quality
and then implemented in VLSI based FELICS algorithm. In FELICS
algorithm, which consists of simplified adjusted binary code for
Image compression and these compression image is converted in
pixel and then implemented in VLSI domain. This parameter is used
to achieve high processing speed and minimize the area and power.
The simplified adjusted binary code reduces the number of arithmetic
operation and achieved high processing speed. The color difference
preprocessing is also proposed to improve coding efficiency with
simple arithmetic operation. Although VLSI based FELICS
Algorithm provides effective solution for hardware architecture
design for regular pipelining data flow parallelism with four stages.
With two level parallelisms, consecutive pixels can be classified into
even and odd samples and the individual hardware engine is
dedicated for each one. This method can be further enhanced by
multilevel parallelisms.
Abstract: This paper presents the hardware design of a unified
architecture to compute the 4x4, 8x8 and 16x16 efficient twodimensional
(2-D) transform for the HEVC standard. This
architecture is based on fast integer transform algorithms. It is
designed only with adders and shifts in order to reduce the hardware
cost significantly. The goal is to ensure the maximum circuit reuse
during the computing while saving 40% for the number of operations.
The architecture is developed using FIFOs to compute the second
dimension. The proposed hardware was implemented in VHDL. The
VHDL RTL code works at 240 MHZ in an Altera Stratix III FPGA.
The number of cycles in this architecture varies from 33 in 4-point-
2D-DCT to 172 when the 16-point-2D-DCT is computed. Results
show frequency improvements reaching 96% when compared to an
architecture described as the direct transcription of the algorithm.
Abstract: Recently, much research has been conducted for
security for wireless sensor networks and ubiquitous computing.
Security issues such as authentication and data integrity are major
requirements to construct sensor network systems. Advanced
Encryption Standard (AES) is considered as one of candidate
algorithms for data encryption in wireless sensor networks. In this
paper, we will present the hardware architecture to implement low
power AES crypto module. Our low power AES crypto module has
optimized architecture of data encryption unit and key schedule unit
which could be applicable to wireless sensor networks. We also details
low power design methods used to design our low power AES crypto
module.
Abstract: In this paper, RSA encryption algorithm and its hardware
implementation in Xilinx-s Virtex Field Programmable Gate
Arrays (FPGA) is analyzed. The issues of scalability, flexible performance,
and silicon efficiency for the hardware acceleration of
public key crypto systems are being explored in the present work.
Using techniques based on the interleaved math for exponentiation,
the proposed RSA calculation architecture is compared to existing
FPGA-based solutions for speed, FPGA utilization, and scalability.
The paper covers the RSA encryption algorithm, interleaved multiplication,
Miller Rabin algorithm for primality test, extended Euclidean
math, basic FPGA technology, and the implementation details of
the proposed RSA calculation architecture. Performance of several
alternative hardware architectures is discussed and compared. Finally,
conclusion is drawn, highlighting the advantages of a fully flexible
& parameterized design.
Abstract: In the current study we present a system that is
capable to deliver proxy based differentiated service. It will help the
carrier service node to sell a prepaid service to clients and limit the
use to a particular mobile device or devices for a certain time. The
system includes software and hardware architecture for a mobile
device with moderate computational power, and a secure protocol for
communication between it and its carrier service node. On the
carrier service node a proxy runs on a centralized server to be
capable of implementing cryptographic algorithms, while the mobile
device contains a simple embedded processor capable of executing
simple algorithms. One prerequisite is needed for the system to run
efficiently that is a presence of Global Trusted Verification Authority
(GTVA) which is equivalent to certifying authority in IP networks.
This system appears to be of great interest for many commercial
transactions, business to business electronic and mobile commerce,
and military applications.
Abstract: In Image processing the Image compression can improve
the performance of the digital systems by reducing the cost and
time in image storage and transmission without significant reduction
of the Image quality. This paper describes hardware architecture of
low complexity Discrete Cosine Transform (DCT) architecture for
image compression[6]. In this DCT architecture, common computations
are identified and shared to remove redundant computations
in DCT matrix operation. Vector processing is a method used for
implementation of DCT. This reduction in computational complexity
of 2D DCT reduces power consumption. The 2D DCT is performed
on 8x8 matrix using two 1-Dimensional Discrete cosine transform
blocks and a transposition memory [7]. Inverse discrete cosine
transform (IDCT) is performed to obtain the image matrix and
reconstruct the original image. The proposed image compression
algorithm is comprehended using MATLAB code. The VLSI design
of the architecture is implemented Using Verilog HDL. The proposed
hardware architecture for image compression employing DCT was
synthesized using RTL complier and it was mapped using 180nm
standard cells. . The Simulation is done using Modelsim. The
simulation results from MATLAB and Verilog HDL are compared.
Detailed analysis for power and area was done using RTL compiler
from CADENCE. Power consumption of DCT core is reduced to
1.027mW with minimum area[1].
Abstract: In MPEG and H.26x standards, to eliminate the
temporal redundancy we use motion estimation. Given that the
motion estimation stage is very complex in terms of computational
effort, a hardware implementation on a re-configurable circuit is
crucial for the requirements of different real time multimedia
applications. In this paper, we present hardware architecture for
motion estimation based on "Full Search Block Matching" (FSBM)
algorithm. This architecture presents minimum latency, maximum
throughput, full utilization of hardware resources such as embedded
memory blocks, and combining both pipelining and parallel
processing techniques. Our design is described in VHDL language,
verified by simulation and implemented in a Stratix II
EP2S130F1020C4 FPGA circuit. The experiment result show that the
optimum operating clock frequency of the proposed design is 89MHz
which achieves 160M pixels/sec.
Abstract: In this paper, a new reverse converter for the moduli set {2n, 2n–1, 2n–1–1} is presented. We improved a previously introduced conversion algorithm for deriving an efficient hardware design for reverse converter. Hardware architecture of the proposed converter is based on carry-save adders and regular binary adders, without the requirement for modular adders. The presented design is faster than the latest introduced reverse converter for moduli set {2n, 2n–1, 2n–1–1}. Also, it has better performance than the reverse converters for the recently introduced moduli set {2n+1–1, 2n, 2n–1}
Abstract: Falling has been one of the major concerns and threats
to the independence of the elderly in their daily lives. With the
worldwide significant growth of the aging population, it is essential
to have a promising solution of fall detection which is able to operate
at high accuracy in real-time and supports large scale implementation
using multiple cameras. Field Programmable Gate Array (FPGA) is a
highly promising tool to be used as a hardware accelerator in many
emerging embedded vision based system. Thus, it is the main
objective of this paper to present an FPGA-based solution of visual
based fall detection to meet stringent real-time requirements with
high accuracy. The hardware architecture of visual based fall
detection which utilizes the pixel locality to reduce memory accesses
is proposed. By exploiting the parallel and pipeline architecture of
FPGA, our hardware implementation of visual based fall detection
using FGPA is able to achieve a performance of 60fps for a series of
video analytical functions at VGA resolutions (640x480). The results
of this work show that FPGA has great potentials and impacts in
enabling large scale vision system in the future healthcare industry
due to its flexibility and scalability.