Abstract: Motion estimation occupies the heaviest computation in HEVC (high efficiency video coding). Many fast algorithms such as TZS (test zone search) have been proposed to reduce the computation. Still the huge computation of the motion estimation is a critical issue in the implementation of HEVC video codec. In this paper, motion estimator architecture with optimized number of PEs (processing element) is presented by exploiting early termination. It also reduces hardware size by exploiting parallel processing. The presented motion estimator architecture has 8 PEs, and it can efficiently perform TZS with very high utilization of PEs.
Abstract: This paper proposes and implements an core transform architecture, which is one of the major processes in HEVC video compression standard. The proposed core transform architecture is implemented with only adders and shifters instead of area-consuming multipliers. Shifters in the proposed core transform architecture are implemented in wires and multiplexers, which significantly reduces chip area. Also, it can process from 4×4 to 16×16 blocks with common hardware by reusing processing elements. Designed core transform architecture in 0.13um technology can process a 16×16 block with 2-D transform in 130 cycles, and its gate count is 101,015 gates.
Abstract: This paper presents the hardware design of a unified
architecture to compute the 4x4, 8x8 and 16x16 efficient twodimensional
(2-D) transform for the HEVC standard. This
architecture is based on fast integer transform algorithms. It is
designed only with adders and shifts in order to reduce the hardware
cost significantly. The goal is to ensure the maximum circuit reuse
during the computing while saving 40% for the number of operations.
The architecture is developed using FIFOs to compute the second
dimension. The proposed hardware was implemented in VHDL. The
VHDL RTL code works at 240 MHZ in an Altera Stratix III FPGA.
The number of cycles in this architecture varies from 33 in 4-point-
2D-DCT to 172 when the 16-point-2D-DCT is computed. Results
show frequency improvements reaching 96% when compared to an
architecture described as the direct transcription of the algorithm.