A Pipelined FSBM Hardware Architecture for HTDV-H.26x
In the MPEG and H.26x standards, motion estimation is used to
eliminate temporal redundancy. Because the motion estimation stage is
computationally very demanding, a hardware implementation on a
reconfigurable circuit is crucial to meet the requirements of
real-time multimedia applications. In this paper, we present a
hardware architecture for motion estimation based on the
"Full Search Block Matching" (FSBM) algorithm. The architecture
offers minimum latency, maximum throughput, and full utilization of
hardware resources such as embedded memory blocks, and it combines
pipelining with parallel processing. Our design is described in VHDL,
verified by simulation, and implemented on a Stratix II
EP2S130F1020C4 FPGA. The experimental results show that the
optimum operating clock frequency of the proposed design is 89 MHz,
at which it processes 160 Mpixels/s.
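
For readers unfamiliar with FSBM, the following is a minimal software sketch of the search that such an architecture accelerates. It is not the authors' VHDL design. It assumes the sum of absolute differences (SAD, listed in the keywords) as the matching criterion, and the function names `sad` and `full_search`, the block size `n`, and the search range `p` are hypothetical choices made only for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=16, p=7):
    """Test every displacement in [-p, p] x [-p, p] (assumed search window)
    and return (best_dx, best_dy, best_sad).

    Candidates whose block would fall outside `ref` are skipped.
    Ties keep the first candidate found in scan order.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block lies partly outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Every candidate costs n*n absolute differences, and there are up to (2p+1)^2 candidates per block. All of these SAD computations are independent of one another, which is why the search lends itself to the pipelined and parallel processing described above.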
@article{"International Journal of Electrical, Electronic and Communication Sciences:54981",
  author = "H. Loukil and A. Ben Atitallah and F. Ghozzi and M. A. Ben Ayed and N. Masmoudi",
  title = "A Pipelined FSBM Hardware Architecture for HTDV-H.26x",
  abstract = {In the MPEG and H.26x standards, motion estimation is used to
    eliminate temporal redundancy. Because the motion estimation stage is
    computationally very demanding, a hardware implementation on a
    reconfigurable circuit is crucial to meet the requirements of
    real-time multimedia applications. In this paper, we present a
    hardware architecture for motion estimation based on the
    "Full Search Block Matching" (FSBM) algorithm. The architecture
    offers minimum latency, maximum throughput, and full utilization of
    hardware resources such as embedded memory blocks, and it combines
    pipelining with parallel processing. Our design is described in VHDL,
    verified by simulation, and implemented on a Stratix II
    EP2S130F1020C4 FPGA. The experimental results show that the
    optimum operating clock frequency of the proposed design is 89 MHz,
    at which it processes 160 Mpixels/s.},
  keywords = "SAD, FSBM, Hardware Implementation, FPGA.",
  volume = "2",
  number = "10",
  pages = "2235-8",
}