Performance Improvements of DSP Applications on a Generic Reconfigurable Platform
Speedups from mapping four real-life DSP
applications on an embedded system-on-chip that couples coarse-grained
reconfigurable logic with an instruction-set processor are
presented. The reconfigurable logic is realized by a 2-dimensional
array of Processing Elements. A design flow for improving
application performance is proposed. Critical software parts, called
kernels, are accelerated on the Coarse-Grained Reconfigurable
Array. The kernels are detected by profiling the source code. A
priority-based mapping algorithm has been developed for mapping the
detected kernels on the reconfigurable logic. Two 4x4 array
architectures, which differ in the interconnection structure among
the Processing Elements, are considered. Experiments on eight
different instances of a generic system show significant overall
speedups for all four applications.
The performance improvements range from 1.86 to 3.67, with an
average value of 2.53, compared with an all-software execution.
These speedups are quite close to the maximum theoretical speedups
imposed by Amdahl's law.
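
The Amdahl's law bound mentioned above can be illustrated with a minimal sketch. The function names and the sample numbers (a kernel fraction of 0.7 and a kernel speedup of 10) are hypothetical and chosen only for illustration; they are not measurements from the paper. If a fraction f of the all-software execution time is spent in kernels and the array runs those kernels s times faster, the overall speedup is 1 / ((1 - f) + f / s), which approaches 1 / (1 - f) as s grows.

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only the kernel part of the run time is accelerated.

    kernel_fraction: share of all-software execution time spent in kernels (0..1).
    kernel_speedup:  factor by which the kernels run faster on the accelerator.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction: float) -> float:
    """Maximum theoretical speedup: the kernels take zero time."""
    if not 0.0 <= kernel_fraction < 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1)")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper's experiments.
    f, s = 0.7, 10.0
    print(f"overall speedup: {amdahl_speedup(f, s):.2f}")
    print(f"theoretical maximum: {amdahl_limit(f):.2f}")
```

The gap between the two printed values shows how close an implementation comes to the bound: once the kernels are accelerated substantially, the remaining software part dominates and further kernel speedup yields little overall gain.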
@article{"International Journal of Electrical, Electronic and Communication Sciences:52676",
  author = "Michalis D. Galanis and Gregory Dimitroulakos and Costas E. Goutis",
  title = "Performance Improvements of DSP Applications on a Generic Reconfigurable Platform",
  keywords = "Reconfigurable computing, Coarse-grained reconfigurable array, Embedded systems, DSP, Performance",
  volume = "1",
  number = "6",
  pages = "811-8",
}