Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

This paper presents the speedups obtained by mapping four real-life DSP applications onto an embedded system-on-chip that couples coarse-grained reconfigurable logic with an instruction-set processor. The reconfigurable logic is realized by a two-dimensional array of Processing Elements. A design flow for improving application performance is proposed. Critical software parts, called kernels, are detected by profiling the source code and accelerated on the Coarse-Grained Reconfigurable Array. A priority-based mapping algorithm has been developed for mapping the detected kernels onto the reconfigurable logic. Two 4x4 array architectures, which differ in the interconnection structure among their Processing Elements, are considered. Experiments on eight different instances of a generic system show significant overall speedups for all four applications: the speedup ranges from 1.86 to 3.67, with an average of 2.53, compared with an all-software execution. These speedups are quite close to the theoretical maximum imposed by Amdahl's law.
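The bound imposed by Amdahl's law can be illustrated with a minimal sketch. The function names and the example numbers below (a kernel fraction of 0.75 and a kernel speedup of 10) are hypothetical and are not measurements from this work; the sketch only shows how the overall speedup relates to the share of execution time spent in the accelerated kernels.

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when a fraction of the all-software execution time
    (the kernels) is accelerated by a given factor (Amdahl's law)."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Non-accelerated part keeps its time; kernel part shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction: float) -> float:
    """Theoretical maximum speedup: kernels assumed to take zero time."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    fraction, factor = 0.75, 10.0
    print(f"overall speedup: {amdahl_speedup(fraction, factor):.2f}")
    print(f"theoretical maximum: {amdahl_limit(fraction):.2f}")
```

Under these assumed values the achieved speedup stays below the limit set by the non-accelerated code, which is why a measured speedup close to that limit indicates that the kernels themselves are no longer the bottleneck.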



