Abstract: Speedups from mapping four real-life DSP
applications on an embedded system-on-chip that couples coarsegrained
reconfigurable logic with an instruction-set processor are
presented. The reconfigurable logic is realized by a 2-Dimensional
Array of Processing Elements. A design flow for improving
application-s performance is proposed. Critical software parts, called
kernels, are accelerated on the Coarse-Grained Reconfigurable
Array. The kernels are detected by profiling the source code. For
mapping the detected kernels on the reconfigurable logic a prioritybased
mapping algorithm has been developed. Two 4x4 array
architectures, which differ in their interconnection structure among
the Processing Elements, are considered. The experiments for eight
different instances of a generic system show that important overall
application speedups have been reported for the four applications.
The performance improvements range from 1.86 to 3.67, with an
average value of 2.53, compared with an all-software execution.
These speedups are quite close to the maximum theoretical speedups
imposed by Amdahl-s law.