Abstract: In this paper, the implementation of low-power,
high-throughput convolutional filters for the one-dimensional
Discrete Wavelet Transform (DWT) and its inverse is presented. The
analysis filters have already been used for the implementation of a
high performance DWT encoder [15] with minimum memory
requirements for the JPEG 2000 standard. This paper describes the
design techniques used to implement the convolutional filters that
the JPEG 2000 standard specifies for the forward and inverse DWT,
targeting low-power operation, high performance and reduced
memory accesses. Moreover, the filters can perform
progressive computations so as to minimize the buffering between
the decomposition and reconstruction phases. The experimental
results illustrate the filters' low-power, high-throughput characteristics
as well as their memory-efficient operation.
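
As a software illustration of what a convolution-based forward and inverse 1-D DWT computes, the following is a minimal sketch, not the hardware design described above. It assumes the 5/3 (LeGall) filter pair, one of the two filter banks in JPEG 2000; the abstract does not say which filter bank was implemented. It uses floating-point arithmetic without the integer rounding of the standard's reversible transform, and whole-sample symmetric extension at the signal borders. The function names `reflect`, `dwt53_analysis` and `dwt53_synthesis` are hypothetical.

```python
# Assumed filter taps (5/3 LeGall pair), indexed by offset from the output position.
H0 = {-2: -0.125, -1: 0.25, 0: 0.75, 1: 0.25, 2: -0.125}   # analysis low-pass
H1 = {-1: -0.5, 0: 1.0, 1: -0.5}                           # analysis high-pass
G0 = {-1: 0.5, 0: 1.0, 1: 0.5}                             # synthesis low-pass
G1 = {-2: -0.125, -1: -0.25, 0: 0.75, 1: -0.25, 2: -0.125} # synthesis high-pass


def reflect(i, n):
    """Map any index onto 0..n-1 by whole-sample symmetric extension."""
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def dwt53_analysis(x):
    """One decomposition level: low-pass at even, high-pass at odd positions."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    low = [sum(h * x[reflect(2 * k + t, n)] for t, h in H0.items())
           for k in range((n + 1) // 2)]
    high = [sum(h * x[reflect(2 * k + 1 + t, n)] for t, h in H1.items())
            for k in range(n // 2)]
    return low, high


def dwt53_synthesis(low, high):
    """Inverse of dwt53_analysis: interleave the subbands, then filter."""
    n = len(low) + len(high)
    if n < 2:
        raise ValueError("need at least two samples")
    u = [0.0] * n
    u[0::2] = low    # low-pass coefficients sit at even positions
    u[1::2] = high   # high-pass coefficients sit at odd positions
    x = []
    for m in range(n):
        acc = 0.0
        for t in range(-2, 3):
            j = reflect(m + t, n)
            taps = G0 if j % 2 == 0 else G1  # reflection preserves parity
            acc += taps.get(t, 0.0) * u[j]
        x.append(acc)
    return x
```

Calling `dwt53_synthesis(*dwt53_analysis(x))` should return the original samples of `x`, which is the perfect-reconstruction property that lets the decomposition and reconstruction phases be paired.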
Abstract: Speedups from mapping four real-life DSP
applications on an embedded system-on-chip that couples coarse-grained
reconfigurable logic with an instruction-set processor are
presented. The reconfigurable logic is realized by a 2-Dimensional
Array of Processing Elements. A design flow for improving
application performance is proposed. Critical software parts, called
kernels, are accelerated on the Coarse-Grained Reconfigurable
Array. The kernels are detected by profiling the source code. A
priority-based mapping algorithm has been developed for mapping the
detected kernels onto the reconfigurable logic. Two 4x4 array
architectures, which differ in their interconnection structure among
the Processing Elements, are considered. Experiments on eight
different instances of a generic system show significant overall
speedups for all four applications.
The performance improvements range from 1.86 to 3.67, with an
average value of 2.53, compared with an all-software execution.
These speedups are quite close to the maximum theoretical speedups
imposed by Amdahl's law.
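
The bound referred to here can be made concrete with a short sketch of Amdahl's law. If a fraction f of the all-software execution time is spent in kernels that are accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s), which approaches 1 / (1 - f) as s grows. The function names and the example numbers below are illustrative assumptions and are not taken from the experiments.

```python
def amdahl_speedup(kernel_fraction, kernel_speedup):
    """Overall speedup when a fraction of the run time is accelerated."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


def amdahl_limit(kernel_fraction):
    """Maximum theoretical speedup: kernels take zero time after acceleration."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - kernel_fraction)


# Hypothetical example: kernels account for 70% of the software run time
# and run 10 times faster on the reconfigurable array.
overall = amdahl_speedup(0.70, 10.0)
bound = amdahl_limit(0.70)
print(f"overall speedup {overall:.2f}, theoretical maximum {bound:.2f}")
```

The gap between the two values shrinks as the kernel speedup increases, which is why a measured overall speedup can come close to the theoretical maximum even though the non-kernel code still runs on the processor.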