Performance Improvements of DSP Applications on a Generic Reconfigurable Platform

Speedups from mapping four real-life DSP applications on an embedded system-on-chip that couples coarsegrained reconfigurable logic with an instruction-set processor are presented. The reconfigurable logic is realized by a 2-Dimensional Array of Processing Elements. A design flow for improving application-s performance is proposed. Critical software parts, called kernels, are accelerated on the Coarse-Grained Reconfigurable Array. The kernels are detected by profiling the source code. For mapping the detected kernels on the reconfigurable logic a prioritybased mapping algorithm has been developed. Two 4x4 array architectures, which differ in their interconnection structure among the Processing Elements, are considered. The experiments for eight different instances of a generic system show that important overall application speedups have been reported for the four applications. The performance improvements range from 1.86 to 3.67, with an average value of 2.53, compared with an all-software execution. These speedups are quite close to the maximum theoretical speedups imposed by Amdahl-s law.

An On-chip LDO Voltage Regulator with Improved Current Buffer Compensation

A fully on-chip low drop-out (LDO) voltage regulator with 100pF output load capacitor is presented. A novel frequency compensation scheme using current buffer is adopted to realize single dominant pole within the unit gain frequency of the regulation loop, the phase margin (PM) is at least 50 degree under the full range of the load current, and the power supply rejection (PSR) character is improved compared with conventional Miller compensation. Besides, the differentiator provides a high speed path during the load current transient. Implemented in 0.18μm CMOS technology, the LDO voltage regulator provides 100mA load current with a stable 1.8V output voltage consuming 80μA quiescent current.

Traceable Watermarking System using SoC for Digital Cinema Delivery

As the development of digital technology is increasing, Digital cinema is getting more spread. However, content copy and attack against the digital cinema becomes a serious problem. To solve the above security problem, we propose “Additional Watermarking" for digital cinema delivery system. With this proposed “Additional watermarking" method, we protect content copyrights at encoder and user side information at decoder. It realizes the traceability of the watermark embedded at encoder. The watermark is embedded into the random-selected frames using Hash function. Using it, the embedding position is distributed by Hash Function so that third parties do not break off the watermarking algorithm. Finally, our experimental results show that proposed method is much better than the convenient watermarking techniques in terms of robustness, image quality and its simple but unbreakable algorithm.

Local Linear Model Tree (LOLIMOT) Reconfigurable Parallel Hardware

Local Linear Neuro-Fuzzy Models (LLNFM) like other neuro- fuzzy systems are adaptive networks and provide robust learning capabilities and are widely utilized in various applications such as pattern recognition, system identification, image processing and prediction. Local linear model tree (LOLIMOT) is a type of Takagi-Sugeno-Kang neuro fuzzy algorithm which has proven its efficiency compared with other neuro fuzzy networks in learning the nonlinear systems and pattern recognition. In this paper, a dedicated reconfigurable and parallel processing hardware for LOLIMOT algorithm and its applications are presented. This hardware realizes on-chip learning which gives it the capability to work as a standalone device in a system. The synthesis results on FPGA platforms show its potential to improve the speed at least 250 of times faster than software implemented algorithms.

Optimum Signal-to-noise Ratio Performance of Electron Multiplying Charge Coupled Devices

Electron multiplying charge coupled devices (EMCCDs) have revolutionized the world of low light imaging by introducing on-chip multiplication gain based on the impact ionization effect in the silicon. They combine the sub-electron readout noise with high frame rates. Signal-to-noise Ratio (SNR) is an important performance parameter for low-light-level imaging systems. This work investigates the SNR performance of an EMCCD operated in Non-inverted Mode (NIMO) and Inverted Mode (IMO). The theory of noise characteristics and operation modes is presented. The results show that the SNR of is determined by dark current and clock induced charge at high gain level. The optimum SNR performance is provided by an EMCCD operated in NIMO in short exposure and strong cooling applications. In contrast, an IMO EMCCD is preferable.

Speedup of Data Vortex Network Architecture

In this paper, 3X3 routing nodes are proposed to provide speedup and parallel processing capability in Data Vortex network architectures. The new design not only significantly improves network throughput and latency, but also eliminates the need for distributive traffic control mechanism originally embedded among nodes and the need for nodal buffering. The cost effectiveness is studied by a comparison study with the previously proposed 2- input buffered networks, and considerable performance enhancement can be achieved with similar or lower cost of hardware. Unlike previous implementation, the network leaves small probability of contention, therefore, the packet drop rate must be kept low for such implementation to be feasible and attractive, and it can be achieved with proper choice of operation conditions.

A Novel Genetic Algorithm Designed for Hardware Implementation

A new genetic algorithm, termed the 'optimum individual monogenetic genetic algorithm' (OIMGA), is presented whose properties have been deliberately designed to be well suited to hardware implementation. Specific design criteria were to ensure fast access to the individuals in the population, to keep the required silicon area for hardware implementation to a minimum and to incorporate flexibility in the structure for the targeting of a range of applications. The first two criteria are met by retaining only the current optimum individual, thereby guaranteeing a small memory requirement that can easily be stored in fast on-chip memory. Also, OIMGA can be easily reconfigured to allow the investigation of problems that normally warrant either large GA populations or individuals many genes in length. Local convergence is achieved in OIMGA by retaining elite individuals, while population diversity is ensured by continually searching for the best individuals in fresh regions of the search space. The results given in this paper demonstrate that both the performance of OIMGA and its convergence time are superior to those of a range of existing hardware GA implementations.

Low Power Bus Binding Based on Dynamic Bit Reordering

In this paper, the problem of reducing switching activity in on-chip buses at the stage of high-level synthesis is considered, and a high-level low power bus binding based on dynamic bit reordering is proposed. Whereas conventional methods use a fixed bit ordering between variables within a bus, the proposed method switches a bit ordering dynamically to obtain a switching activity reduction. As a result, the proposed method finds a binding solution with a smaller value of total switching activity (TSA). Experimental result shows that the proposed method obtains a binding solution having 12.0-34.9% smaller TSA compared with the conventional methods.

Low Jitter ADPLL based Clock Generator for High Speed SoC Applications

An efficient architecture for low jitter All Digital Phase Locked Loop (ADPLL) suitable for high speed SoC applications is presented in this paper. The ADPLL is designed using standard cells and described by Hardware Description Language (HDL). The ADPLL implemented in a 90 nm CMOS process can operate from 10 to 200 MHz and achieve worst case frequency acquisition in 14 reference clock cycles. The simulation result shows that PLL has cycle to cycle jitter of 164 ps and period jitter of 100 ps at 100MHz. Since the digitally controlled oscillator (DCO) can achieve both high resolution and wide frequency range, it can meet the demands of system-level integration. The proposed ADPLL can easily be ported to different processes in a short time. Thus, it can reduce the design time and design complexity of the ADPLL, making it very suitable for System-on-Chip (SoC) applications.

Formation of Round Channel for Microfluidic Applications

PDMS (Polydimethylsiloxane) polymer is a suitable material for biological and MEMS (Microelectromechanical systems) designers, because of its biocompatibility, transparency and high resistance under plasma treatment. PDMS round channel is always been of great interest due to its ability to confine the liquid with membrane type micro valves. In this paper we are presenting a very simple way to form round shapemicrofluidic channel, which is based on reflow of positive photoresist AZ® 40 XT. With this method, it is possible to obtain channel of different height simply by varying the spin coating parameters of photoresist.

Closed form Delay Model for on-Chip VLSIRLCG Interconnects for Ramp Input for Different Damping Conditions

Fast delay estimation methods, as opposed to simulation techniques, are needed for incremental performance driven layout synthesis. On-chip inductive effects are becoming predominant in deep submicron interconnects due to increasing clock speed and circuit complexity. Inductance causes noise in signal waveforms, which can adversely affect the performance of the circuit and signal integrity. Several approaches have been put forward which consider the inductance for on-chip interconnect modelling. But for even much higher frequency, of the order of few GHz, the shunt dielectric lossy component has become comparable to that of other electrical parameters for high speed VLSI design. In order to cope up with this effect, on-chip interconnect has to be modelled as distributed RLCG line. Elmore delay based methods, although efficient, cannot accurately estimate the delay for RLCG interconnect line. In this paper, an accurate analytical delay model has been derived, based on first and second moments of RLCG interconnection lines. The proposed model considers both the effect of inductance and conductance matrices. We have performed the simulation in 0.18μm technology node and an error of as low as less as 5% has been achieved with the proposed model when compared to SPICE. The importance of the conductance matrices in interconnect modelling has also been discussed and it is shown that if G is neglected for interconnect line modelling, then it will result an delay error of as high as 6% when compared to SPICE.