Design Techniques and Implementation of Low Power High-Throughput Discrete Wavelet Transform Tilters for JPEG 2000 Standard

In this paper, the implementation of low power, high throughput convolutional filters for the one dimensional Discrete Wavelet Transform and its inverse are presented. The analysis filters have already been used for the implementation of a high performance DWT encoder [15] with minimum memory requirements for the JPEG 2000 standard. This paper presents the design techniques and the implementation of the convolutional filters included in the JPEG2000 standard for the forward and inverse DWT for achieving low-power operation, high performance and reduced memory accesses. Moreover, they have the ability of performing progressive computations so as to minimize the buffering between the decomposition and reconstruction phases. The experimental results illustrate the filters- low power high throughput characteristics as well as their memory efficient operation.

VLSI Design of 2-D Discrete Wavelet Transform for Area-Efficient and High-Speed Image Computing

This paper presents a VLSI design approach of a highspeed and real-time 2-D Discrete Wavelet Transform computing. The proposed architecture, based on new and fast convolution approach, reduces the hardware complexity in addition to reduce the critical path to the multiplier delay. Furthermore, an advanced twodimensional (2-D) discrete wavelet transform (DWT) implementation, with an efficient memory area, is designed to produce one output in every clock cycle. As a result, a very highspeed is attained. The system is verified, using JPEG2000 coefficients filters, on Xilinx Virtex-II Field Programmable Gate Array (FPGA) device without accessing any external memory. The resulting computing rate is up to 270 M samples/s and the (9,7) 2-D wavelet filter uses only 18 kb of memory (16 kb of first-in-first-out memory) with 256×256 image size. In this way, the developed design requests reduced memory and provide very high-speed processing as well as high PSNR quality.

Real-Time Digital Oscilloscope Implementation in 90nm CMOS Technology FPGA

This paper describes the design of a real-time audiorange digital oscilloscope and its implementation in 90nm CMOS FPGA platform. The design consists of sample and hold circuits, A/D conversion, audio and video processing, on-chip RAM, clock generation and control logic. The design of internal blocks and modules in 90nm devices in an FPGA is elaborated. Also the key features and their implementation algorithms are presented. Finally, the timing waveforms and simulation results are put forward.

64 bit Computer Architectures for Space Applications – A study

The more recent satellite projects/programs makes extensive usage of real – time embedded systems. 16 bit processors which meet the Mil-Std-1750 standard architecture have been used in on-board systems. Most of the Space Applications have been written in ADA. From a futuristic point of view, 32 bit/ 64 bit processors are needed in the area of spacecraft computing and therefore an effort is desirable in the study and survey of 64 bit architectures for space applications. This will also result in significant technology development in terms of VLSI and software tools for ADA (as the legacy code is in ADA). There are several basic requirements for a special processor for this purpose. They include Radiation Hardened (RadHard) devices, very low power dissipation, compatibility with existing operational systems, scalable architectures for higher computational needs, reliability, higher memory and I/O bandwidth, predictability, realtime operating system and manufacturability of such processors. Further on, these may include selection of FPGA devices, selection of EDA tool chains, design flow, partitioning of the design, pin count, performance evaluation, timing analysis etc. This project deals with a brief study of 32 and 64 bit processors readily available in the market and designing/ fabricating a 64 bit RISC processor named RISC MicroProcessor with added functionalities of an extended double precision floating point unit and a 32 bit signal processing unit acting as co-processors. In this paper, we emphasize the ease and importance of using Open Core (OpenSparc T1 Verilog RTL) and Open “Source" EDA tools such as Icarus to develop FPGA based prototypes quickly. Commercial tools such as Xilinx ISE for Synthesis are also used when appropriate.

Motion Area Estimated Motion Estimation with Triplet Search Patterns for H.264/AVC

In this paper a fast motion estimation method for H.264/AVC named Triplet Search Motion Estimation (TS-ME) is proposed. Similar to some of the traditional fast motion estimation methods and their improved proposals which restrict the search points only to some selected candidates to decrease the computation complexity, proposed algorithm separate the motion search process to several steps but with some new features. First, proposed algorithm try to search the real motion area using proposed triplet patterns instead of some selected search points to avoid dropping into the local minimum. Then, in the localized motion area a novel 3-step motion search algorithm is performed. Proposed search patterns are categorized into three rings on the basis of the distance from the search center. These three rings are adaptively selected by referencing the surrounding motion vectors to early terminate the motion search process. On the other hand, computation reduction for sub pixel motion search is also discussed considering the appearance probability of the sub pixel motion vector. From the simulation results, motion estimation speed improved by a factor of up to 38 when using proposed algorithm than that of the reference software of H.264/AVC with ignorable picture quality loss.

Accurate Crosstalk Analysis for RLC On-Chip VLSI Interconnect

This work proposes an accurate crosstalk noise estimation method in the presence of multiple RLC lines for the use in design automation tools. This method correctly models the loading effects of non switching aggressors and aggressor tree branches using resistive shielding effect and realistic exponential input waveforms. Noise peak and width expressions have been derived. The results obtained are at good agreement with SPICE results. Results show that average error for noise peak is 4.7% and for the width is 6.15% while allowing a very fast analysis.

LFSR Counter Implementation in CMOS VLSI

As chip manufacturing technology is suddenly on the threshold of major evaluation, which shrinks chip in size and performance, LFSR (Linear Feedback Shift Register) is implemented in layout level which develops the low power consumption chip, using recent CMOS, sub-micrometer layout tools. Thus LFSR counter can be a new trend setter in cryptography and is also beneficial as compared to GRAY & BINARY counter and variety of other applications. This paper compares 3 architectures in terms of the hardware implementation, CMOS layout and power consumption, using Microwind CMOS layout tool. Thus it provides solution to a low power architecture implementation of LFSR in CMOS VLSI.

Comparative Study of Evolutionary Model and Clustering Methods in Circuit Partitioning Pertaining to VLSI Design

Partitioning is a critical area of VLSI CAD. In order to build complex digital logic circuits its often essential to sub-divide multi -million transistor design into manageable Pieces. This paper looks at the various partitioning techniques aspects of VLSI CAD, targeted at various applications. We proposed an evolutionary time-series model and a statistical glitch prediction system using a neural network with selection of global feature by making use of clustering method model, for partitioning a circuit. For evolutionary time-series model, we made use of genetic, memetic & neuro-memetic techniques. Our work focused in use of clustering methods - K-means & EM methodology. A comparative study is provided for all techniques to solve the problem of circuit partitioning pertaining to VLSI design. The performance of all approaches is compared using benchmark data provided by MCNC standard cell placement benchmark net lists. Analysis of the investigational results proved that the Neuro-memetic model achieves greater performance then other model in recognizing sub-circuits with minimum amount of interconnections between them.

Fast and Efficient On-Chip Interconnection Modeling for High Speed VLSI Systems

Timing driven physical design, synthesis, and optimization tools need efficient closed-form delay models for estimating the delay associated with each net in an integrated circuit (IC) design. The total number of nets in a modern IC design has increased dramatically and exceeded millions. Therefore efficient modeling of interconnection is needed for high speed IC-s. This paper presents closed–form expressions for RC and RLC interconnection trees in current mode signaling, which can be implemented in VLSI design tool. These analytical model expressions can be used for accurate calculation of delay after the design clock tree has been laid out and the design is fully routed. Evaluation of these analytical models is several orders of magnitude faster than simulation using SPICE.

Pulsed Multi-Layered Image Filtering: A VLSI Implementation

Image convolution similar to the receptive fields found in mammalian visual pathways has long been used in conventional image processing in the form of Gabor masks. However, no VLSI implementation of parallel, multi-layered pulsed processing has been brought forward which would emulate this property. We present a technical realization of such a pulsed image processing scheme. The discussed IC also serves as a general testbed for VLSI-based pulsed information processing, which is of interest especially with regard to the robustness of representing an analog signal in the phase or duration of a pulsed, quasi-digital signal, as well as the possibility of direct digital manipulation of such an analog signal. The network connectivity and processing properties are reconfigurable so as to allow adaptation to various processing tasks.

Adaptive Distributed Genetic Algorithms and Its VLSI Design

This paper presents a dynamic adaptation scheme for the frequency of inter-deme migration in distributed genetic algorithms (GA), and its VLSI hardware design. Distributed GA, or multi-deme-based GA, uses multiple populations which evolve concurrently. The purpose of dynamic adaptation is to improve convergence performance so as to obtain better solutions. Through simulation experiments, we proved that our scheme achieves better performance than fixed frequency migration schemes.

Closed form Delay Model for on-Chip VLSIRLCG Interconnects for Ramp Input for Different Damping Conditions

Fast delay estimation methods, as opposed to simulation techniques, are needed for incremental performance driven layout synthesis. On-chip inductive effects are becoming predominant in deep submicron interconnects due to increasing clock speed and circuit complexity. Inductance causes noise in signal waveforms, which can adversely affect the performance of the circuit and signal integrity. Several approaches have been put forward which consider the inductance for on-chip interconnect modelling. But for even much higher frequency, of the order of few GHz, the shunt dielectric lossy component has become comparable to that of other electrical parameters for high speed VLSI design. In order to cope up with this effect, on-chip interconnect has to be modelled as distributed RLCG line. Elmore delay based methods, although efficient, cannot accurately estimate the delay for RLCG interconnect line. In this paper, an accurate analytical delay model has been derived, based on first and second moments of RLCG interconnection lines. The proposed model considers both the effect of inductance and conductance matrices. We have performed the simulation in 0.18μm technology node and an error of as low as less as 5% has been achieved with the proposed model when compared to SPICE. The importance of the conductance matrices in interconnect modelling has also been discussed and it is shown that if G is neglected for interconnect line modelling, then it will result an delay error of as high as 6% when compared to SPICE.