A High Level Implementation of a High Performance Data Transfer Interface for NoC

The distribution of a single global clock across a chip has become the major design bottleneck for high performance VLSI systems owing to the power dissipation, process variability and multicycle cross-chip signaling. A Network-on-Chip (NoC) architecture partitioned into several synchronous blocks has become a promising approach for attaining fine-grain power management at the system level. In a NoC architecture the communication between the blocks is handled asynchronously. To interface these blocks on a chip operating at different frequencies, an asynchronous FIFO interface is inevitable. However, these asynchronous FIFOs are not required if adjacent blocks belong to the same clock domain. In this paper, we have designed and analyzed a 16-bit asynchronous micropipelined FIFO of depth four, with the awareness of place and route on an FPGA device. We have used a commercially available Spartan 3 device and designed a high speed implementation of the asynchronous 4-phase micropipeline. The asynchronous FIFO implemented on the FPGA device shows 76 Mb/s throughput and a handshake cycle of 109 ns for write and 101.3 ns for read at the simulation under the worst case operating conditions (voltage = 0.95V) on a working chip at the room temperature.

Efficient Hardware Architecture of the Direct 2- D Transform for the HEVC Standard

This paper presents the hardware design of a unified architecture to compute the 4x4, 8x8 and 16x16 efficient twodimensional (2-D) transform for the HEVC standard. This architecture is based on fast integer transform algorithms. It is designed only with adders and shifts in order to reduce the hardware cost significantly. The goal is to ensure the maximum circuit reuse during the computing while saving 40% for the number of operations. The architecture is developed using FIFOs to compute the second dimension. The proposed hardware was implemented in VHDL. The VHDL RTL code works at 240 MHZ in an Altera Stratix III FPGA. The number of cycles in this architecture varies from 33 in 4-point- 2D-DCT to 172 when the 16-point-2D-DCT is computed. Results show frequency improvements reaching 96% when compared to an architecture described as the direct transcription of the algorithm.

Spacecraft Neural Network Control System Design using FPGA

Designing and implementing intelligent systems has become a crucial factor for the innovation and development of better products of space technologies. A neural network is a parallel system, capable of resolving paradigms that linear computing cannot. Field programmable gate array (FPGA) is a digital device that owns reprogrammable properties and robust flexibility. For the neural network based instrument prototype in real time application, conventional specific VLSI neural chip design suffers the limitation in time and cost. With low precision artificial neural network design, FPGAs have higher speed and smaller size for real time application than the VLSI and DSP chips. So, many researchers have made great efforts on the realization of neural network (NN) using FPGA technique. In this paper, an introduction of ANN and FPGA technique are briefly shown. Also, Hardware Description Language (VHDL) code has been proposed to implement ANNs as well as to present simulation results with floating point arithmetic. Synthesis results for ANN controller are developed using Precision RTL. Proposed VHDL implementation creates a flexible, fast method and high degree of parallelism for implementing ANN. The implementation of multi-layer NN using lookup table LUT reduces the resource utilization for implementation and time for execution.

Hardware Prototyping of an Efficient Encryption Engine

An approach to develop the FPGA of a flexible key RSA encryption engine that can be used as a standard device in the secured communication system is presented. The VHDL modeling of this RSA encryption engine has the unique characteristics of supporting multiple key sizes, thus can easily be fit into the systems that require different levels of security. A simple nested loop addition and subtraction have been used in order to implement the RSA operation. This has made the processing time faster and used comparatively smaller amount of space in the FPGA. The hardware design is targeted on Altera STRATIX II device and determined that the flexible key RSA encryption engine can be best suited in the device named EP2S30F484C3. The RSA encryption implementation has made use of 13,779 units of logic elements and achieved a clock frequency of 17.77MHz. It has been verified that this RSA encryption engine can perform 32-bit, 256-bit and 1024-bit encryption operation in less than 41.585us, 531.515us and 790.61us respectively.

Efficient Hardware Realization of Truncated Multipliers using FPGA

Truncated multiplier is a good candidate for digital signal processing (DSP) applications including finite impulse response (FIR) and discrete cosine transform (DCT). Through truncated multiplier a significant reduction in Field Programmable Gate Array (FPGA) resources can be achieved. This paper presents for the first time a comparison of resource utilization of Spartan-3AN and Virtex-5 implementation of standard and truncated multipliers using Very High Speed Integrated Circuit Hardware Description Language (VHDL). The Virtex-5 FPGA shows significant improvement as compared to Spartan-3AN FPGA device. The Virtex-5 FPGA device shows better performance with a percentage ratio of number of occupied slices for standard to truncated multipliers is increased from 40% to 73.86% as compared to Spartan- 3AN is decreased from 68.75% to 58.78%. Results show that the anomaly in Spartan-3AN FPGA device average connection and maximum pin delay have been efficiently reduced in Virtex-5 FPGA device.

Asynchronous Microcontroller Simulation Model in VHDL

This article describes design of the 8-bit asynchronous microcontroller simulation model in VHDL. The model is created in ISE Foundation design tool and simulated in Modelsim tool. This model is a simple application example of asynchronous systems designed in synchronous design tools. The design process of creating asynchronous system with 4-phase bundled-data protocol and with matching delays is described in the article. The model is described in gate-level abstraction. The simulation waveform of the functional construction is the result of this article. Described construction covers only the simulation model. The next step would be creating synthesizable model to FPGA.