A Low-Area Fully-Reconfigurable Hardware Design of Fast Fourier Transform System for 3GPP-LTE Standard

This paper presents a low-area, fully reconfigurable Fast Fourier Transform (FFT) hardware design for the 3GPP-LTE communication standard. It supports 32 different FFT sizes, up to 2048 points. A special processing element is developed to make reconfigurable computing possible, and a first-in first-out (FIFO) scheduling scheme is proposed for hardware-friendly arrangement of FIFO resources. When synthesized in TSMC 40 nm CMOS technology, the circuit occupies a core area of only 0.2325 mm² and dissipates 233.5 mW at the maximum operating frequency of 250 MHz.
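
As a software illustration of what supporting several transform lengths with one routine can look like, the following is a minimal sketch of a recursive mixed-radix FFT. It is not the paper's hardware architecture. The abstract does not list the 32 supported sizes, so the radices (2, 3, 5), the function names `fft` and `dft`, and the example length are assumptions made for illustration only.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; used for prime lengths and as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive mixed-radix FFT for a list x of any length >= 1.

    Assumption: radices 2, 3 and 5 are tried first; any other prime
    length falls back to the direct DFT.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    radix = next((p for p in (2, 3, 5) if n % p == 0), n)
    if radix == n:
        return dft(x)
    m = n // radix
    # Split into `radix` interleaved sub-sequences and transform each one.
    subs = [fft(x[r::radix]) for r in range(radix)]
    # Recombine: X[k] = sum_r W_n^(r*k) * Sub_r[k mod m].
    return [
        sum(
            subs[r][k % m] * cmath.exp(-2j * cmath.pi * r * k / n)
            for r in range(radix)
        )
        for k in range(n)
    ]


# Example: a length that is not a power of two (hypothetical choice).
spectrum = fft([float(i % 4) for i in range(60)])
```

The same function handles power-of-two lengths such as 128 or 2048 and composite lengths such as 60, which is the software analogue of reconfiguring one datapath for different sizes.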

Performance Comparison between Conventional and Flexible Box Erecting Machines Using Dispatching Rules

In this paper, we introduce a flexible box erecting machine (BEM) that swiftly and automatically transforms cardboard into a three-dimensional box. The parcel service and home-shopping industries have grown rapidly in recent years, and there is an increasing need for various box types to ship various products. However, workers cannot fold thousands of boxes manually in a day. As a result, automatic BEMs are garnering greater attention. This study considers equipment operation as well as mechanical improvements in order to design a BEM that outperforms its conventional counterparts. We analyzed six dispatching rules – First In First Out (FIFO), Shortest Processing Time (SPT), Earliest Due Date (EDD), Setup Avoidance, EDD + SPT, and EDD + Setup Avoidance – to determine which one was most suitable for BEM operation. SPT and Setup Avoidance were found to be the most critical rules, followed by EDD + Setup Avoidance, EDD + SPT, EDD, and FIFO. This ranking held for both our conventional BEM and our new flexible BEM from the viewpoint of processing time. We believe that this research can contribute to flexible BEM management, which has the potential to increase productivity and convenience.
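
The four basic dispatching rules named above can be sketched as simple selection functions over a job queue. This is a minimal illustration, not the study's simulation model: the `Job` fields, the fixed setup time charged on every box-type change, the example jobs, and the assumption that all jobs are waiting at time zero are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int      # arrival order (hypothetical field)
    proc_time: float  # time to erect the box
    due_date: float
    box_type: str     # a change of type triggers a setup


def fifo(queue, last_type):
    return min(queue, key=lambda j: j.arrival)


def spt(queue, last_type):
    return min(queue, key=lambda j: j.proc_time)


def edd(queue, last_type):
    return min(queue, key=lambda j: j.due_date)


def setup_avoidance(queue, last_type):
    # Prefer jobs that need no changeover; break ties by arrival order.
    same = [j for j in queue if j.box_type == last_type]
    return fifo(same or queue, last_type)


def schedule(jobs, rule, setup_time=2.0):
    """Return (processing order, total time) for one machine."""
    queue, last_type, clock, order = list(jobs), None, 0.0, []
    while queue:
        job = rule(queue, last_type)
        queue.remove(job)
        if last_type is not None and job.box_type != last_type:
            clock += setup_time  # assumed fixed changeover cost
        clock += job.proc_time
        last_type = job.box_type
        order.append(job.name)
    return order, clock


jobs = [
    Job("A", 0, 4.0, 20.0, "S"),
    Job("B", 1, 2.0, 10.0, "L"),
    Job("C", 2, 3.0, 15.0, "S"),
    Job("D", 3, 1.0, 30.0, "L"),
]
for rule in (fifo, spt, edd, setup_avoidance):
    print(rule.__name__, schedule(jobs, rule))
```

With these made-up jobs, rules that group boxes of the same type incur fewer changeovers and finish sooner than FIFO, which only illustrates the mechanism and says nothing about the paper's measured results.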

A High Level Implementation of a High Performance Data Transfer Interface for NoC

The distribution of a single global clock across a chip has become the major design bottleneck for high performance VLSI systems owing to power dissipation, process variability and multicycle cross-chip signaling. A Network-on-Chip (NoC) architecture partitioned into several synchronous blocks has become a promising approach for attaining fine-grain power management at the system level. In a NoC architecture, the communication between the blocks is handled asynchronously. To interface blocks on a chip that operate at different frequencies, an asynchronous FIFO interface is unavoidable. However, these asynchronous FIFOs are not required if adjacent blocks belong to the same clock domain. In this paper, we have designed and analyzed a 16-bit asynchronous micropipelined FIFO of depth four, with awareness of place and route on an FPGA device. We have used a commercially available Spartan 3 device and designed a high speed implementation of the asynchronous 4-phase micropipeline. The asynchronous FIFO implemented on the FPGA device shows a throughput of 76 Mb/s and a handshake cycle of 109 ns for write and 101.3 ns for read in simulation under worst-case operating conditions (voltage = 0.95 V) on a working chip at room temperature.

Synthesis and Simulation of Enhanced Buffer Router vs. Virtual Channel Router in NoC on Cadence

This paper presents the synthesis and simulation of a proposed enhanced buffer. The design provides the advantages of both buffered and bufferless networks; for this purpose, two crossbar switches are used. The concept of the virtual channel (VC) is eliminated from the previous design by using an efficient flow-control scheme that uses the storage already present in pipelined channels in place of explicit input virtual channel buffers (VCBs). This is achieved by providing enhanced buffers on the bufferless link and creating two virtual networks. With this approach, VCBs act as distributed FIFO buffers. Without VCBs or VCs, deadlock prevention is achieved by duplicating physical channels. An enhanced buffer provides handshaking through a ready-valid handshake signal and two-bit storage. With this design, power is reduced to 15.65% and delay is reduced to 97.88% with respect to the virtual channel router.

Efficient Hardware Architecture of the Direct 2-D Transform for the HEVC Standard

This paper presents the hardware design of a unified architecture to compute the 4x4, 8x8 and 16x16 efficient two-dimensional (2-D) transform for the HEVC standard. This architecture is based on fast integer transform algorithms. It is designed using only adders and shifts in order to reduce the hardware cost significantly. The goal is to ensure maximum circuit reuse during computation while saving 40% of the operations. The architecture uses FIFOs to compute the second dimension. The proposed hardware was implemented in VHDL. The VHDL RTL code runs at 240 MHz on an Altera Stratix III FPGA. The number of cycles in this architecture varies from 33 for the 4-point 2D-DCT to 172 for the 16-point 2D-DCT. Results show frequency improvements reaching 96% when compared to an architecture described as the direct transcription of the algorithm.
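
To illustrate the idea of a transform built only from adders and shifts, the following is a minimal sketch of the 4-point case using the HEVC 4x4 core transform coefficients (64, 83, 36). It is not the paper's unified architecture: the even/odd butterfly, the shift-add decompositions, and the function names are illustrative assumptions, the intermediate scaling and rounding stages of HEVC are omitted, and a simple transpose stands in for the buffering between the two dimensions.

```python
# HEVC 4x4 core transform matrix, used here only as a reference.
M4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def mul83(x):
    return (x << 6) + (x << 4) + (x << 1) + x  # 64 + 16 + 2 + 1 = 83


def mul36(x):
    return (x << 5) + (x << 2)  # 32 + 4 = 36


def dct4_1d(x):
    """4-point transform with additions and shifts only."""
    e0, e1 = x[0] + x[3], x[1] + x[2]  # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]  # odd part
    return [
        (e0 + e1) << 6,
        mul83(o0) + mul36(o1),
        (e0 - e1) << 6,
        mul36(o0) - mul83(o1),
    ]


def dct4_2d(block):
    """Column pass, transpose, then row pass: M4 * block * M4^T (unscaled)."""
    cols = [dct4_1d(list(c)) for c in zip(*block)]  # columns of M4 * block
    rows = [list(r) for r in zip(*cols)]            # transpose back to rows
    return [dct4_1d(r) for r in rows]


def reference_2d(block):
    """Direct matrix product with multiplications, for comparison."""
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]
    m4_t = [list(r) for r in zip(*M4)]
    return matmul(matmul(M4, block), m4_t)
```

The butterfly needs 8 additions and 6 shift-add constant multiplications per 4-point pass, against 16 general multiplications for the direct matrix product.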

Pseudo Last Useful Instant Queuing Strategy for Handovers in Low Earth Orbit Mobile Satellite Networks

This paper presents an alternative handover queuing strategy, called the Pseudo Last Useful Instant (PLUI) scheme, for Low Earth Orbit Mobile Satellite Systems (LEO MSSs). The PLUI scheme uses the same approach as the Last Useful Instant (LUI) scheme previously proposed in the literature, with a less complex implementation. Simulation tests were carried out using Dynamic Channel Allocation (DCA) in order to evaluate the performance of this scheme, and an analytical approach is presented to allow the performance evaluation of Fixed Channel Allocation (FCA) with different handover queuing disciplines. The results show that the performance achieved by the proposed strategy is close to that achieved using the LUI scheme.

A Novel FIFO Design for Data Transfer in Mixed Timing Systems

With increasing integration densities, most system-on-chip designs are now partitioned into multiple clock domains. In this paper, an asynchronous FIFO (First-in First-out pipeline) design is employed as a data transfer interface between two independent clock domains. Since the clocks on either side of the FIFO run at different speeds, the task of ensuring correct data transmission through this FIFO is performed manually. First, an existing asynchronous FIFO design is discussed and simulated. Gate-level simulation results revealed a flaw in the existing design. To solve this problem, a novel modified asynchronous FIFO design is proposed. The results obtained from the proposed design are in perfect accordance with theoretical expectations. The proposed asynchronous FIFO design outperforms the existing design in terms of accuracy and speed. To evaluate the performance of the FIFO designs presented in this paper, the circuits were implemented in 0.24 µm TSMC CMOS technology and simulated at 2.5 V using HSpice (© Avant! Corporation). The layout design of the proposed FIFO is also presented.

An Improvement of PDLZW implementation with a Modified WSC Updating Technique on FPGA

In this paper, an improvement of the PDLZW implementation with a new dictionary updating technique is proposed. A unique dictionary is partitioned into hierarchical variable word-width dictionaries. This allows us to search through the dictionaries in parallel. Moreover, a barrel shifter is adopted for loading a new input string into the shift register in order to achieve a faster speed. However, the original PDLZW uses a simple FIFO update strategy, which is not efficient. Therefore, a new window-based updating technique is implemented to better distinguish how often each particular address in the window is referenced. A freezing policy is applied to the most frequently referenced address, which is not updated until all the other addresses in the window have the same priority. This guarantees that the more frequently referenced addresses are not updated until their time comes. This updating policy improves the compression efficiency of the proposed algorithm while still keeping the architecture low in complexity and easy to implement.

Hardware Implementation of Stack-Based Replacement Algorithms

Block replacement algorithms that increase the hit ratio have been used extensively in cache memory management. Among basic replacement schemes, LRU and FIFO have been shown to be effective replacement algorithms in terms of hit rates. In this paper, we introduce a flexible stack-based circuit which can be employed in the hardware implementation of both LRU and FIFO policies. We propose a simple and efficient architecture such that stack-based replacement algorithms can be implemented without the drawbacks of the traditional architectures. The stack is modular and hence a set of stack rows can be cascaded depending on the number of blocks in each cache set. Our circuit can be implemented in conjunction with the cache controller and static/dynamic memories to form a cache system. Experimental results show that our proposed circuit provides an average improvement of 26% in storage bits and that its maximum operating frequency is increased by a factor of two.
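
The way a single stack can serve both policies can be sketched in software as follows. This is a behavioral illustration only, not the proposed circuit: the class name `ReplacementStack`, the list-based stack with the newest entry at index 0, and the example access sequence are assumptions. The two policies differ only in whether a hit reorders the stack.

```python
class ReplacementStack:
    """One cache set modeled as a stack; the bottom entry is the victim."""

    def __init__(self, ways, policy):
        if policy not in ("LRU", "FIFO"):
            raise ValueError("policy must be 'LRU' or 'FIFO'")
        self.ways = ways
        self.policy = policy
        self.stack = []  # index 0 is the top of the stack

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            if self.policy == "LRU":
                # LRU promotes the block to the top; FIFO leaves the order alone.
                self.stack.remove(tag)
                self.stack.insert(0, tag)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the bottom entry
        self.stack.insert(0, tag)
        return False


def count_hits(policy, trace, ways=3):
    cache = ReplacementStack(ways, policy)
    return sum(cache.access(tag) for tag in trace)


trace = [1, 2, 3, 1, 4, 1]  # hypothetical block addresses
print(count_hits("LRU", trace), count_hits("FIFO", trace))
```

On this trace, the hit on block 1 moves it to the top under LRU, so it survives the later eviction; under FIFO it stays at the bottom and is evicted when block 4 arrives.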

Modeling and Analysis of Adaptive Buffer Sharing Scheme for Consecutive Packet Loss Reduction in Broadband Networks

High speed networks provide realtime variable bit rate service with diversified traffic flow characteristics and quality requirements. Variable bit rate traffic has stringent delay and packet loss requirements. The burstiness of the correlated traffic makes dynamic buffer management highly desirable to satisfy the Quality of Service (QoS) requirements. This paper presents an algorithm for optimizing an adaptive buffer allocation scheme for traffic, based on the loss of consecutive packets in the data stream and on the buffer occupancy level. The buffer is designed to allow the input traffic to be partitioned into different priority classes, and it controls the threshold dynamically based on the input traffic behavior. The algorithm allows an input packet to enter the buffer if the buffer occupancy level is less than the threshold value for the priority of that packet. The threshold is varied dynamically at runtime based on packet loss behavior. The simulation is run for two priority classes of input traffic – realtime and non-realtime. The simulation results show that Adaptive Partial Buffer Sharing (ADPBS) has better performance than Static Partial Buffer Sharing (SPBS) and a First In First Out (FIFO) queue under the same traffic conditions.
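
The threshold-based admission test described above can be sketched as follows. The admission condition (occupancy below the class threshold) follows the abstract; the adaptation rule does not. The abstract gives no update formula, so raising a class threshold by one after `loss_limit` consecutive losses is a hypothetical placeholder, as are the class names and parameter values.

```python
from collections import deque


class PartialBufferSharing:
    """Shared buffer with one admission threshold per priority class."""

    def __init__(self, capacity, thresholds, loss_limit=None):
        self.capacity = capacity
        self.thresholds = dict(thresholds)
        self.loss_limit = loss_limit  # None gives static thresholds
        self.queue = deque()
        self.consecutive_losses = {cls: 0 for cls in thresholds}

    def arrive(self, cls):
        """Admit a packet of class cls if occupancy is below its threshold."""
        if len(self.queue) < self.thresholds[cls]:
            self.queue.append(cls)
            self.consecutive_losses[cls] = 0
            return True
        self.consecutive_losses[cls] += 1
        # Hypothetical adaptation rule, not taken from the paper.
        if (self.loss_limit is not None
                and self.consecutive_losses[cls] >= self.loss_limit
                and self.thresholds[cls] < self.capacity):
            self.thresholds[cls] += 1
            self.consecutive_losses[cls] = 0
        return False

    def depart(self):
        """Serve the packet at the head of the queue, if any."""
        return self.queue.popleft() if self.queue else None


# Two classes: realtime ("rt") may fill the buffer, non-realtime ("nrt") half.
static = PartialBufferSharing(4, {"rt": 4, "nrt": 2})
adaptive = PartialBufferSharing(4, {"rt": 4, "nrt": 2}, loss_limit=2)
```

With `loss_limit=None` the thresholds never move, which corresponds to the static scheme; a plain FIFO queue is the special case in which every class threshold equals the capacity.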