Abstract: This paper presents a low-area, fully reconfigurable Fast Fourier Transform (FFT) hardware design for the 3GPP-LTE communication standard. It supports 32 different FFT sizes, up to 2048 points. A special processing element is developed to make reconfigurable computing possible, and a first-in first-out (FIFO) scheduling scheme is proposed for hardware-friendly arrangement of FIFO resources. In a synthesized chip realization in TSMC 40 nm CMOS technology, the circuit occupies a core area of only 0.2325 mm² and dissipates 233.5 mW at its maximum operating frequency of 250 MHz.
Abstract: In this paper, we introduce a flexible box erecting
machine (BEM) that swiftly and automatically transforms cardboard
into a three-dimensional box. Recently, the parcel service and
home-shopping industries have grown rapidly, and there is an
increasing need for various box types to ship various products.
However, workers cannot fold thousands of boxes manually in a day.
As such, automatic BEMs are garnering greater attention. This study
takes equipment operation into consideration as well as mechanical
improvements in order to design a BEM that is able to outperform its
conventional counterparts. We analyzed six dispatching rules – First In
First Out (FIFO), Shortest Processing Time (SPT), Earliest Due Date
(EDD), Setup Avoidance, EDD + SPT, and EDD + Setup Avoidance –
to determine which one was most suitable for BEM operation.
Consequently, SPT and Setup Avoidance were found to be the most
critical rules, followed by EDD + Setup Avoidance, EDD + SPT,
EDD, and FIFO. This hierarchy was valid for both our conventional
BEM and our new flexible BEM from the viewpoint of processing
time. We believe that this research can contribute to flexible BEM
management, which has the potential to increase productivity and
convenience.
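
As a rough illustration of how such dispatching rules order a job queue, the following Python sketch implements FIFO, SPT, EDD and a greedy form of Setup Avoidance. The `Job` fields, the sample jobs, the fixed `setup_time` and the greedy interpretation of Setup Avoidance are all assumptions made for illustration; the abstract does not specify how the study modelled them, and the combined rules (EDD + SPT, EDD + Setup Avoidance) are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int       # arrival order (hypothetical field)
    proc_time: float   # processing time
    due: float         # due date
    box_type: str      # changing box type incurs a setup


def fifo(jobs):
    """First In First Out: serve jobs in arrival order."""
    return sorted(jobs, key=lambda j: j.arrival)


def spt(jobs):
    """Shortest Processing Time first; ties broken by arrival."""
    return sorted(jobs, key=lambda j: (j.proc_time, j.arrival))


def edd(jobs):
    """Earliest Due Date first; ties broken by arrival."""
    return sorted(jobs, key=lambda j: (j.due, j.arrival))


def setup_avoidance(jobs, current_type=None):
    """Assumed greedy rule: prefer the earliest job that matches the
    current box type; otherwise fall back to the earliest arrival."""
    remaining = sorted(jobs, key=lambda j: j.arrival)
    order = []
    while remaining:
        same = [j for j in remaining if j.box_type == current_type]
        nxt = same[0] if same else remaining[0]
        remaining.remove(nxt)
        order.append(nxt)
        current_type = nxt.box_type
    return order


def total_time(order, setup_time=2.0):
    """Total processing time including a setup whenever the box type
    changes (the first job always needs a setup in this sketch)."""
    total, current_type = 0.0, None
    for j in order:
        if j.box_type != current_type:
            total += setup_time
            current_type = j.box_type
        total += j.proc_time
    return total


jobs = [
    Job("J1", 0, 5.0, 20.0, "A"),
    Job("J2", 1, 2.0, 8.0, "B"),
    Job("J3", 2, 3.0, 12.0, "A"),
    Job("J4", 3, 1.0, 30.0, "B"),
]

for rule in (fifo, spt, edd, setup_avoidance):
    order = rule(jobs)
    print(rule.__name__, [j.name for j in order], total_time(order))
```

On this toy data, FIFO alternates box types and pays a setup for every job, whereas the setup-avoiding order groups jobs of the same type. This only shows the mechanism; it says nothing about the magnitudes reported in the study.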
Abstract: The distribution of a single global clock across a chip
has become the major design bottleneck for high performance VLSI
systems owing to the power dissipation, process variability and multicycle
cross-chip signaling. A Network-on-Chip (NoC) architecture
partitioned into several synchronous blocks has become a promising
approach for attaining fine-grain power management at the system
level. In a NoC architecture the communication between the blocks is
handled asynchronously. To interface these blocks on a chip
operating at different frequencies, an asynchronous FIFO interface is
necessary. However, these asynchronous FIFOs are not required if
adjacent blocks belong to the same clock domain. In this paper, we
have designed and analyzed a 16-bit asynchronous micropipelined
FIFO of depth four, with the awareness of place and route on an
FPGA device. We have used a commercially available Spartan 3
device and designed a high speed implementation of the
asynchronous 4-phase micropipeline. The asynchronous FIFO
implemented on the FPGA device shows a throughput of 76 Mb/s and a
handshake cycle of 109 ns for write and 101.3 ns for read in
simulation under worst-case operating conditions (voltage =
0.95 V) on a working chip at room temperature.
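
The 4-phase (return-to-zero) handshake used by such a FIFO can be sketched behaviourally as follows. This is only a software model under stated assumptions: the class name `FourPhaseFifo`, the signal names (`w_req`, `w_ack`, `r_req`, `r_ack`) and the event trace are hypothetical, and the sketch ignores timing, metastability and the latch-level micropipeline structure of the FPGA design.

```python
from collections import deque


class FourPhaseFifo:
    """Behavioural model of a 16-bit FIFO with 4-phase handshakes."""

    def __init__(self, depth=4, width=16):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.slots = deque()
        self.trace = []  # recorded (signal, level) events

    def write(self, word):
        """One write handshake: req up, ack up, req down, ack down."""
        if len(self.slots) == self.depth:
            return False                   # full: no acknowledge is given
        self.trace.append(("w_req", 1))    # sender presents data, raises req
        self.slots.append(word & self.mask)
        self.trace.append(("w_ack", 1))    # FIFO latches data, raises ack
        self.trace.append(("w_req", 0))    # sender returns req to zero
        self.trace.append(("w_ack", 0))    # FIFO returns ack to zero
        return True

    def read(self):
        """One read handshake; returns None when the FIFO is empty."""
        if not self.slots:
            return None
        self.trace.append(("r_req", 1))
        word = self.slots.popleft()
        self.trace.append(("r_ack", 1))
        self.trace.append(("r_req", 0))
        self.trace.append(("r_ack", 0))
        return word


fifo_model = FourPhaseFifo()
for value in (0x0001, 0x0002, 0x0003, 0x0004):
    fifo_model.write(value)
print(fifo_model.write(0x0005))   # depth four is exhausted
print(hex(fifo_model.read()))
```

Each transfer costs four signal transitions before the next one can start, which is why the handshake cycle time bounds the throughput of this style of interface.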
Abstract: This paper presents the synthesis and simulation of a proposed enhanced buffer. The design combines the advantages of buffered and bufferless networks; to this end, two crossbar switches are used. The virtual channel (VC) concept of the previous design is eliminated by an efficient flow-control scheme that uses the storage already present in pipelined channels in place of explicit input VCBs. This can be addressed by providing enhanced buffers on the bufferless link and creating two virtual networks. With this approach, VCBs act as distributed FIFO buffers. Without VCBs or VCs, deadlock prevention is achieved by duplicating physical channels. An enhanced buffer provides handshaking through a ready-valid handshake signal and two-bit storage. Through this design the power is reduced to 15.65% and delay is reduced to 97.88% with respect to a virtual channel router.
Abstract: This paper presents the hardware design of a unified
architecture to efficiently compute the 4x4, 8x8 and 16x16 two-dimensional
(2-D) transforms for the HEVC standard. This
architecture is based on fast integer transform algorithms. It is
designed using only adders and shifts in order to reduce the hardware
cost significantly. The goal is to ensure maximum circuit reuse
during computation while saving 40% of the number of operations.
The architecture is developed using FIFOs to compute the second
dimension. The proposed hardware was implemented in VHDL. The
VHDL RTL code runs at 240 MHz on an Altera Stratix III FPGA.
The number of cycles in this architecture varies from 33 for the 4-point
2D-DCT to 172 for the 16-point 2D-DCT. Results
show frequency improvements reaching 96% when compared to an
architecture described as the direct transcription of the algorithm.
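
To illustrate what "only adders and shifts" can mean for an integer transform, the sketch below computes a one-dimensional 4-point transform using the HEVC core transform coefficients (64, 83, 36) with a butterfly structure and shift-add constant multiplication. The particular decomposition (for example 83 = 64 + 16 + 2 + 1) and the function names are assumptions chosen for illustration, not the architecture of the paper; the scaling and rounding stages of the standard are omitted.

```python
def mul64(v):
    return v << 6                              # 64 * v


def mul83(v):
    return (v << 6) + (v << 4) + (v << 1) + v  # (64 + 16 + 2 + 1) * v


def mul36(v):
    return (v << 5) + (v << 2)                 # (32 + 4) * v


def transform4_shift_add(x):
    """4-point forward transform using only additions and shifts."""
    x0, x1, x2, x3 = x
    e0, e1 = x0 + x3, x1 + x2                  # even part of the butterfly
    o0, o1 = x0 - x3, x1 - x2                  # odd part of the butterfly
    return [
        mul64(e0 + e1),
        mul83(o0) + mul36(o1),
        mul64(e0 - e1),
        mul36(o0) - mul83(o1),
    ]


# Reference: direct multiplication by the 4x4 coefficient matrix.
M4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def transform4_direct(x):
    return [sum(c * v for c, v in zip(row, x)) for row in M4]


sample = [1, 2, 3, 4]
print(transform4_shift_add(sample))
print(transform4_direct(sample))
```

The direct form needs 16 multiplications per 4-point transform, while the butterfly form needs none; a 2-D transform applies the same 1-D stage to rows and then to columns, with an intermediate buffer between the two passes.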
Abstract: This paper presents an alternative queuing strategy for
handover, called the Pseudo Last Useful Instant (PLUI) scheme, for Low
Earth Orbit Mobile Satellite Systems (LEO MSSs). The PLUI scheme
uses the same approach as the Last Useful Instant (LUI) scheme
previously proposed in the literature, with a less complex implementation.
Simulation tests were carried out using Dynamic Channel Allocation
(DCA) in order to evaluate the performance of this scheme, and an
analytical approach is presented to allow the performance
evaluation of Fixed Channel Allocation (FCA) with different
handover queuing disciplines. The results show that performances
achieved by the proposed strategy are close to those achieved using
the LUI scheme.
Abstract: In this paper, an improved PDLZW implementation
with a new dictionary updating technique is proposed. A
unique dictionary is partitioned into hierarchical variable word-width
dictionaries. This allows us to search through dictionaries in parallel.
Moreover, the barrel shifter is adopted for loading a new input string
into the shift register in order to achieve a faster speed. However,
the original PDLZW uses a simple FIFO update strategy, which is
not efficient. Therefore, a new window-based updating technique
is implemented to better distinguish how often each
particular address in the window is referenced. A freezing policy
is applied to the most frequently referenced address, which is not
updated until all the other addresses in the window have the same
priority. This guarantees that the more frequently referenced addresses are
not updated until their time comes. This updating policy
improves the compression efficiency of the proposed
algorithm while keeping the architecture low in complexity and easy
to implement.
Abstract: Block replacement algorithms to increase hit ratio
have been extensively used in cache memory management. Among
basic replacement schemes, LRU and FIFO have been shown to be
effective replacement algorithms in terms of hit rates. In this paper,
we introduce a flexible stack-based circuit which can be employed in
hardware implementation of both LRU and FIFO policies. We
propose a simple and efficient architecture such that stack-based
replacement algorithms can be implemented without the drawbacks
of the traditional architectures. The stack is modular and hence, a set
of stack rows can be cascaded depending on the number of blocks in
each cache set. Our circuit can be implemented in conjunction with
the cache controller and static/dynamic memories to form a cache
system. Experimental results show that our proposed circuit
provides an average improvement of 26% in storage bits and that its
maximum operating frequency is increased by a factor of two.
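
The idea that one stack structure can serve both policies can be sketched in software as follows. This is a minimal behavioural model, not the proposed circuit: the class name `ReplacementStack`, its interface and the access sequence are assumptions made for illustration. The only difference between the two policies here is whether a hit moves the block to the top of the stack.

```python
class ReplacementStack:
    """Per-set replacement stack: index 0 is the top, the last entry is
    the next victim."""

    def __init__(self, ways, policy="LRU"):
        if policy not in ("LRU", "FIFO"):
            raise ValueError("policy must be 'LRU' or 'FIFO'")
        self.ways = ways
        self.policy = policy
        self.stack = []

    def access(self, tag):
        """Return (hit, evicted_tag) for one access to block `tag`."""
        if tag in self.stack:
            if self.policy == "LRU":
                # LRU: a hit promotes the block to the top of the stack.
                self.stack.remove(tag)
                self.stack.insert(0, tag)
            # FIFO: a hit leaves the insertion order untouched.
            return True, None
        evicted = self.stack.pop() if len(self.stack) == self.ways else None
        self.stack.insert(0, tag)  # new block enters at the top
        return False, evicted


for policy in ("LRU", "FIFO"):
    cache_set = ReplacementStack(ways=3, policy=policy)
    results = [cache_set.access(t) for t in "ABCADB"]
    hits = sum(1 for hit, _ in results if hit)
    print(policy, "hits:", hits, "final stack:", cache_set.stack)
```

Which policy gives more hits depends entirely on the access pattern, so the hit counts from this toy sequence should not be read as a comparison of the policies.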
Abstract: High speed networks provide realtime variable bit rate
service with diversified traffic flow characteristics and quality
requirements. The variable bit rate traffic has stringent delay and
packet loss requirements. The burstiness of the correlated traffic
makes dynamic buffer management highly desirable to satisfy the
Quality of Service (QoS) requirements. This paper presents an
algorithm for optimizing an adaptive buffer allocation scheme for
traffic, based on the loss of consecutive packets in the data stream and the buffer
occupancy level. The buffer is designed to allow the input traffic to be
partitioned into different priority classes, and it controls the threshold
dynamically based on the input traffic behavior. This algorithm
allows an input packet to enter the buffer if the occupancy level is less
than the threshold value for that packet's priority. The threshold is
dynamically varied at runtime based on packet loss behavior. The
simulation is run for two priority classes of the input traffic –
realtime and non-realtime classes. The simulation results show that
Adaptive Partial Buffer Sharing (ADPBS) has better performance
than Static Partial Buffer Sharing (SPBS) and First In First Out
(FIFO) queue under the same traffic conditions.
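
The admission rule described above (accept a packet only if the buffer occupancy is below the threshold of its priority class) can be sketched as follows. The class name `PartialBufferSharing`, the class labels `"rt"` and `"nrt"`, and in particular the `adapt` rule (raise a class threshold by one after a given number of consecutive losses) are assumptions for illustration only; the abstract does not state the actual adaptation rule of ADPBS.

```python
from collections import deque


class PartialBufferSharing:
    """Single FIFO buffer shared by priority classes via thresholds."""

    def __init__(self, capacity, thresholds):
        self.capacity = capacity
        self.thresholds = dict(thresholds)   # class -> occupancy threshold
        self.queue = deque()
        self.consecutive_losses = {p: 0 for p in thresholds}

    def offer(self, packet, priority):
        """Admit the packet if occupancy is below its class threshold."""
        if len(self.queue) < self.thresholds[priority]:
            self.queue.append((priority, packet))
            self.consecutive_losses[priority] = 0
            return True
        self.consecutive_losses[priority] += 1
        return False

    def serve(self):
        """Remove the packet at the head of the queue (FIFO service)."""
        return self.queue.popleft() if self.queue else None

    def adapt(self, priority, max_consecutive=2, step=1):
        """Hypothetical adaptation: after `max_consecutive` consecutive
        losses, raise the class threshold by `step`, up to capacity."""
        if self.consecutive_losses[priority] >= max_consecutive:
            self.thresholds[priority] = min(
                self.capacity, self.thresholds[priority] + step
            )
            self.consecutive_losses[priority] = 0


buf = PartialBufferSharing(capacity=4, thresholds={"rt": 4, "nrt": 2})
arrivals = [("p1", "nrt"), ("p2", "nrt"), ("p3", "nrt"), ("p4", "rt"), ("p5", "nrt")]
for packet, cls in arrivals:
    print(packet, cls, "admitted" if buf.offer(packet, cls) else "dropped")
buf.adapt("nrt")
print(buf.thresholds)
```

With fixed thresholds this model behaves like static partial buffer sharing; calling `adapt` between arrivals is what makes the thresholds respond to observed loss.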