Abstract: Since 1992, year where Hugo de Garis has published
the first paper on Evolvable Hardware (EHW), a period of intense
creativity has followed. It has been actively researched, developed
and applied to various problems. Different approaches have been
proposed that created three main classifications: extrinsic, mixtrinsic
and intrinsic EHW. Each of these solutions has a real interest.
Nevertheless, although the extrinsic evolution generates some
excellent results, the intrinsic systems are not so advanced. This
paper suggests 3 possible solutions to implement the run-time
configuration intrinsic EHW system: FPGA-based Run-Time
Configuration system, JBits-based Run-Time Configuration system
and Multi-board functional-level Run-Time Configuration system.
The main characteristic of the proposed architectures is that they are
implemented on Field Programmable Gate Array. A comparison of
proposed solutions demonstrates that multi-board functional-level
run-time configuration is superior in terms of scalability, flexibility
and the implementation easiness.
Abstract: Truncated multiplier is a good candidate for digital
signal processing (DSP) applications including finite impulse
response (FIR) and discrete cosine transform (DCT). Through
truncated multiplier a significant reduction in Field Programmable
Gate Array (FPGA) resources can be achieved. This paper presents
for the first time a comparison of resource utilization of Spartan-3AN
and Virtex-5 implementation of standard and truncated multipliers
using Very High Speed Integrated Circuit Hardware Description
Language (VHDL). The Virtex-5 FPGA shows significant
improvement as compared to Spartan-3AN FPGA device. The
Virtex-5 FPGA device shows better performance with a percentage
ratio of number of occupied slices for standard to truncated
multipliers is increased from 40% to 73.86% as compared to Spartan-
3AN is decreased from 68.75% to 58.78%. Results show that the
anomaly in Spartan-3AN FPGA device average connection and
maximum pin delay have been efficiently reduced in Virtex-5 FPGA
device.
Abstract: On-board Error Detection and Correction (EDAC)
devices aim to secure data transmitted between the central
processing unit (CPU) of a satellite onboard computer and its local
memory. This paper presents a comparison of the performance of
four low complexity EDAC techniques for application in Random
Access Memories (RAMs) on-board small satellites. The
performance of a newly proposed EDAC architecture is measured
and compared with three different EDAC strategies, using the same
FPGA technology. A statistical analysis of single-event upset (SEU)
and multiple-bit upset (MBU) activity in commercial memories
onboard Alsat-1 is given for a period of 8 years
Abstract: This paper presents preliminary results regarding system-level power awareness for FPGA implementations in wireless sensor networks. Re-configurability of field programmable gate arrays (FPGA) allows for significant flexibility in its applications to embedded systems. However, high power consumption in FPGA becomes a significant factor in design considerations. We present several ideas and their experimental verifications on how to optimize power consumption at high level of designing process while maintaining the same energy per operation (low-level methods can be used additionally). This paper demonstrates that it is possible to estimate feasible power consumption savings even at the high level of designing process. It is envisaged that our results can be also applied to other embedded systems applications, not limited to FPGA-based.
Abstract: In this paper, an improvement of PDLZW implementation
with a new dictionary updating technique is proposed. A
unique dictionary is partitioned into hierarchical variable word-width
dictionaries. This allows us to search through dictionaries in parallel.
Moreover, the barrel shifter is adopted for loading a new input string
into the shift register in order to achieve a faster speed. However,
the original PDLZW uses a simple FIFO update strategy, which is
not efficient. Therefore, a new window based updating technique
is implemented to better classify the difference in how often each
particular address in the window is referred. The freezing policy
is applied to the address most often referred, which would not be
updated until all the other addresses in the window have the same
priority. This guarantees that the more often referred addresses would
not be updated until their time comes. This updating policy leads
to an improvement on the compression efficiency of the proposed
algorithm while still keep the architecture low complexity and easy
to implement.
Abstract: Long number multiplications (n ≥ 128-bit) are a
primitive in most cryptosystems. They can be performed better by
using Karatsuba-Ofman technique. This algorithm is easy to
parallelize on workstation network and on distributed memory, and
it-s known as the practical method of choice. Multiplying long
numbers using Karatsuba-Ofman algorithm is fast but is highly
recursive. In this paper, we propose different designs of
implementing Karatsuba-Ofman multiplier. A mixture of sequential
and combinational system design techniques involving pipelining is
applied to our proposed designs. Multiplying large numbers can be
adapted flexibly to time, area and power criteria. Computationally
and occupation constrained in embedded systems such as: smart
cards, mobile phones..., multiplication of finite field elements can be
achieved more efficiently. The proposed designs are compared to
other existing techniques. Mathematical models (Area (n), Delay (n))
of our proposed designs are also elaborated and evaluated on
different FPGAs devices.
Abstract: Scheduling algorithm is a key technology in satellite
switching system with input-buffer. In this paper, a new scheduling
algorithm and its realization are proposed. Based on Crossbar
switching fabric, the algorithm adopts serial scheduling strategy and
adjusts the output port arbitrating strategy for the better equity of every
port. Consequently, it increases the matching probability. The
algorithm can greatly reduce the scheduling delay and cell loss rate.
The analysis and simulation results by OPNET show that the proposed
algorithm has the better performance than others in average delay and
cell loss rate, and has the equivalent complexity. On the basis of these
results, the hardware realization and simulation based on FPGA are
completed, which validate the feasibility of the new scheduling
algorithm.
Abstract: Each new semiconductor technology node
brings smaller transistors and wires. Although this makes
transistors faster, wires get slower. In nano-scale regime, the
standard copper (Cu) interconnect will become a major hurdle
for FPGA interconnect due to their high resistivity and
electromigration. This paper presents the comprehensive
evaluation of mixed CNT bundle interconnects and
investigates their prospects as energy efficient and high speed
interconnect for future FPGA routing architecture. All
HSPICE simulations are carried out at operating frequency of
1GHz and it is found that mixed CNT bundle implemented in
FPGAs as interconnect can potentially provide a substantial
delay and energy reduction over traditional interconnects at
32nm process technology.
Abstract: Traditional development of wireless sensor network
mote is generally based on SoC1 platform. Such method of
development faces three main drawbacks: lack of flexibility in terms
of development due to low resource and rigid architecture of SoC;
low capability of evolution and portability versus performance if
specific micro-controller architecture features are used; and the rapid
obsolescence of micro-controller comparing to the long lifetime of
power plants or any industrial installations. To overcome these
drawbacks, we have explored a new approach of development of
wireless sensor network mote using a hybrid FPGA technology. The
application of such approach is illustrated through the
implementation of an innovative wireless sensor network protocol
called OCARI.
Abstract: Model Predictive Control (MPC) is increasingly being
proposed for real time applications and embedded systems. However
comparing to PID controller, the implementation of the MPC in
miniaturized devices like Field Programmable Gate Arrays (FPGA)
and microcontrollers has historically been very small scale due to its
complexity in implementation and its computation time requirement.
At the same time, such embedded technologies have become an
enabler for future manufacturing enterprises as well as a transformer
of organizations and markets. Recently, advances in microelectronics
and software allow such technique to be implemented in embedded
systems. In this work, we take advantage of these recent advances
in this area in the deployment of one of the most studied and
applied control technique in the industrial engineering. In fact in
this paper, we propose an efficient framework for implementation
of Generalized Predictive Control (GPC) in the performed STM32
microcontroller. The STM32 keil starter kit based on a JTAG interface
and the STM32 board was used to implement the proposed GPC
firmware. Besides the GPC, the PID anti windup algorithm was
also implemented using Keil development tools designed for ARM
processor-based microcontroller devices and working with C/Cµ
langage. A performances comparison study was done between both
firmwares. This performances study show good execution speed and
low computational burden. These results encourage to develop simple
predictive algorithms to be programmed in industrial standard hardware.
The main features of the proposed framework are illustrated
through two examples and compared with the anti windup PID
controller.
Abstract: Higher-order Statistics (HOS), also known as
cumulants, cross moments and their frequency domain counterparts,
known as poly spectra have emerged as a powerful signal processing
tool for the synthesis and analysis of signals and systems. Algorithms
used for the computation of cross moments are computationally
intensive and require high computational speed for real-time
applications. For efficiency and high speed, it is often advantageous
to realize computation intensive algorithms in hardware. A promising
solution that combines high flexibility together with the speed of a
traditional hardware is Field Programmable Gate Array (FPGA). In
this paper, we present FPGA-based parallel architecture for the
computation of third-order cross moments. The proposed design is
coded in Very High Speed Integrated Circuit (VHSIC) Hardware
Description Language (VHDL) and functionally verified by
implementing it on Xilinx Spartan-3 XC3S2000FG900-4 FPGA.
Implementation results are presented and it shows that the proposed
design can operate at a maximum frequency of 86.618 MHz.
Abstract: This paper introduces a new digital logic design, which
combines the DSP and FPGA to implement the conventional DTC of
induction machine. The DSP will be used for floating point
calculation whereas the FPGA main task is to implement the
hysteresis-based controller. The emphasis is on FPGA digital logic
design. The simulation and experimental results are presented and
summarized.
Abstract: In this paper, the hardware implementation of the
RSA public-key cryptographic algorithm is presented. The RSA
cryptographic algorithm is depends on the computation of repeated
modular exponentials.
The Montgomery algorithm is used and modified to reduce
hardware resources and to achieve reasonable operating speed for
FPGA. An efficient architecture for modular multiplications based on
the array multiplier is proposed. We have implemented a RSA
cryptosystem based on Montgomery algorithm. As a result, it is
shown that proposed architecture contributes to small area and
reasonable speed.
Abstract: Using vision based solution in intelligent vehicle application often needs large memory to handle video stream and image process which increase complexity of hardware and software. In this paper, we present a FPGA implement of a vision based lane departure warning system. By taking frame of videos, the line gradient of line is estimated and the lane marks are found. By analysis the position of lane mark, departure of vehicle will be detected in time. This idea has been implemented in Xilinx Spartan6 FPGA. The lane departure warning system used 39% logic resources and no memory of the device. The average availability is 92.5%. The frame rate is more than 30 frames per second (fps).
Abstract: This paper describes design of a digital feedback loop
for a low switching frequency dc-dc switching converters. Low
switching frequencies were selected in this design. A look up table
for the digital PID (proportional integrator differentiator)
compensator was implemented using Altera Stratix II with built-in
ADC (analog-to-digital converter) to achieve this hardware
realization. Design guidelines are given for the PID compensator,
high frequency DPWM (digital pulse width modulator) and moving
average filter.
Abstract: Memory Errors Detection and Correction aim to secure the transaction of data between the central processing unit of a satellite onboard computer and its local memory. In this paper, the application of a double-bit error detection and correction method is described and implemented in Field Programmable Gate Array (FPGA) technology. The performance of the proposed EDAC method is measured and compared with two different EDAC devices, using the same FPGA technology. Statistical analysis of single-event upset (SEU) and multiple-bit upset (MBU) activity in commercial memories onboard the first Algerian microsatellite Alsat-1 is given.
Abstract: In this paper, RSA encryption algorithm and its hardware
implementation in Xilinx-s Virtex Field Programmable Gate
Arrays (FPGA) is analyzed. The issues of scalability, flexible performance,
and silicon efficiency for the hardware acceleration of
public key crypto systems are being explored in the present work.
Using techniques based on the interleaved math for exponentiation,
the proposed RSA calculation architecture is compared to existing
FPGA-based solutions for speed, FPGA utilization, and scalability.
The paper covers the RSA encryption algorithm, interleaved multiplication,
Miller Rabin algorithm for primality test, extended Euclidean
math, basic FPGA technology, and the implementation details of
the proposed RSA calculation architecture. Performance of several
alternative hardware architectures is discussed and compared. Finally,
conclusion is drawn, highlighting the advantages of a fully flexible
& parameterized design.
Abstract: Space Vector Modulation (SVM) is an optimum Pulse Width Modulation (PWM) technique for an inverter used in a variable frequency drive applications. It is computationally rigorous and hence limits the inverter switching frequency. Increase in switching frequency can be achieved using Neural Network (NN) based SVM, implemented on application specific chips. This paper proposes a neural network based SVM technique for a Voltage Source Inverter (VSI). The network proposed is independent of switching frequency. Different architectures are investigated keeping the total number of neurons constant. The performance of the inverter is compared for various switching frequencies for different architectures of NN based SVM. From the results obtained, the network with minimum resource and appropriate word length is identified. The bit precision required for this application is identified. The network with 8-bit precision is implemented in the IC XCV 400 and the results are presented. The performance of NN based general purpose SVM with higher bit precision is discussed.
Abstract: An effective approach for realizing the binary tree structure, representing a combinational logic functionality with enhanced throughput, is discussed in this paper. The optimization in maximum operating frequency was achieved through delay minimization, which in turn was possible by means of reducing the depth of the binary network. The proposed synthesis methodology has been validated by experimentation with FPGA as the target technology. Though our proposal is technology independent, yet the heuristic enables better optimization in throughput even after technology mapping for such Boolean functionality; whose reduced CNF form is associated with a lesser literal cost than its reduced DNF form at the Boolean equation level. For cases otherwise, our method converges to similar results as that of [12]. The practical results obtained for a variety of case studies demonstrate an improvement in the maximum throughput rate for Spartan IIE (XC2S50E-7FT256) and Spartan 3 (XC3S50-4PQ144) FPGA logic families by 10.49% and 13.68% respectively. With respect to the LUTs and IOBUFs required for physical implementation of the requisite non-regenerative logic functionality, the proposed method enabled savings to the tune of 44.35% and 44.67% respectively, over the existing efficient method available in literature [12].
Abstract: This paper presents a VLSI design approach of a highspeed
and real-time 2-D Discrete Wavelet Transform computing. The
proposed architecture, based on new and fast convolution approach,
reduces the hardware complexity in addition to reduce the critical
path to the multiplier delay. Furthermore, an advanced twodimensional
(2-D) discrete wavelet transform (DWT)
implementation, with an efficient memory area, is designed to
produce one output in every clock cycle. As a result, a very highspeed
is attained. The system is verified, using JPEG2000
coefficients filters, on Xilinx Virtex-II Field Programmable Gate
Array (FPGA) device without accessing any external memory. The
resulting computing rate is up to 270 M samples/s and the (9,7) 2-D
wavelet filter uses only 18 kb of memory (16 kb of first-in-first-out
memory) with 256×256 image size. In this way, the developed design
requests reduced memory and provide very high-speed processing as
well as high PSNR quality.