Abstract: In global navigation satellite system (GNSS) denied settings, such as indoor environments, autonomous mobile robots are often limited to dead-reckoning navigation techniques to determine their position, velocity, and attitude (PVA). Localization is typically accomplished by employing an inertial measurement unit (IMU), which, while precise in nature, accumulates errors rapidly and severely degrades the localization solution. Standard sensor fusion methods, such as Kalman filtering, aim to fuse precise IMU measurements with accurate aiding sensors to establish a precise and accurate solution. In indoor environments, where GNSS and no other a priori information is known about the environment, effective sensor fusion is difficult to achieve, as accurate aiding sensor choices are sparse. However, an opportunity arises by employing a depth camera in the indoor environment. A depth camera can capture point clouds of the surrounding floors and walls. Extracting attitude from these surfaces can serve as an accurate aiding source, which directly combats errors that arise due to gyroscope imperfections. This configuration for sensor fusion leads to a dramatic reduction of PVA error compared to traditional aiding sensor configurations. This paper provides the theoretical basis for the depth camera aiding sensor method, initial expectations of performance benefit via simulation, and hardware implementation thus verifying its veracity. Hardware implementation is performed on the Quanser Qbot 2™ mobile robot, with a Vector-Nav VN-200™ IMU and Kinect™ camera from Microsoft.
Abstract: The Versatile Video Coding standard (VVC) is actually under development by the Joint Video Exploration Team (or JVET). An Adaptive Multiple Transforms (AMT) approach was announced. It is based on different transform modules that provided an efficient coding. However, the AMT solution raises several issues especially regarding the complexity of the selected set of transforms. This can be an important issue, particularly for a future industrial adoption. This paper proposed an efficient hardware implementation of the most used transform in AMT approach: the DCT II. The developed circuit is adapted to different block sizes and can reach a minimum frequency of 192 MHz allowing an optimized execution time.
Abstract: Nowadays, demand for using real-time video transmission capable devices is ever-increasing. So, high resolution videos have made efficient video compression techniques an essential component for capturing and transmitting video data. Motion estimation has a critical role in encoding raw video. Hence, various motion estimation methods are introduced to efficiently compress the video. Low bit‑depth representation based motion estimation methods facilitate computation of matching criteria and thus, provide small hardware footprint. In this paper, a hardware implementation of a two-bit transformation based low-complexity motion estimation method using local binary pattern approach is proposed. Image frames are represented in two-bit depth instead of full-depth by making use of the local binary pattern as a binarization approach and the binarization part of the hardware architecture is explained in detail. Experimental results demonstrate the difference between the proposed hardware architecture and the architectures of well-known low-complexity motion estimation methods in terms of important aspects such as resource utilization, energy and power consumption.
Abstract: In this paper, a digital control algorithm based on delta-operator is presented for high-frequency digitally-controlled DC-DC switching converters. The stability and the controlling accuracy of the DC-DC switching converters are improved by using the digital control algorithm based on delta-operator without increasing the hardware circuit scale. The design method of voltage compensator in delta-domain using PID (Proportion-Integration- Differentiation) control is given in this paper, and the simulation results based on Simulink platform are provided, which have verified the theoretical analysis results very well. It can be concluded that, the presented control algorithm based on delta-operator has better stability and controlling accuracy, and easier hardware implementation than the existed control algorithms based on z-operator, therefore it can be used for the voltage compensator design in high-frequency digitally- controlled DC-DC switching converters.
Abstract: In this work we make a bifurcation analysis for a
single compartment representation of Traub model, one of the most
important conductance-based models. The analysis focus in two
principal parameters: current and leakage conductance. Study of
stable and unstable solutions are explored; also Hop-bifurcation and
frequency interpretation when current varies is examined. This study
allows having control of neuron dynamics and neuron response when
these parameters change. Analysis like this is particularly important
for several applications such as: tuning parameters in learning
process, neuron excitability tests, measure bursting properties of the
neuron, etc. Finally, a hardware implementation results were
developed to corroborate these results.
Abstract: Shifted polynomial basis (SPB) is a variation of
polynomial basis representation. SPB has potential for efficient
bit level and digi -level implementations of multiplication over
binary extension fields with subquadratic space complexity. For
efficient implementation of pairing computation with large finite
fields, this paper presents a new SPB multiplication algorithm based
on Karatsuba schemes, and used that to derive a novel scalable
multiplier architecture. Analytical results show that the proposed
multiplier provides a trade-off between space and time complexities.
Our proposed multiplier is modular, regular, and suitable for very
large scale integration (VLSI) implementations. It involves less
area complexity compared to the multipliers based on traditional
decomposition methods. It is therefore, more suitable for efficient
hardware implementation of pairing based cryptography and elliptic
curve cryptography (ECC) in constraint driven applications.
Abstract: This paper presents the design, implementation and evaluation of a micro-network, or Network-on-Chip (NoC), based on a generic pipeline router architecture. The router is designed to efficiently support traffic generated by multimedia applications on embedded multi-core systems. It employs a simplest routing mechanism and implements the round-robin scheduling strategy to resolve output port contentions and minimize latency. A virtual channel flow control is applied to avoid the head-of-line blocking problem and enhance performance in the NoC. The hardware design of the router architecture has been implemented at the register transfer level; its functionality is evaluated in the case of the two dimensional Mesh/Torus topology, and performance results are derived from ModelSim simulator and Xilinx ISE 9.2i synthesis tool. An example of a multi-core image processing system utilizing the NoC structure has been implemented and validated to demonstrate the capability of the proposed micro-network architecture. To reduce complexity of the image compression and decompression architecture, the system use image processing algorithm based on classical discrete cosine transform with an efficient zonal processing approach. The experimental results have confirmed that both the proposed image compression scheme and NoC architecture can achieve a reasonable image quality with lower processing time.
Abstract: .Hardware realization of a Neural Network (NN), to a large extent depends on the efficient implementation of a single neuron. FPGA-based reconfigurable computing architectures are suitable for hardware implementation of neural networks. FPGA realization of ANNs with a large number of neurons is still a challenging task. This paper discusses the issues involved in implementation of a multi-input neuron with linear/nonlinear excitation functions using FPGA. Implementation method with resource/speed tradeoff is proposed to handle signed decimal numbers. The VHDL coding developed is tested using Xilinx XC V50hq240 Chip. To improve the speed of operation a lookup table method is used. The problems involved in using a lookup table (LUT) for a nonlinear function is discussed. The percentage saving in resource and the improvement in speed with an LUT for a neuron is reported. An attempt is also made to derive a generalized formula for a multi-input neuron that facilitates to estimate approximately the total resource requirement and speed achievable for a given multilayer neural network. This facilitates the designer to choose the FPGA capacity for a given application. Using the proposed method of implementation a neural network based application, namely, a Space vector modulator for a vector-controlled drive is presented
Abstract: Several studies have been carried out, using various techniques, including neural networks, to discriminate vigilance states in humans from electroencephalographic (EEG) signals, but we are still far from results satisfactorily useable results. The work presented in this paper aims at improving this status with regards to 2 aspects. Firstly, we introduce an original procedure made of the association of two neural networks, a self organizing map (SOM) and a learning vector quantization (LVQ), that allows to automatically detect artefacted states and to separate the different levels of vigilance which is a major breakthrough in the field of vigilance. Lastly and more importantly, our study has been oriented toward real-worked situation and the resulting model can be easily implemented as a wearable device. It benefits from restricted computational and memory requirements and data access is very limited in time. Furthermore, some ongoing works demonstrate that this work should shortly results in the design and conception of a non invasive electronic wearable device.
Abstract: Rounding of coefficients is a common practice in
hardware implementation of digital filters. Where some coefficients
are very close to zero or one, as assumed in this paper, this rounding
action also leads to some computation reduction. Furthermore, if the
discarded coefficient is of high order, a reduced order filter is
obtained, otherwise the order does not change but computation is
reduced. In this paper, the Least Squares approximation to rounded
(or discarded) coefficient FIR filter is investigated. The result also
succinctly extended to general type of FIR filters.
Abstract: In this paper an efficient implementation of Ripemd-
160 hash function is presented. Hash functions are a special family
of cryptographic algorithms, which is used in technological
applications with requirements for security, confidentiality and
validity. Applications like PKI, IPSec, DSA, MAC-s incorporate
hash functions and are used widely today. The Ripemd-160 is
emanated from the necessity for existence of very strong algorithms
in cryptanalysis. The proposed hardware implementation can be
synthesized easily for a variety of FPGA and ASIC technologies.
Simulation results, using commercial tools, verified the efficiency of
the implementation in terms of performance and throughput. Special
care has been taken so that the proposed implementation doesn-t
introduce extra design complexity; while in parallel functionality was
kept to the required levels.
Abstract: The “PYRAMIDS" Block Cipher is a symmetric encryption algorithm of a 64, 128, 256-bit length, that accepts a variable key length of 128, 192, 256 bits. The algorithm is an iterated cipher consisting of repeated applications of a simple round transformation with different operations and different sequence in each round. The algorithm was previously software implemented in Cµ code. In this paper, a hardware implementation of the algorithm, using Field Programmable Gate Arrays (FPGA), is presented. In this work, we discuss the algorithm, the implemented micro-architecture, and the simulation and implementation results. Moreover, we present a detailed comparison with other implemented standard algorithms. In addition, we include the floor plan as well as the circuit diagrams of the various micro-architecture modules.
Abstract: Full search block matching algorithm is widely used for hardware implementation of motion estimators in video compression algorithms. In this paper we are proposing a new architecture, which consists of a 2D parallel processing unit and a 1D unit both working in parallel. The proposed architecture reduces both data access power and computational power which are the main causes of power consumption in integer motion estimation. It also completes the operations with nearly the same number of clock cycles as compared to a 2D systolic array architecture. In this work sum of absolute difference (SAD)-the most repeated operation in block matching, is calculated in two steps. The first step is to calculate the SAD for alternate rows by a 2D parallel unit. If the SAD calculated by the parallel unit is less than the stored minimum SAD, the SAD of the remaining rows is calculated by the 1D unit. Early termination, which stops avoidable computations has been achieved with the help of alternate rows method proposed in this paper and by finding a low initial SAD value based on motion vector prediction. Data reuse has been applied to the reference blocks in the same search area which significantly reduced the memory access.
Abstract: In this paper the FPGA implementations for four
stream ciphers are presented. The two stream ciphers, MUGI and
SNOW 2.0 are recently adopted by the International Organization for
Standardization ISO/IEC 18033-4:2005 standard. The other two
stream ciphers, MICKEY 128 and TRIVIUM have been submitted
and are under consideration for the eSTREAM, the ECRYPT
(European Network of Excellence for Cryptology) Stream Cipher
project. All ciphers were coded using VHDL language. For the
hardware implementation, an FPGA device was used. The proposed
implementations achieve throughputs range from 166 Mbps for
MICKEY 128 to 6080 Mbps for MUGI.
Abstract: Optical flow is a research topic of interest for many
years. It has, until recently, been largely inapplicable to real-time
applications due to its computationally expensive nature. This paper
presents a new reliable flow technique which is combined with a
motion detection algorithm, from stationary camera image streams,
to allow flow-based analyses of moving entities, such as rigidity, in
real-time. The combination of the optical flow analysis with motion
detection technique greatly reduces the expensive computation of
flow vectors as compared with standard approaches, rendering the
method to be applicable in real-time implementation. This paper
describes also the hardware implementation of a proposed pipelined
system to estimate the flow vectors from image sequences in real
time. This design can process 768 x 576 images at a very high frame
rate that reaches to 156 fps in a single low cost FPGA chip, which is
adequate for most real-time vision applications.
Abstract: In this paper, we study FPGA implementation of a
novel supra-optimal receiver diversity combining technique,
generalized maximal ratio combining (GMRC), for wireless
transmission over fading channels in SIMO systems. Prior
published results using ML-detected GMRC diversity signal
driven by BPSK showed superior bit error rate performance to
the widely used MRC combining scheme in an imperfect
channel estimation (ICE) environment. Under perfect channel
estimation conditions, the performance of GMRC and MRC
were identical. The main drawback of the GMRC study was
that it was theoretical, thus successful FPGA implementation
of it using pipeline techniques is needed as a wireless
communication test-bed for practical real-life situations.
Simulation results showed that the hardware implementation
was efficient both in terms of speed and area. Since diversity
combining is especially effective in small femto- and picocells,
internet-associated wireless peripheral systems are to
benefit most from GMRC. As a result, many spinoff
applications can be made to the hardware of IP-based 4th
generation networks.
Abstract: Conventional approaches in the implementation of logic programming applications on embedded systems are solely of software nature. As a consequence, a compiler is needed that transforms the initial declarative logic program to its equivalent procedural one, to be programmed to the microprocessor. This approach increases the complexity of the final implementation and reduces the overall system's performance. On the contrary, presenting hardware implementations which are only capable of supporting logic programs prevents their use in applications where logic programs need to be intertwined with traditional procedural ones, for a specific application. We exploit HW/SW codesign methods to present a microprocessor, capable of supporting hybrid applications using both programming approaches. We take advantage of the close relationship between attribute grammar (AG) evaluation and knowledge engineering methods to present a programmable hardware parser that performs logic derivations and combine it with an extension of a conventional RISC microprocessor that performs the unification process to report the success or failure of those derivations. The extended RISC microprocessor is still capable of executing conventional procedural programs, thus hybrid applications can be implemented. The presented implementation is programmable, supports the execution of hybrid applications, increases the performance of logic derivations (experimental analysis yields an approximate 1000% increase in performance) and reduces the complexity of the final implemented code. The proposed hardware design is supported by a proposed extended C-language called C-AG.
Abstract: In this paper, we evaluate the choice of suitable
quantization characteristics for both the decoder messages and the
received samples in Low Density Parity Check (LDPC) coded
systems using M-QAM (Quadrature Amplitude Modulation)
schemes. The analysis involves the demapper block that provides
initial likelihood values for the decoder, by relating its quantization
strategy of the decoder. A mapping strategy refers to the grouping of
bits within a codeword, where each m-bit group is used to select a
2m-ary signal in accordance with the signal labels. Further we
evaluate the system with mapping strategies like Consecutive-Bit
(CB) and Bit-Reliability (BR). A new demapper version, based on
approximate expressions, is also presented to yield a low complexity
hardware implementation.
Abstract: This paper presents a novel genetic algorithm, termed
the Optimum Individual Monogenetic Algorithm (OIMGA) and
describes its hardware implementation. As the monogenetic strategy
retains only the optimum individual, the memory requirement is
dramatically reduced and no crossover circuitry is needed, thereby
ensuring the requisite silicon area is kept to a minimum.
Consequently, depending on application requirements, OIMGA
allows the investigation of solutions that warrant either larger GA
populations or individuals of greater length. The results given in this
paper demonstrate that both the performance of OIMGA and its
convergence time are superior to those of existing hardware GA
implementations. Local convergence is achieved in OIMGA by
retaining elite individuals, while population diversity is ensured by
continually searching for the best individuals in fresh regions of the
search space.
Abstract: Higher-order Statistics (HOS), also known as
cumulants, cross moments and their frequency domain counterparts,
known as poly spectra have emerged as a powerful signal processing
tool for the synthesis and analysis of signals and systems. Algorithms
used for the computation of cross moments are computationally
intensive and require high computational speed for real-time
applications. For efficiency and high speed, it is often advantageous
to realize computation intensive algorithms in hardware. A promising
solution that combines high flexibility together with the speed of a
traditional hardware is Field Programmable Gate Array (FPGA). In
this paper, we present FPGA-based parallel architecture for the
computation of third-order cross moments. The proposed design is
coded in Very High Speed Integrated Circuit (VHSIC) Hardware
Description Language (VHDL) and functionally verified by
implementing it on Xilinx Spartan-3 XC3S2000FG900-4 FPGA.
Implementation results are presented and it shows that the proposed
design can operate at a maximum frequency of 86.618 MHz.