Abstract: Recursive combination of an algorithm based on Karatsuba multiplication is exploited to design a generalized transpose and parallel Finite Impulse Response (FIR) Filter. Mid-range Karatsuba multiplication and Carry Save adder based on Karatsuba multiplication reduce time complexity for higher order multiplication implemented up to n-bit. As a result, we design modified N-tap Transpose and Parallel Symmetric FIR Filter Structure using Karatsuba algorithm. The mathematical formulation of the FFA Filter is derived. The proposed architecture involves significantly less area delay product (APD) then the existing block implementation. By adopting retiming technique, hardware cost is reduced further. The filter architecture is designed by using 90 nm technology library and is implemented by using cadence EDA Tool. The synthesized result shows better performance for different word length and block size. The design achieves switching activity reduction and low power consumption by applying with and without retiming for different combination of the circuit. The proposed structure achieves more than a half of the power reduction by adopting with and without retiming techniques compared to the earlier design structure. As a proof of the concept for block size 16 and filter length 64 for CKA method, it achieves a 51% as well as 70% less power by applying retiming technique, and for CSA method it achieves a 57% as well as 77% less power by applying retiming technique compared to the previously proposed design.
Abstract: In this paper, we propose a new efficient reverse converter for the 4-moduli set {2n, 2n + 1, 2n − 1, 22n+1 – 1} based on a modified Chinese Remainder Theorem and Mixed Radix Conversion. Additionally, the resulting architecture is further reduced to obtain a reverse converter that utilizes only carry save adders, a multiplexer and carry propagate adders. The proposed converter has an area cost of (12n + 2) FAs and (5n + 1) HAs with a delay of (9n + 6)tFA + tMUX. When compared with state of the art, our proposal demonstrates to be faster, at the expense of slightly more hardware resources. Further, the Area-Time square metric was computed which indicated that our proposed scheme outperforms the state of the art reverse converter.
Abstract: The residue number system (RNS), due to its
properties, is used in applications in which high performance
computation is needed. The carry free nature, which makes the
arithmetic, carry bounded as well as the paralleling facility is the
reason of its capability of high speed rendering. Since carry is not
propagated between the moduli in this system, the performance is
only restricted by the speed of the operations in each modulus. In this
paper a novel method of number representation by use of redundancy
is suggested in which {rn- 2,rn-1,rn} is the reference moduli set
where r=2k+1 and k =1, 2,3,.. This method achieves fast
computations and conversions and makes the circuits of them much
simpler.
Abstract: Modular multiplication is the basic operation
in most public key cryptosystems, such as RSA, DSA, ECC,
and DH key exchange. Unfortunately, very large operands
(in order of 1024 or 2048 bits) must be used to provide
sufficient security strength. The use of such big numbers
dramatically slows down the whole cipher system, especially
when running on embedded processors.
So far, customized hardware accelerators - developed on
FPGAs or ASICs - were the best choice for accelerating
modular multiplication in embedded environments. On the
other hand, many algorithms have been developed to speed
up such operations. Examples are the Montgomery modular
multiplication and the interleaved modular multiplication
algorithms. Combining both customized hardware with
an efficient algorithm is expected to provide a much faster
cipher system.
This paper introduces an enhanced architecture for computing
the modular multiplication of two large numbers X
and Y modulo a given modulus M. The proposed design is
compared with three previous architectures depending on
carry save adders and look up tables. Look up tables should
be loaded with a set of pre-computed values. Our proposed
architecture uses the same carry save addition, but replaces
both look up tables and pre-computations with an enhanced
version of sign detection techniques. The proposed architecture
supports higher frequencies than other architectures.
It also has a better overall absolute time for a single operation.