Very Large Scale Integration Architecture of Finite Impulse Response Filter Implementation Using Retiming Technique

Recursive combination of an algorithm based on Karatsuba multiplication is exploited to design a generalized transpose and parallel Finite Impulse Response (FIR) Filter. Mid-range Karatsuba multiplication and Carry Save adder based on Karatsuba multiplication reduce time complexity for higher order multiplication implemented up to n-bit. As a result, we design modified N-tap Transpose and Parallel Symmetric FIR Filter Structure using Karatsuba algorithm. The mathematical formulation of the FFA Filter is derived. The proposed architecture involves significantly less area delay product (APD) then the existing block implementation. By adopting retiming technique, hardware cost is reduced further. The filter architecture is designed by using 90 nm technology library and is implemented by using cadence EDA Tool. The synthesized result shows better performance for different word length and block size. The design achieves switching activity reduction and low power consumption by applying with and without retiming for different combination of the circuit. The proposed structure achieves more than a half of the power reduction by adopting with and without retiming techniques compared to the earlier design structure. As a proof of the concept for block size 16 and filter length 64 for CKA method, it achieves a 51% as well as 70% less power by applying retiming technique, and for CSA method it achieves a 57% as well as 77% less power by applying retiming technique compared to the previously proposed design.

A New Efficient RNS Reverse Converter for the 4-Moduli Set 

In this paper, we propose a new efficient reverse converter for the 4-moduli set {2n, 2n + 1, 2n − 1, 22n+1 – 1} based on a modified Chinese Remainder Theorem and Mixed Radix Conversion. Additionally, the resulting architecture is further reduced to obtain a reverse converter that utilizes only carry save adders, a multiplexer and carry propagate adders. The proposed converter has an area cost of (12n + 2) FAs and (5n + 1) HAs with a delay of (9n + 6)tFA + tMUX. When compared with state of the art, our proposal demonstrates to be faster, at the expense of slightly more hardware resources. Further, the Area-Time square metric was computed which indicated that our proposed scheme outperforms the state of the art reverse converter.

A Robust Redundant Residue Representation in Residue Number System with Moduli Set(rn-2,rn-1,rn)

The residue number system (RNS), due to its properties, is used in applications in which high performance computation is needed. The carry free nature, which makes the arithmetic, carry bounded as well as the paralleling facility is the reason of its capability of high speed rendering. Since carry is not propagated between the moduli in this system, the performance is only restricted by the speed of the operations in each modulus. In this paper a novel method of number representation by use of redundancy is suggested in which {rn- 2,rn-1,rn} is the reference moduli set where r=2k+1 and k =1, 2,3,.. This method achieves fast computations and conversions and makes the circuits of them much simpler.

An Efficient Architecture for Interleaved Modular Multiplication

Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (in order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators - developed on FPGAs or ASICs - were the best choice for accelerating modular multiplication in embedded environments. On the other hand, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining both customized hardware with an efficient algorithm is expected to provide a much faster cipher system. This paper introduces an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given modulus M. The proposed design is compared with three previous architectures depending on carry save adders and look up tables. Look up tables should be loaded with a set of pre-computed values. Our proposed architecture uses the same carry save addition, but replaces both look up tables and pre-computations with an enhanced version of sign detection techniques. The proposed architecture supports higher frequencies than other architectures. It also has a better overall absolute time for a single operation.