An Efficient Architecture for Interleaved Modular Multiplication

Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (in order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators - developed on FPGAs or ASICs - were the best choice for accelerating modular multiplication in embedded environments. On the other hand, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining both customized hardware with an efficient algorithm is expected to provide a much faster cipher system. This paper introduces an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given modulus M. The proposed design is compared with three previous architectures depending on carry save adders and look up tables. Look up tables should be loaded with a set of pre-computed values. Our proposed architecture uses the same carry save addition, but replaces both look up tables and pre-computations with an enhanced version of sign detection techniques. The proposed architecture supports higher frequencies than other architectures. It also has a better overall absolute time for a single operation.




References:
[1] Peter L. Montgomery, "Modular multiplication without trial division," in Mathematics of Computation. April 1985, vol. 44, pp. 519-521, American Mathematical Society.
[2] G. R. Blakley, "A computer algorithm for the product AB modulo M," IEEE Transactions on Computers, pp. 497 - 500, May 1983.
[3] D. Narh Amanor, C. Paar, J. Pelzl, V. Bunimov, and M. Schimmler, "Efficient hardware architectures for modular multiplication
on FPGAs," International Conference on Field Programmable
Logic and Applications, pp. 539-542, 2005.
[4] V. Bunimov and M. Schimmler, "Area and time efficient modular
multiplication of large integers," in IEEE 14th International
Conference on Application specific Systems, Architectures and
Processors, June 2003.
[5] Q.K. Kop and C.Y. Hung, "Fast algorithm for modular reduction,"
in IEE Proceedings, Computers and Digital Techniques,
July 1998, vol. 145, pp. 265-271.
[6] David Narh Amanor, "Efficient hardware architectures for modular
multiplication," M.S. thesis, University of Applied Sciences
Offenburg, Germany, February 2005.