An Efficient Architecture for Interleaved Modular Multiplication
Modular multiplication is the basic operation
in most public key cryptosystems, such as RSA, DSA, ECC,
and DH key exchange. Unfortunately, very large operands
(in order of 1024 or 2048 bits) must be used to provide
sufficient security strength. The use of such big numbers
dramatically slows down the whole cipher system, especially
when running on embedded processors.
So far, customized hardware accelerators - developed on
FPGAs or ASICs - were the best choice for accelerating
modular multiplication in embedded environments. On the
other hand, many algorithms have been developed to speed
up such operations. Examples are the Montgomery modular
multiplication and the interleaved modular multiplication
algorithms. Combining both customized hardware with
an efficient algorithm is expected to provide a much faster
cipher system.
This paper introduces an enhanced architecture for computing
the modular multiplication of two large numbers X
and Y modulo a given modulus M. The proposed design is
compared with three previous architectures depending on
carry save adders and look up tables. Look up tables should
be loaded with a set of pre-computed values. Our proposed
architecture uses the same carry save addition, but replaces
both look up tables and pre-computations with an enhanced
version of sign detection techniques. The proposed architecture
supports higher frequencies than other architectures.
It also has a better overall absolute time for a single operation.
[1] Peter L. Montgomery, "Modular multiplication without trial division," in Mathematics of Computation. April 1985, vol. 44, pp. 519-521, American Mathematical Society.
[2] G. R. Blakley, "A computer algorithm for the product AB modulo M," IEEE Transactions on Computers, pp. 497 - 500, May 1983.
[3] D. Narh Amanor, C. Paar, J. Pelzl, V. Bunimov, and M. Schimmler, "Efficient hardware architectures for modular multiplication
on FPGAs," International Conference on Field Programmable
Logic and Applications, pp. 539-542, 2005.
[4] V. Bunimov and M. Schimmler, "Area and time efficient modular
multiplication of large integers," in IEEE 14th International
Conference on Application specific Systems, Architectures and
Processors, June 2003.
[5] Q.K. Kop and C.Y. Hung, "Fast algorithm for modular reduction,"
in IEE Proceedings, Computers and Digital Techniques,
July 1998, vol. 145, pp. 265-271.
[6] David Narh Amanor, "Efficient hardware architectures for modular
multiplication," M.S. thesis, University of Applied Sciences
Offenburg, Germany, February 2005.
[1] Peter L. Montgomery, "Modular multiplication without trial division," in Mathematics of Computation. April 1985, vol. 44, pp. 519-521, American Mathematical Society.
[2] G. R. Blakley, "A computer algorithm for the product AB modulo M," IEEE Transactions on Computers, pp. 497 - 500, May 1983.
[3] D. Narh Amanor, C. Paar, J. Pelzl, V. Bunimov, and M. Schimmler, "Efficient hardware architectures for modular multiplication
on FPGAs," International Conference on Field Programmable
Logic and Applications, pp. 539-542, 2005.
[4] V. Bunimov and M. Schimmler, "Area and time efficient modular
multiplication of large integers," in IEEE 14th International
Conference on Application specific Systems, Architectures and
Processors, June 2003.
[5] Q.K. Kop and C.Y. Hung, "Fast algorithm for modular reduction,"
in IEE Proceedings, Computers and Digital Techniques,
July 1998, vol. 145, pp. 265-271.
[6] David Narh Amanor, "Efficient hardware architectures for modular
multiplication," M.S. thesis, University of Applied Sciences
Offenburg, Germany, February 2005.
@article{"International Journal of Information, Control and Computer Sciences:51597", author = "Ahmad M. Abdel Fattah and Ayman M. Bahaa El-Din and Hossam M.A. Fahmy", title = "An Efficient Architecture for Interleaved Modular Multiplication", abstract = "Modular multiplication is the basic operation
in most public key cryptosystems, such as RSA, DSA, ECC,
and DH key exchange. Unfortunately, very large operands
(in order of 1024 or 2048 bits) must be used to provide
sufficient security strength. The use of such big numbers
dramatically slows down the whole cipher system, especially
when running on embedded processors.
So far, customized hardware accelerators - developed on
FPGAs or ASICs - were the best choice for accelerating
modular multiplication in embedded environments. On the
other hand, many algorithms have been developed to speed
up such operations. Examples are the Montgomery modular
multiplication and the interleaved modular multiplication
algorithms. Combining both customized hardware with
an efficient algorithm is expected to provide a much faster
cipher system.
This paper introduces an enhanced architecture for computing
the modular multiplication of two large numbers X
and Y modulo a given modulus M. The proposed design is
compared with three previous architectures depending on
carry save adders and look up tables. Look up tables should
be loaded with a set of pre-computed values. Our proposed
architecture uses the same carry save addition, but replaces
both look up tables and pre-computations with an enhanced
version of sign detection techniques. The proposed architecture
supports higher frequencies than other architectures.
It also has a better overall absolute time for a single operation.", keywords = "Montgomery multiplication, modular multiplication, efficient architecture, FPGA, RSA", volume = "3", number = "8", pages = "1928-5", }