An Efficient Architecture for Dynamic Customization and Provisioning of Virtual Appliance in Cloud Environment

Cloud computing is a business model which provides an easier management of computing resources. Cloud users can request virtual machine and install additional softwares and configure them if needed. However, user can also request virtual appliance which provides a better solution to deploy application in much faster time, as it is ready-built image of operating system with necessary softwares installed and configured. Large numbers of virtual appliances are available in different image format. User can download available appliances from public marketplace and start using it. However, information published about the virtual appliance differs from each providers leading to the difficulty in choosing required virtual appliance as it is composed of specific OS with standard software version. However, even if user choses the appliance from respective providers, user doesn’t have any flexibility to choose their own set of softwares with required OS and application. In this paper, we propose a referenced architecture for dynamically customizing virtual appliance and provision them in an easier manner. We also add our experience in integrating our proposed architecture with public marketplace and Mi-Cloud, a cloud management software.

FPGA Implementation of RSA Cryptosystem

In this paper, the hardware implementation of the RSA public-key cryptographic algorithm is presented. The RSA cryptographic algorithm is depends on the computation of repeated modular exponentials. The Montgomery algorithm is used and modified to reduce hardware resources and to achieve reasonable operating speed for FPGA. An efficient architecture for modular multiplications based on the array multiplier is proposed. We have implemented a RSA cryptosystem based on Montgomery algorithm. As a result, it is shown that proposed architecture contributes to small area and reasonable speed.

Hardware Implementation of Stack-Based Replacement Algorithms

Block replacement algorithms to increase hit ratio have been extensively used in cache memory management. Among basic replacement schemes, LRU and FIFO have been shown to be effective replacement algorithms in terms of hit rates. In this paper, we introduce a flexible stack-based circuit which can be employed in hardware implementation of both LRU and FIFO policies. We propose a simple and efficient architecture such that stack-based replacement algorithms can be implemented without the drawbacks of the traditional architectures. The stack is modular and hence, a set of stack rows can be cascaded depending on the number of blocks in each cache set. Our circuit can be implemented in conjunction with the cache controller and static/dynamic memories to form a cache system. Experimental results exhibit that our proposed circuit provides an average value of 26% improvement in storage bits and its maximum operating frequency is increased by a factor of two

An Efficient Architecture for Interleaved Modular Multiplication

Modular multiplication is the basic operation in most public key cryptosystems, such as RSA, DSA, ECC, and DH key exchange. Unfortunately, very large operands (in order of 1024 or 2048 bits) must be used to provide sufficient security strength. The use of such big numbers dramatically slows down the whole cipher system, especially when running on embedded processors. So far, customized hardware accelerators - developed on FPGAs or ASICs - were the best choice for accelerating modular multiplication in embedded environments. On the other hand, many algorithms have been developed to speed up such operations. Examples are the Montgomery modular multiplication and the interleaved modular multiplication algorithms. Combining both customized hardware with an efficient algorithm is expected to provide a much faster cipher system. This paper introduces an enhanced architecture for computing the modular multiplication of two large numbers X and Y modulo a given modulus M. The proposed design is compared with three previous architectures depending on carry save adders and look up tables. Look up tables should be loaded with a set of pre-computed values. Our proposed architecture uses the same carry save addition, but replaces both look up tables and pre-computations with an enhanced version of sign detection techniques. The proposed architecture supports higher frequencies than other architectures. It also has a better overall absolute time for a single operation.

Efficient Hardware Implementation of an Elliptic Curve Cryptographic Processor Over GF (2 163)

A new and highly efficient architecture for elliptic curve scalar point multiplication which is optimized for a binary field recommended by NIST and is well-suited for elliptic curve cryptographic (ECC) applications is presented. To achieve the maximum architectural and timing improvements we have reorganized and reordered the critical path of the Lopez-Dahab scalar point multiplication architecture such that logic structures are implemented in parallel and operations in the critical path are diverted to noncritical paths. With G=41, the proposed design is capable of performing a field multiplication over the extension field with degree 163 in 11.92 s with the maximum achievable frequency of 251 MHz on Xilinx Virtex-4 (XC4VLX200) while 22% of the chip area is occupied, where G is the digit size of the underlying digit-serial finite field multiplier.

Low Jitter ADPLL based Clock Generator for High Speed SoC Applications

An efficient architecture for low jitter All Digital Phase Locked Loop (ADPLL) suitable for high speed SoC applications is presented in this paper. The ADPLL is designed using standard cells and described by Hardware Description Language (HDL). The ADPLL implemented in a 90 nm CMOS process can operate from 10 to 200 MHz and achieve worst case frequency acquisition in 14 reference clock cycles. The simulation result shows that PLL has cycle to cycle jitter of 164 ps and period jitter of 100 ps at 100MHz. Since the digitally controlled oscillator (DCO) can achieve both high resolution and wide frequency range, it can meet the demands of system-level integration. The proposed ADPLL can easily be ported to different processes in a short time. Thus, it can reduce the design time and design complexity of the ADPLL, making it very suitable for System-on-Chip (SoC) applications.