An Off-the-Shelf Scheme for Dependable Grid Systems Using Virtualization

Grid computing has recently attracted wide attention in science, industry, and business, fields that require vast amounts of computation. A grid provides an environment in which many nodes (i.e., computers) are connected to one another through a local or global network and made available to many users. To process data across nodes for an application in this environment, the nodes mutually authenticate one another using certificates issued by a Certificate Authority (CA). If a failure or fault occurs in the CA, however, no new certificates can be issued, and as a result no new node can join the grid environment. This paper proposes an off-the-shelf scheme for dependable grid systems using virtualization techniques and verifies its implementation. The proposed approach uses virtualization to restart an application, e.g., the CA, when it fails, so that the system can tolerate a failure or fault in the CA. Because the scheme is easily implemented at the application level, it costs the system builder little to implement compared with other methods. Simulation results show that the CA in the system can recover from a failure or fault.
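
The restart-on-failure idea can be illustrated with a minimal sketch. It is not the implementation described in the paper: the classes SimulatedCA and Watchdog, their methods, and the certificate strings are all hypothetical names introduced here, and the "CA" is an in-memory stand-in, not a real certificate authority or virtual machine. The sketch only shows the control flow: a node is rejected while the CA is down, a monitor detects the failure and restarts the CA, and certificate issuance then resumes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimulatedCA:
    """Hypothetical stand-in for a CA running inside a virtual machine."""
    running: bool = True
    issued: List[str] = field(default_factory=list)

    def issue_certificate(self, node: str) -> str:
        # A failed CA cannot issue certificates, so new nodes cannot join.
        if not self.running:
            raise RuntimeError("CA is down; cannot issue certificate")
        cert = f"cert-for-{node}"
        self.issued.append(cert)
        return cert

    def crash(self) -> None:
        # Simulate a failure or fault in the CA.
        self.running = False

    def restart(self) -> None:
        # Assumption: restarting the guest brings the CA back into service.
        self.running = True


class Watchdog:
    """Hypothetical monitor that restarts the CA when it is found down."""

    def __init__(self, ca: SimulatedCA) -> None:
        self.ca = ca
        self.restarts = 0

    def check(self) -> bool:
        """Restart the CA if it is down; return True if a restart happened."""
        if self.ca.running:
            return False
        self.ca.restart()
        self.restarts += 1
        return True


def run_demo():
    ca = SimulatedCA()
    watchdog = Watchdog(ca)
    log = []
    log.append(ca.issue_certificate("node-1"))  # normal operation
    ca.crash()
    try:
        ca.issue_certificate("node-2")          # fails while the CA is down
    except RuntimeError:
        log.append("node-2 rejected")
    watchdog.check()                            # detects failure and restarts
    log.append(ca.issue_certificate("node-2"))  # issuance resumes
    return log, watchdog.restarts


if __name__ == "__main__":
    print(run_demo())
```

In a real deployment the health check and the restart would act on the virtual machine hosting the CA; those details are outside what the abstract states and are left out of the sketch.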



