SWARM: A Meta-Scheduler to Minimize Job Queuing Times on Computational Grids

Some meta-schedulers query the information systems of individual supercomputers in order to submit jobs to the least busy supercomputer on a computational Grid. However, this information can become outdated by the time a job starts, because scheduling priorities change in the meantime. The MSR scheme, which is based on Multiple Simultaneous Requests, can take advantage of the opportunities created by these priority changes. This paper presents the SWARM meta-scheduler, which speeds up the execution of large sets of tasks by submitting multiple requests in order to minimize job queuing time. Performance tests have shown that this new meta-scheduler is faster than both an implementation of the MSR scheme and the gLite meta-scheduler. SWARM has been used through the GridQTL project beta-testing portal during the past year. Usage statistics from this period are provided and demonstrate that it can reliably achieve a substantial reduction in execution time under production conditions.
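The difference between submitting to the least busy site and submitting multiple simultaneous requests can be illustrated with a small sketch. This is not SWARM's implementation nor the published MSR algorithm: the `Site` class, the function names and the waiting times are all hypothetical, and the model simply assumes that each site has an advertised wait (what the information system reports at submission time) and an actual wait (what the job experiences once priorities have changed).

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical model of one supercomputer on the Grid."""
    name: str
    advertised_wait: float  # wait reported by the information system at submission
    actual_wait: float      # wait really experienced after priorities changed


def least_busy_wait(sites):
    """Single request: submit to the site advertising the shortest wait."""
    chosen = min(sites, key=lambda s: s.advertised_wait)
    return chosen.name, chosen.actual_wait


def multiple_requests_wait(sites):
    """Multiple requests: submit to every site, keep the request that
    starts first, and cancel the others."""
    winner = min(sites, key=lambda s: s.actual_wait)
    cancelled = [s.name for s in sites if s is not winner]
    return winner.name, winner.actual_wait, cancelled


# Illustrative numbers only (minutes); site A looks idle but becomes busy.
sites = [
    Site("A", advertised_wait=5.0, actual_wait=40.0),
    Site("B", advertised_wait=20.0, actual_wait=8.0),
    Site("C", advertised_wait=30.0, actual_wait=25.0),
]

print(least_busy_wait(sites))
print(multiple_requests_wait(sites))
```

In this toy scenario the single request is tied to site A, whose advertised wait was outdated, whereas the multiple-request strategy starts on whichever site becomes free first and withdraws the remaining requests.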




