Comanche – A Compiler-Driven I/O Management System

Most scientific programs have large input and output data sets that require either out-of-core programming or reliance on virtual memory management (VMM). Out-of-core programming is error-prone and tedious; as a result, it is generally avoided. However, in many instances VMM is not an effective approach, because it often results in a substantial reduction in performance. In contrast, compiler-driven I/O management allows a program's data sets to be retrieved in parts, called blocks or tiles. Comanche (COmpiler MANaged caCHE) is a compiler combined with a user-level runtime system that can be used to replace standard VMM for out-of-core programs. We describe Comanche and demonstrate on a number of representative problems that it substantially outperforms VMM. Significantly, our system requires neither special services from the operating system nor modification of the operating system kernel.
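To make the notion of retrieving a data set in blocks concrete, the following is a minimal hand-written sketch in Python. It is not Comanche's runtime system or compiler output; the names `BLOCK_ELEMS`, `write_doubles`, `blockwise_sum`, and `demo` are hypothetical and chosen only for illustration. It shows the kind of explicit, user-level block I/O that out-of-core programming involves: the array lives in a file, and only one block is resident in memory at a time.

```python
import os
import tempfile
from array import array

BLOCK_ELEMS = 1024  # hypothetical block size, in array elements


def write_doubles(path, values):
    """Store values on disk as raw 8-byte floats (stand-in for a large data set)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def blockwise_sum(path, block_elems=BLOCK_ELEMS):
    """Sum an on-disk array while holding at most one block in memory."""
    item_size = array("d").itemsize
    total = 0.0
    with open(path, "rb") as f:
        while True:
            raw = f.read(block_elems * item_size)  # fetch the next block
            if not raw:
                break  # end of file: every block has been processed
            block = array("d")
            block.frombytes(raw)
            total += sum(block)  # compute on the resident block only
    return total


def demo():
    """Write 10,000 doubles to a temporary file and sum them block by block."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_doubles(path, range(10000))
        return blockwise_sum(path)
    finally:
        os.remove(path)
```

The sketch uses only ordinary file reads, so it needs no special operating system services. Writing such loops by hand for every array access pattern is the tedious, error-prone work that a compiler-driven approach is meant to take over.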



