Design and Implementation of a Shared Memory-Based Parallel File System Logging Method for High Performance Computing

The I/O workload is a critical factor in analyzing I/O patterns and file system performance. However, tracing I/O operations on the fly in a distributed parallel file system is non-trivial because of the collection overhead and the large volume of data. In this paper, we design and implement a parallel file system logging method for high performance computing that uses a shared memory-based multi-layer scheme. It minimizes overhead by reducing the response time of logging operations, and it provides an efficient post-processing scheme through shared memory. A separate logging server collects sequential logs from multiple clients in a cluster through packet communication. Implementation and evaluation results show that the architecture achieves low overhead and high scalability for high performance parallel logging analysis.
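
To make the shared memory idea concrete, the following is a minimal sketch of the general technique, not the implementation evaluated in this paper. It assumes a single writer and a single reader with no locking, and the record layout, the names SharedLogBuffer, append and drain, and the slot count are all hypothetical. The I/O path only copies a small fixed-size record into a shared memory ring, while a separate collector drains the ring later; in a full system the drained records would then be forwarded to the logging server.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: client id, operation code, byte offset, length.
RECORD = struct.Struct("<IIQQ")
# Ring header: total records written, total records read.
HEADER = struct.Struct("<QQ")
SLOTS = 64  # assumed ring capacity


class SharedLogBuffer:
    """Fixed-size ring of log records kept in a shared memory segment.

    Assumes one writer and one reader; no locking is performed.
    """

    def __init__(self, name=None, create=False):
        size = HEADER.size + SLOTS * RECORD.size
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)
        if create:
            HEADER.pack_into(self.shm.buf, 0, 0, 0)

    def append(self, client_id, op, offset, length):
        """Fast path: copy one record into the ring; return False if it is full."""
        written, read = HEADER.unpack_from(self.shm.buf, 0)
        if written - read >= SLOTS:
            return False
        pos = HEADER.size + (written % SLOTS) * RECORD.size
        RECORD.pack_into(self.shm.buf, pos, client_id, op, offset, length)
        HEADER.pack_into(self.shm.buf, 0, written + 1, read)
        return True

    def drain(self):
        """Slow path: remove and return all pending records in arrival order."""
        written, read = HEADER.unpack_from(self.shm.buf, 0)
        records = []
        while read < written:
            pos = HEADER.size + (read % SLOTS) * RECORD.size
            records.append(RECORD.unpack_from(self.shm.buf, pos))
            read += 1
        HEADER.pack_into(self.shm.buf, 0, written, read)
        return records

    def close(self, unlink=False):
        """Detach from the segment; the creator should also unlink it."""
        self.shm.close()
        if unlink:
            self.shm.unlink()
```

A collector process would attach to the same segment with SharedLogBuffer(name=...) and call drain() periodically. Keeping the write path to one record copy and one header update is what keeps the logging response time short in this sketch; a real multi-writer deployment would need atomic updates or a lock around the header.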



