Analysis of Long-Term File System Activities on Cluster Systems

I/O workload is a critical factor in analyzing I/O patterns and maximizing file system performance. However, measuring the I/O workload of a running distributed parallel file system is non-trivial because of the collection overhead and the large volume of data involved. In this paper, we measure and analyze file system activities on two large-scale cluster systems with TFlops-level high-performance computing resources. By comparing the file system activities of 2009 with those of 2006, we analyze how I/O workloads changed as system performance and high-speed network technology advanced.



