Analysis of Long-Term File System Activities on Cluster Systems
I/O workload is a critical factor in analyzing I/O patterns and
maximizing file system performance. However, measuring the I/O
workload of a running distributed parallel file system is non-trivial
because of collection overhead and the large volume of data. In this
paper, we measure and analyze file system activities on two
large-scale cluster systems with TFlops-level high-performance
computing resources. By comparing file system activities in 2009 with
those in 2006, we analyze how I/O workloads changed with advances in
system performance and high-speed network technology.
@article{"International Journal of Information, Control and Computer Sciences:60715",
  author = "Hyeyoung Cho and Sungho Kim and Sik Lee",
  title = "Analysis of Long-Term File System Activities on Cluster Systems",
  abstract = "I/O workload is a critical factor in analyzing I/O patterns and maximizing file system performance. However, measuring the I/O workload of a running distributed parallel file system is non-trivial because of collection overhead and the large volume of data. In this paper, we measure and analyze file system activities on two large-scale cluster systems with TFlops-level high-performance computing resources. By comparing file system activities in 2009 with those in 2006, we analyze how I/O workloads changed with advances in system performance and high-speed network technology.",
  keywords = "I/O workload, Lustre, GPFS, Cluster File System",
  volume = "3",
  number = "12",
  pages = "2883-6",
}