

Faculty and Student Teams Program

Project Descriptions

Los Alamos National Laboratory
SciDAC Petascale Data Storage Institute Outreach: Exploiting Parallel I/O Traces

Background/Motivation

The need for ever-larger scientific computations drives a need for rapidly increasing storage capability in High Performance Computing (HPC). The rapid adoption of Data-Intensive Supercomputing (DISC) adds further demand for high-performance, highly parallel storage. Individual disk drives are rapidly becoming denser, but their bandwidth and agility are not growing at the same pace, which makes providing scalable storage increasingly difficult over time.

Storage systems for HPC and DISC environments routinely involve more than 50,000 disk drives in a single parallel job, so there is strong motivation to use these massive resources efficiently. Although disks can service random operations, they are far more efficient at serial I/O workloads. HPC applications running on machines that approach one million processing cores present the storage system with a wide variety of I/O workloads, ranging from simple, efficient serial patterns to highly random ones. DISC machines present a similar variety of workloads.
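The serial-versus-random distinction can be made concrete with a simple metric. The C sketch below assumes a hypothetical `io_record` layout (not the actual PDSI trace format) and scores one process's sequence of I/O calls by how often each call begins exactly where the previous one ended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process trace record: one I/O call. */
typedef struct {
    uint64_t offset;   /* starting byte offset in the file */
    uint64_t length;   /* number of bytes transferred */
} io_record;

/* Fraction of calls, after the first, that begin exactly where the
 * previous call ended. 1.0 means a purely serial pattern; values near
 * 0.0 suggest a random (or strided) pattern. Returns 1.0 when there
 * are fewer than two calls. */
double sequential_fraction(const io_record *recs, size_t n)
{
    if (n < 2)
        return 1.0;
    assert(recs != NULL);
    size_t seq = 0;
    for (size_t i = 1; i < n; i++) {
        if (recs[i].offset == recs[i - 1].offset + recs[i - 1].length)
            seq++;
    }
    return (double)seq / (double)(n - 1);
}
```

This single-stream metric treats a strided pattern the same as a random one, and it ignores how calls from many processes interleave at the storage system; a real analysis tool would need to account for both.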

To help I/O and storage researchers build storage systems that cope efficiently with these demanding workloads, it is vital to understand the workloads themselves. One of the most important tools for understanding I/O workloads is tracing. Accurate, low-overhead traces are needed because researchers cannot always run the applications themselves, or cannot run them at scale. Over the last two years, the DOE SciDAC Petascale Data Storage Institute (PDSI) has been working to collect I/O traces from many DOE Office of Science and NNSA parallel scientific applications, and these traces are being made available to the computer science research community. Making use of these traces from important DOE supercomputing applications requires trace analysis tools. This proposal therefore focuses on helping the computer science research community use these valuable traces in two ways: by surveying trace tools suitable for analyzing parallel I/O traces, and by creating a parallel trace replay tool that can reproduce the I/O patterns of these important applications on parallel machines without running the actual scientific applications.

Proposed Project

We propose a two-track project: to survey and analyze existing trace analysis tools for use with parallel I/O traces, and to create a parallel trace replay tool that can replay traces without access to the original application. The project would be tied to, and collaborate with, the SciDAC PDSI at LANL and the other PDSI institutions. LANL would provide work space and access to computers, traces, and tools. LANL High Performance Computing I/O and File Systems experts who are members of the PDSI will work closely with the FaST team to understand the project tasks and collaborate on approaches. The LANL mentors will also help run tests of tools created by the FaST team on LANL’s large supercomputers. The trace tool survey is expected to produce a large amount of graphical and movie output from visual analysis tools, which would be suitable for publication. We expect that both project tracks would be in a position to publish at least a technical report, and perhaps a workshop or conference paper, from the work on these two subprojects. Additionally, LANL would include the FaST team in its student program functions, which include tours of LANL facilities and the supercomputing center, seminars given by LANL scientists on a variety of topics, and events designed to help students and faculty learn more about DOE and LANL programs and science.

Trace Tool Survey

The Trace Tool Survey subproject would:

  • Collect and index pointers to a large variety of traces available via PDSI and other HPC-related I/O tracing efforts
  • Collect, index, and survey a variety of parallel trace analysis tools, including visual and movie-related tools, from across the parallel computing community
  • Use the trace tools with the traces to produce a large volume of visual output and to probe the features of these tools and their usefulness to I/O researchers. This work would be done collaboratively with LANL I/O experts, and the outputs would be made available to the computer science research community
  • Publish a report documenting the tool survey and example outputs
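One common kind of visual output, a bandwidth-over-time plot, starts from a simple reduction of the trace. As a minimal sketch, assuming hypothetical timestamped records (`timed_record`, with start times in seconds since job start; not a real PDSI format), the C function below sums bytes transferred into fixed-width time bins. Dividing each bin by the bin width gives bytes per second for plotting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timestamped trace record. */
typedef struct {
    double   start;    /* seconds since job start when the call was issued */
    uint64_t length;   /* bytes transferred */
} timed_record;

/* Zero bins[0..nbins-1], then add each call's bytes to the bin that
 * contains its start time (bin b covers [b * bin_s, (b + 1) * bin_s)).
 * Calls starting outside [0, nbins * bin_s) are skipped. */
void bin_bytes_by_time(const timed_record *recs, size_t n,
                       double bin_s, uint64_t *bins, size_t nbins)
{
    for (size_t b = 0; b < nbins; b++)
        bins[b] = 0;
    assert(bin_s > 0.0);
    for (size_t i = 0; i < n; i++) {
        double pos = recs[i].start / bin_s;
        /* The negated test also rejects NaN start times. */
        if (!(pos >= 0.0 && pos < (double)nbins))
            continue;
        bins[(size_t)pos] += recs[i].length;
    }
}
```

Attributing a whole call to the bin where it starts is a simplification; a long transfer that spans several bins would be represented more accurately by splitting its bytes across them.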

Trace Replay Tool

The Trace Replay Tool subproject would:

  • Specify requirements for a parallel I/O trace replay tool
  • Prototype and test the replay tool
  • Run the replay tool on real traces and produce new traces
  • Compare the original traces to the replayed traces, using the trace analysis tools identified by the Trace Tool Survey subproject, to determine the accuracy and fidelity of the replay tool
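The replay and comparison steps above can be sketched in a few dozen lines. This is a minimal, single-process sketch under stated assumptions: the hypothetical `replay_record` captures each call's operation, offset, and length but not its data, and the replayer re-issues calls with POSIX `pread`/`pwrite` against one file. A parallel replay tool would presumably run one replayer per MPI rank and would also need to reproduce timing and inter-process ordering, which this sketch ignores.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical replay record: what a replayer needs from one traced call. */
typedef enum { OP_READ, OP_WRITE } io_op;

typedef struct {
    io_op    op;
    uint64_t offset;   /* byte offset in the file */
    uint64_t length;   /* bytes transferred */
} replay_record;

/* Re-issue each traced call against fd, using a zero-filled dummy buffer
 * because (by assumption) the trace holds sizes and offsets, not data.
 * Returns total bytes transferred, or -1 on allocation failure or the
 * first failed call. */
long long replay_trace(int fd, const replay_record *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    uint64_t max_len = 0;
    for (size_t i = 0; i < n; i++)
        if (recs[i].length > max_len)
            max_len = recs[i].length;

    char *buf = calloc(max_len ? (size_t)max_len : 1, 1);
    if (buf == NULL)
        return -1;

    long long total = 0;
    for (size_t i = 0; i < n; i++) {
        ssize_t got;
        if (recs[i].op == OP_WRITE)
            got = pwrite(fd, buf, (size_t)recs[i].length, (off_t)recs[i].offset);
        else
            got = pread(fd, buf, (size_t)recs[i].length, (off_t)recs[i].offset);
        if (got < 0) {
            free(buf);
            return -1;
        }
        total += got;
    }
    free(buf);
    return total;
}

/* Compare an original trace with a trace captured during replay.
 * Returns the index of the first record whose operation, offset, or
 * length differs; if none differs in the common prefix, returns
 * min(na, nb). The traces match exactly only when the result equals
 * both na and nb. */
size_t first_mismatch(const replay_record *a, size_t na,
                      const replay_record *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++)
        if (a[i].op != b[i].op || a[i].offset != b[i].offset ||
            a[i].length != b[i].length)
            return i;
    return n;
}
```

Because `first_mismatch` checks only the sequence of calls, it says nothing about timing fidelity; comparing per-interval bandwidth of the original and replayed traces would be one way to extend the check.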

Skills Required:

Knowledge of C programming
Ability to use Linux
Ability to learn and use graphical tools
Ability to use Linux text-formatting and scripting utilities and shell languages
Basic familiarity with parallel programming, especially using MPI (Message Passing Interface)

Support and Financial Commitments

See Financial Information.