The scratch space on NICS resources uses Lustre, a massively parallel distributed file system. This space is intended for production work, not long-term storage. Files in scratch directories are not backed up and are subject to purge once they have gone 30 days without being accessed. It is the user's responsibility to back up all important data.
Compute nodes can see only the Lustre scratch directories.
Batch scripts are run on service nodes that have access to the home, project and software directories. Executables launched with the aprun command run on compute nodes and do not have access to these directories; they can only access the Lustre scratch directories. Therefore, in your batch script, make sure to cd to the Lustre scratch directory before the aprun command is issued.
If this is not done, you may see an error like:
aprun: [NID 94]Exec /lustre/medusa/userid/a.out failed: chdir /nics/b/home/userid No such file or directory
For the program launched by aprun, all input and output files must reside in the Lustre scratch directories.
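The requirement to cd into scratch before launching can be sketched as a small wrapper for a batch script. This is a minimal illustration, not a NICS-provided tool: the function name run_from_scratch is hypothetical, and the process count in the usage comment is only an example. $SCRATCHDIR is assumed to point at your Lustre scratch directory.

```shell
#!/bin/bash
# Hypothetical helper (not a NICS-provided command): change into the given
# scratch directory, then run the remaining arguments as the launch command.
run_from_scratch() {
    scratch_dir="$1"
    shift
    # Fail early instead of launching from a directory compute nodes cannot see.
    cd "$scratch_dir" || {
        echo "cannot cd to scratch directory: $scratch_dir" >&2
        return 1
    }
    "$@"
}

# In a batch script this might be used as (process count is illustrative):
#   run_from_scratch "$SCRATCHDIR" aprun -n 16 ./a.out
```

Because the cd happens in the script's own shell, any later commands in the batch script also run from the scratch directory, which keeps input and output files where the compute nodes can reach them.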
Lustre scratch space can be found in the following locations on NICS resources:
The scratch file system should not be used for long-term storage, and files on scratch are neither backed up nor guaranteed by NICS. In the event of a file system crash or purge, files in scratch directories cannot be recovered. It is the user's responsibility to back up all important data.
Files are exempt from purge if they have been written to or read within the last 30 days. To list the files that have not been accessed in more than 30 days, and are therefore eligible for purge, use:
lfs find $SCRATCHDIR -atime +30 | xargs ls -l --time=atime --sort=time
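The same check can be wrapped in a small function so it is easy to rerun with a different directory or age threshold. This is only a sketch: the name purge_candidates is hypothetical, and the fallback to the standard find command is an assumption added so the function can be tried on a machine without the Lustre client tools. On Lustre, lfs find is preferable.

```shell
#!/bin/bash
# Hypothetical helper: print regular files under directory $1 whose last
# access time is more than $2 days ago (default 30, matching the purge window).
purge_candidates() {
    dir="$1"
    days="${2:-30}"
    if command -v lfs >/dev/null 2>&1; then
        # Lustre-aware search, as in the command shown above.
        lfs find "$dir" -type f -atime +"$days"
    else
        # Assumed fallback for systems without the lfs utility.
        find "$dir" -type f -atime +"$days"
    fi
}

# Example: purge_candidates "$SCRATCHDIR" 30
```

Listing files this way only inspects their metadata, so running it does not itself change their access times.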
Modifying file access times (using touch or any other method) for the purpose of circumventing purge policies may result in the loss of access to the scratch file systems. Under special circumstances, users may request a purge exemption by submitting a timely request with detailed justification to firstname.lastname@example.org. Please include the file system (e.g. /lustre/snx), the PI of the project, the user requesting the exemption, the TG-Account, the time requested (e.g. two weeks), and the detailed justification.
I/O and Lustre Usage
Lustre is a resource shared by all users on the system. Optimizing your I/O performance will not only lessen the load on Lustre but also save you compute time. Please consider reading the I/O and Lustre Usage page, which should help you make the best use of the parallel Lustre file system and improve your application's I/O performance.
Also, Frequently Asked Questions about Lustre are available on the Lustre FAQs page.