PBS/Torque and file size requests

Jobs sometimes need local disk space for temporary data: storage that is excluded from the backup and fast enough to absorb large amounts of written data. A common way to provide this is a scratch directory. With PBS/Torque, you can request access to this local storage by adding

#PBS -l file=30GB

to your job script. Unfortunately, this value is not the limit for your whole job but for a single core of your job (which is reasonable in case your job spans multiple nodes). Access control, however, is implemented by setting the file size ulimit of your process tree to this value. So if you want to use all of the disk space from a single process writing one large file, this fails: once the file reaches the limit you requested, further write calls are rejected and the process receives SIGXFSZ. For small requests, simply raising the requested amount is a reasonable strategy. For larger requests, there may be no node with enough storage, because PBS/Torque still looks for nodes that offer (number of cores)*(scratch space requested) of disk space.
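The effect of such a file size limit can be reproduced outside the batch system. The following is only a sketch: the limit of 8 blocks and the use of dd are illustrative choices, not what PBS/Torque does internally. Note that the block unit of ulimit -f depends on the shell (1024 bytes in bash, 512 bytes in a strict POSIX sh).

```shell
# Create an empty temporary file to write into.
tmp=$(mktemp)

# Lower the file size limit inside a subshell only, then try to write
# 64 KiB. The write that crosses the limit fails with EFBIG and the
# writer normally receives SIGXFSZ.
( ulimit -f 8; dd if=/dev/zero of="$tmp" bs=1024 count=64 2>/dev/null )
status=$?

# The file stops growing at the limit instead of reaching 64 KiB.
size=$(( $(wc -c < "$tmp") ))
echo "exit status: $status, bytes written: $size"

rm -f "$tmp"
```

Because the limit is set in a subshell, the surrounding shell keeps its original limit. The same check with plain `ulimit -f` at the top of a job script shows which value the batch system actually applied.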

The workaround

There are two options. Either ask your admin to make the local scratch directory world-writable, so that you no longer need a file request (the default ulimit for the file size is unlimited), or split your temporary files into parts that each stay below the limit. Note that the default setup of PBS/Torque does not create a user-writable directory unless the job contains a file request as shown above.

An example: Gaussian

For Gaussian09, there is a convenient way to split the temporary files into smaller chunks. In particular, the rwf file (which can be seen as a file-based swap partition) can even be spread across different file systems. As the manual says, you can specify multiple files by using something like

%RWF=XXX/A,30GB,XXX/B,30GB,XXX/C,30GB,XXX/D,30GB,XXX/E,30GB,XXX/F,30GB

In this case, the files A.rwf to F.rwf are created in XXX and grow to at most 30 GB each. In my setup, the path of the local scratch directory contains the job ID, which is not known before job submission. Hence, you have to add something along the lines of

sed "s#XXX#$GAUSS_SCRDIR#g" gaussianinput.tmp > gaussianinput.gau

to your job script. The hash characters serve as the sed delimiter, so GAUSS_SCRDIR can safely contain slashes, that is, nested paths.
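To see the substitution in action without submitting a job, here is a minimal sketch. The scratch path and the two-chunk template are made up for illustration; in a real job, GAUSS_SCRDIR would be set by your environment and the template would be your full Gaussian input.

```shell
# Hypothetical scratch path with slashes, standing in for the per-job
# directory that is only known once the job runs.
GAUSS_SCRDIR=/local_scratch/12345.example

# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Template with the XXX placeholder, shortened to two chunks.
printf '%s\n' '%RWF=XXX/A,30GB,XXX/B,30GB' > "$workdir/gaussianinput.tmp"

# '#' as the sed delimiter means the slashes in the path need no escaping.
sed "s#XXX#$GAUSS_SCRDIR#g" "$workdir/gaussianinput.tmp" > "$workdir/gaussianinput.gau"

result=$(cat "$workdir/gaussianinput.gau")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The substitution would still break if the scratch path itself contained a hash character, in which case a different delimiter is needed.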

The error messages

For those who want to check whether this solution fits their problem before touching any scripts, here are the error messages I got.

On stderr

File size limit exceeded

Via strace -f

26517 open("/local_scratch/526755.torque.physik.fu-berlin.de/Gau-26517.rwf", O_RDWR) = 4
[...]
26517 fstatfs(4, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=111613318, f_bfree=111525871, f_bavail=105856232, f_files=28352512, f_ffree=28352491, f_fsid={1519002992, -214470734}, f_namelen=255, f_frsize=4096}) = 0
26517 lseek(4, 1073737728, SEEK_END)    = 35625353216
26517 write(4, "\240\217\255\0\0\0\0\0\227\217\255\0\0\0\0\0\0\0\0\0\0\0\0\0)\0\0\0\0\0\0\0"..., 4096) = -1 EFBIG (File too large)
26517 --- SIGXFSZ (File size limit exceeded) @ 0 (0) ---
30043 <... ???? resumed> )              = ? <unavailable>
30042 +++ killed by SIGXFSZ +++
30041 +++ killed by SIGXFSZ +++
30040 +++ killed by SIGXFSZ +++
30039 +++ killed by SIGXFSZ +++
30043 +++ killed by SIGXFSZ +++
30038 +++ killed by SIGXFSZ +++
30037 +++ killed by SIGXFSZ +++
30036 +++ killed by SIGXFSZ +++
30035 +++ killed by SIGXFSZ +++
30034 +++ killed by SIGXFSZ +++
30033 +++ killed by SIGXFSZ +++
26516 <... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGXFSZ}], 0, NULL) = 26517
26516 --- SIGCHLD (Child exited) @ 0 (0) ---
26516 write(2, "File size limit exceeded\n", 25) = 25
26516 exit_group(153)                   = ?
26515 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 153}], 0, NULL) = 26516
26515 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x430e60}, NULL, 8) = 0
26515 rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x430e60}, NULL, 8) = 0
26515 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
26515 --- SIGCHLD (Child exited) @ 0 (0) ---
26515 stat("/local_scratch/526755.torque.physik.fu-berlin.de/Gau-26515.inp", {st_mode=S_IFREG|0644, st_size=4297, ...}) = 0
26515 unlink("/local_scratch/526755.torque.physik.fu-berlin.de/Gau-26515.inp") = 0
26515 exit_group(1)                     = ?
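The exit status 153 in the trace follows the usual shell convention of 128 plus the signal number. A small sketch for decoding such a status, assuming a Linux system where SIGXFSZ has the number 25 (the numbering can differ on other platforms):

```shell
status=153                     # taken from exit_group(153) in the trace above
signum=$((status - 128))       # shells report 128 + signal number
signame=$(kill -l "$signum")   # translate the number into a signal name
echo "signal $signum = SIG$signame"
```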

If you submit a job whose requirements exceed the capabilities of every node in the cluster

$ showstart 526758.torque
ERROR:    'showstart' failed
ERROR:    cannot determine earliest start time for job '526758'
