PBS/torque: unbuffered output on network filesystems

Usually, you can define separate output files for stdout and stderr in your jobfile. Even if you omit these settings, the queueing system picks default files for you. You can query the two filenames:

$ qstat -f 124762.torque
Job Id: 124762.torque
[...]
    Error_Path = filehostname.example.com:/home/user/s-6.5-0.25_
	1.e124762
[...]
    Output_Path = filehostname.example.com:/home/user/s-6.5-0.25
	_1.o124762

However, PBS copies the data to these files only after the job completes. If you are interested in intermediate results, you may be tempted to redirect the output of your program to a file in PBS_O_WORKDIR. This works, but that directory usually lives on another system in your network and is therefore both slow and buffered.

If the buffering delays the output too much for your needs, you can look at the compute node directly. The qstat -f output also names the node the job runs on. If you log in to this node, you will find your local (and unbuffered) files in /var/spool/torque/spool (or a similar directory). The naming convention is JOBNAME.OU for stdout and JOBNAME.ER for stderr.
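The lookup described above could be scripted roughly as follows. This is only a sketch: the helper name first_exec_node is made up for illustration, and it assumes that your qstat -f output reports the node in an exec_host field of the form node/slot+node/slot, which may differ between installations. The ssh step is left as a comment because the spool directory and file names depend on your site.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f` output on stdin and print the
# first node listed in the exec_host field (assumed field name/format).
first_exec_node() {
    # exec_host is assumed to look like "node042/3+node042/2";
    # keep only the text before the first "/" or "+".
    sed -n 's/^[[:space:]]*exec_host = \([^/+]*\).*/\1/p' | head -n 1
}

# Usage sketch (not executed here; adjust job id and spool path):
#   node=$(qstat -f 124762.torque | first_exec_node)
#   ssh "$node" 'ls -l /var/spool/torque/spool'
#   ssh "$node" 'tail -f /var/spool/torque/spool/JOBNAME.OU'
```

Listing the spool directory first is a cautious choice, since it shows the actual file names before you start following one of them with tail -f.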
