
Pretty queue waiting times

Whenever you run calculations on a large machine (e.g. ARCHER), your jobs are subject to queueing policies. Since waiting times should be fair (whatever that means), these policies are complex and often consist of several hundred rules. From a user's perspective, however, the only interesting question is

When will my job start?

The attached scripts can help with an approximate answer. The resulting plot looks like this:

The x-axis shows the job size in cores and the y-axis shows how long the job has been queueing, in hours. Queued jobs are light grey, running jobs are dark grey, and the jobs of selected users can be highlighted. Each job is born as a pale grey dot on the x-axis and wanders upwards until its execution starts. At that point it turns dark grey and stops moving, because running jobs do not accrue waiting time. Here is how to produce this plot (it works for PBS queueing systems only):

/usr/bin/ssh username@computecluster "qstat -f" > /tmp/qf
python pbs-to-qdata.py /tmp/qf > /tmp/qfinter.txt
python qdata-to-forecast.py /tmp/qfinter.txt /path/to/output/qfcast.png acarof,gfvr2,xjiang,rasjak 2000 140

The last three arguments to the final script are a comma-separated list of usernames to highlight, the maximum number of cores to show (the x-axis limit) and the maximum number of hours to show (the y-axis limit).
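As a rough illustration of what happens between the `qstat -f` dump and the plot, here is a minimal Python sketch. It is not the original pbs-to-qdata.py or qdata-to-forecast.py. It parses a `qstat -f` dump and turns each job into a point using the rules described above: queued jobs keep accruing waiting time, running jobs stop at their start time, and the highlight list and axis limits filter the result. It assumes PBS Pro-style attribute names (`job_state`, `qtime`, `stime`, `Job_Owner`, `Resource_List.ncpus`). Sites differ, so check your own `qstat -f` output. The function names and the sample data are hypothetical.

```python
from datetime import datetime

# Hypothetical sketch, not the original pbs-to-qdata.py / qdata-to-forecast.py.
# Assumes PBS Pro-style attributes: job_state, qtime, stime, Job_Owner and
# Resource_List.ncpus. Check your site's `qstat -f` output before relying on it.
PBS_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Jan  8 10:00:00 2018"


def parse_qstat_f(text):
    """Split `qstat -f` output into one dict of attributes per job."""
    jobs, job, last_key = [], None, None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job = {"id": line.split(":", 1)[1].strip()}
            jobs.append(job)
            last_key = None
        elif job is not None and last_key and line.startswith("\t"):
            # Long values are wrapped onto tab-indented continuation lines.
            job[last_key] += line.strip()
        elif job is not None and line.startswith("    ") and " = " in line:
            key, value = line.strip().split(" = ", 1)
            job[key] = value
            last_key = key
    return jobs


def waiting_hours(job, now):
    """Queued jobs keep waiting until `now`; running jobs stopped at `stime`."""
    queued_at = datetime.strptime(job["qtime"], PBS_TIME_FORMAT)
    if job.get("job_state") == "R" and "stime" in job:
        end = datetime.strptime(job["stime"], PBS_TIME_FORMAT)
    else:
        end = now
    return (end - queued_at).total_seconds() / 3600.0


def plot_points(jobs, highlight, max_cores, max_hours, now):
    """Return (user, cores, hours, category) for queued/running jobs within the axis limits."""
    points = []
    for job in jobs:
        state = job.get("job_state")
        if state not in ("Q", "R") or "qtime" not in job:
            continue
        cores = int(job.get("Resource_List.ncpus", "0"))
        hours = waiting_hours(job, now)
        if cores > max_cores or hours > max_hours:
            continue
        user = job.get("Job_Owner", "").split("@")[0]
        if user in highlight:
            category = "highlight"
        elif state == "R":
            category = "running"  # drawn dark grey
        else:
            category = "queued"  # drawn light grey
        points.append((user, cores, hours, category))
    return points


# Hypothetical sample in the shape of `qstat -f` output.
SAMPLE = """Job Id: 101.sdb
    Job_Owner = acarof@login1
    job_state = R
    qtime = Mon Jan  8 10:00:00 2018
    stime = Mon Jan  8 12:30:00 2018
    Resource_List.ncpus = 48

Job Id: 102.sdb
    Job_Owner = other@login1
    job_state = Q
    qtime = Mon Jan  8 11:00:00 2018
    Resource_List.ncpus = 2400

Job Id: 103.sdb
    Job_Owner = other@login2
    job_state = Q
    qtime = Mon Jan  8 13:00:00 2018
    Resource_List.ncpus = 24
"""

if __name__ == "__main__":
    now = datetime(2018, 1, 8, 14, 0)
    for point in plot_points(parse_qstat_f(SAMPLE), {"acarof"}, 2000, 140, now):
        print(point)
```

With the limits from the example command (2000 cores, 140 hours), job 102 is dropped because it exceeds the core limit. The script prints `('acarof', 48, 2.5, 'highlight')` and `('other', 24, 1.0, 'queued')`. Drawing the points is then a job for a plotting library such as matplotlib. Note that the `%a` and `%b` codes in `strptime` assume an English (C) locale.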

Download the scripts here.

IMPORTANT: Do not regenerate this graph too often! qstat -f prints the full status of every job on the system, so its output is large.
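One way to honour this is to cache the dump and only re-run `qstat -f` when the cached copy is older than some minimum age. The sketch below reuses the `username@computecluster` host and `/tmp/qf` path from the commands above. The helper names and the 30-minute default are hypothetical choices, not a site policy.

```python
import os
import subprocess
import time

# Hypothetical helper: the host, path and 30-minute default are assumptions.
HOST = "username@computecluster"
DUMP = "/tmp/qf"


def dump_is_fresh(path, min_age_minutes=30):
    """True if `path` exists and was written less than min_age_minutes ago."""
    try:
        age = time.time() - os.path.getmtime(path)
    except FileNotFoundError:
        return False
    return age < min_age_minutes * 60


def refresh_dump(path=DUMP, host=HOST, min_age_minutes=30):
    """Run `qstat -f` over ssh only when the cached dump is stale."""
    if dump_is_fresh(path, min_age_minutes):
        return False  # reuse the cached copy instead of querying the server
    tmp = path + ".tmp"
    with open(tmp, "w") as out:
        # check=True raises CalledProcessError if ssh or qstat fails.
        subprocess.run(["/usr/bin/ssh", host, "qstat -f"], stdout=out, check=True)
    os.replace(tmp, path)  # swap in the new dump only after a successful run
    return True
```

Calling `refresh_dump()` before the two scripts keeps the graph reasonably current while querying the queue server at most once per interval. Writing to a temporary file first means a failed ssh call never replaces a good dump with an empty one.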
