Submitting CPU Jobs
From UFAL AIC
Job monitoring
We should be able to see what is going on when we run a job. The following examples show typical usage of the qstat command:
- qstat - inspects all our jobs (both waiting in the queue and scheduled, i.e. running).
- qstat -u '*' | less - shows the jobs of all users.
- qstat -j 121144 - shows detailed info about the job with this number (if it is still running).
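The qstat -j call above can also be used from a script to wait for a job to finish. The following is only a minimal sketch: the function name wait_for_job and the default polling interval are made up for illustration, and it assumes that qstat -j exits with a non-zero status once the job is no longer known to the scheduler, which is worth verifying on the cluster before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: block until qstat no longer knows the given job.
# Assumption: `qstat -j ID` returns a non-zero exit status for jobs
# that have already left the system (finished or deleted).
wait_for_job() {
    local job_id="$1"
    local interval="${2:-30}"   # seconds between polls (illustrative default)

    # Keep polling while qstat still reports the job.
    while qstat -j "$job_id" > /dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job_id is no longer listed by qstat"
}
```

For example, wait_for_job 121144 60 would check once a minute. A long interval is kinder to the scheduler than a tight loop.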
Output monitoring
If we need to see the output produced by our job (suppose the ID is 121144), we can inspect the job's output file (in our case job_script.sh.o121144) with:
less job_script.sh.o*
Hint: if the job is still running, press F (capital F, i.e. Shift+F) in less to simulate tail -f.
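When several jobs were submitted from the same script, the glob job_script.sh.o* matches all of their output files. The sketch below builds the file name for one specific job instead. It assumes the naming pattern seen in the example above (script name, then .o, then the job ID) and that the file is in the current directory; both function names are hypothetical.

```shell
#!/bin/bash
# Build the output file name for a job, following the pattern
# <script name>.o<job ID> seen in the example above.
job_output_file() {
    local script_name="$1" job_id="$2"
    printf '%s.o%s\n' "$script_name" "$job_id"
}

# Follow the output of one job as it grows (stop with Ctrl+C).
follow_job_output() {
    tail -f "$(job_output_file "$1" "$2")"
}
```

For example, follow_job_output job_script.sh 121144 follows job_script.sh.o121144 only.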
How to read output epilog
The epilog section at the end of the output file contains some useful pieces of information. However, it can be confusing at times.
======= EPILOG: Tue Jun 4 12:41:07 CEST 2019
== Limits:
== Usage: cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
== Duration: 00:00:00 (0 s)
== Server name: cpu-node13
- Limits - on this line you can see the job limits specified through qsub options
- Usage - resource usage during computation
  - cpu=HH:MM:SS - the accumulated CPU time usage
  - mem=XY GB s - gigabytes of RAM used multiplied by the duration of the job in seconds, so don't be alarmed if XY is a very large number (unlike in this toy example)
  - io=XY GB - the amount of data transferred in input/output operations in GB
  - vmem=XY - actual virtual memory consumption when the job finished
  - maxvmem=XY - peak virtual memory consumption
- Duration - total execution time
- Server name - name of the executing server
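Because mem is reported in GB multiplied by seconds, dividing it by the duration in seconds gives a rough average RAM footprint in GB. The sketch below shows one way to pull the numbers out of an output file. It assumes the epilog lines each start with "== " on their own line, as in the sample above; the function names are hypothetical.

```shell
#!/bin/bash
# Print one key=value field from the "== Usage:" line of an epilog,
# e.g. epilog_usage_field job_script.sh.o121144 mem
epilog_usage_field() {
    local file="$1" key="$2"
    sed -n 's/^== Usage: //p' "$file" | tr ',' '\n' | sed -n "s/^ *${key}=//p"
}

# Print the job duration in seconds, taken from "== Duration: HH:MM:SS (N s)".
epilog_duration_seconds() {
    sed -n 's/^== Duration: .*(\([0-9]*\) s).*/\1/p' "$1"
}

# Average RAM in GB = mem (GB s) / duration (s); prints N/A for a zero duration.
average_mem_gb() {
    awk -v mem="$1" -v secs="$2" \
        'BEGIN { if (secs > 0) printf "%.3f\n", mem / secs; else print "N/A" }'
}
```

For instance, a job reporting mem=7200 GB s over a duration of 3600 s averaged about 2 GB of RAM: average_mem_gb 7200 3600. Note that epilog_usage_field returns the value with its unit (such as "0.00000 GB s"), so the unit has to be stripped before doing arithmetic.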