Submitting CPU Jobs

From UFAL AIC
Revision as of 15:10, 11 June 2019 by Vodrazka

Resource specification

Monitoring and interaction

Job monitoring

We should be able to see what is going on when we run a job. The following examples show typical usage of the qstat command:

  • qstat - lists all our jobs, both those waiting in the queue and those already scheduled (i.e. running).
  • qstat -u '*' | less - shows the jobs of all users.
  • qstat -j 121144 - shows detailed information about the job with this ID (only while the job is still queued or running).
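When a script has to wait for a job to finish, the qstat -j check above can be polled in a loop. The sketch below is a hypothetical helper (wait_for_job is not part of SGE). It assumes that qstat -j exits with a non-zero status once the scheduler no longer knows the job, which is worth verifying on our cluster before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until an SGE job is no longer in the queue.
# Assumption: `qstat -j ID` returns a non-zero exit status once the job
# has finished (i.e. the scheduler no longer knows about it).
wait_for_job() {
    local job_id=$1
    local interval=${2:-30}   # seconds between checks, default 30

    # Keep polling while qstat still reports the job (queued or running).
    while qstat -j "$job_id" > /dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job_id is no longer in the queue"
}
```

For example, wait_for_job 121144 60 && less job_script.sh.o121144 checks once a minute and opens the output when the job is gone. Polling every few tens of seconds keeps the load on the scheduler low; if the waiting can be decided at submission time, qsub -sync y is a built-in alternative.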

Output monitoring

To see the output produced by our job (suppose its ID is 121144), we can inspect the job's output file (in our case job_script.sh.o121144) with:
less job_script.sh.o*
Hint: if the job is still running, press F in less to simulate tail -f.
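The glob job_script.sh.o* matches the output files of every run of the script, so it can be handy to open only the newest one. A minimal sketch, assuming the default naming <script name>.o<job ID> in the current directory; the function name latest_output is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the newest output file of a job script, assuming the
# default SGE naming <script name>.o<job ID> in the current directory.
latest_output() {
    local script=$1
    local newest
    # ls -t sorts by modification time (newest first); the error from an
    # unmatched glob is discarded.
    newest=$(ls -t -- "$script".o* 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no output file found for $script" >&2
        return 1
    fi
    echo "$newest"
}
```

Usage: less "$(latest_output job_script.sh)", then press F inside less to keep following a running job.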

How to read output epilog

The epilog section of the output contains useful information about the job and its resource usage, but it can be confusing to read.

======= EPILOG: Tue Jun 4 12:41:07 CEST 2019
== Limits:   
== Usage:    cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
== Duration: 00:00:00 (0 s)
== Server name: cpu-node13
  • Limits - on this line you can see job limits specified through qsub options
  • Usage - resource usage during computation
    • cpu=HH:MM:SS - the accumulated CPU time usage
    • mem=XY GB s - integral memory usage: gigabytes of RAM used multiplied by the CPU time in seconds. Don't be alarmed that XY is usually a very large number (unlike in this toy example).
    • io=XY GB - the amount of data transferred in input/output operations in GB
    • vmem=XY - actual virtual memory consumption when the job finished
    • maxvmem=XY - peak virtual memory consumption
  • Duration - total execution time
  • Server name - name of the executing server
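Because mem is an integral over time rather than an amount of RAM, dividing it by the job's time gives a rough average of the memory the job held. The sketch below divides by the CPU time from the same Usage line. It assumes the epilog format shown above (cpu as HH:MM:SS, mem in GB s) and that mem is integrated over CPU seconds, as Grid Engine's accounting documentation describes it; for a single-threaded job the CPU time and the Duration are close anyway. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate the average RAM of a finished job from its SGE epilog.
# Assumes the "== Usage:" line format shown above (cpu=HH:MM:SS, mem in GB s)
# and that mem is integrated over CPU time; adjust for other SGE setups.

# Convert HH:MM:SS to seconds; 10# avoids octal parsing of e.g. "08".
hms_to_seconds() {
    local IFS=: h m s
    read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Print mem / cpu seconds (average GB) for the last epilog in file $1,
# or "n/a" if the line is missing or the job used no CPU time.
epilog_avg_mem() {
    local usage cpu mem secs
    usage=$(grep '^== Usage:' "$1" | tail -n 1)
    cpu=$(sed -n 's/.*cpu=\([0-9:]*\),.*/\1/p' <<< "$usage")
    # The leading space keeps this from matching vmem= or maxvmem=.
    mem=$(sed -n 's/.* mem=\([0-9.]*\) GB s.*/\1/p' <<< "$usage")
    if [ -z "$cpu" ] || [ -z "$mem" ]; then
        echo "n/a"
        return 1
    fi
    secs=$(hms_to_seconds "$cpu")
    if [ "$secs" -eq 0 ]; then
        echo "n/a"
        return 1
    fi
    awk -v m="$mem" -v s="$secs" 'BEGIN { printf "%.3f\n", m / s }'
}
```

For example, epilog_avg_mem job_script.sh.o121144 prints the estimate in GB. For choosing memory limits, the peak value reported as maxvmem is usually the more relevant number.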

Output

Logs