Submitting CPU Jobs

From UFAL AIC

Revision as of 16:10, 11 June 2019

Resource specification

Monitoring and interaction

Job monitoring

We should be able to see what is going on when we run a job. The following examples show typical usage of the qstat command:

  • qstat - this way we inspect all our jobs (both waiting in the queue and scheduled, i.e. running).
  • qstat -u '*' | less - this shows the jobs of all users.
  • qstat -j 121144 - this shows detailed information about the job with this ID (as long as the job has not finished yet, i.e. while it is waiting or running).
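When a script has to wait for a job before processing its results, qstat -j can be polled in a loop. This is a minimal sketch. It assumes that qstat -j exits with a non-zero status once the job is no longer known to the scheduler, which is how typical Grid Engine installations behave. wait_for_job and POLL_SECONDS are hypothetical names, not cluster tools:

```shell
#!/usr/bin/env bash
# Sketch: block until a job has left the queue (finished or deleted).
# Assumption: `qstat -j JOB_ID` exits non-zero once the job is unknown,
# as on typical Grid Engine setups. wait_for_job and POLL_SECONDS are
# hypothetical names chosen for this example.

POLL_SECONDS=${POLL_SECONDS:-30}

wait_for_job() {
    local job_id=$1
    # Keep polling while qstat still knows the job (waiting or running).
    while qstat -j "$job_id" > /dev/null 2>&1; do
        sleep "$POLL_SECONDS"
    done
    echo "job $job_id is no longer in the queue"
}
```

For example, `wait_for_job 121144 && less job_script.sh.o121144`. A polling interval of tens of seconds is gentler on the scheduler than a tight loop.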

Output monitoring

If we need to see the output produced by our job (suppose its ID is 121144), we can inspect the job's output file (in our case job_script.sh.o121144) with:
less job_script.sh.o*
Hint: if the job is still running, press F in less to simulate tail -f.
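The output file name can be built from the job name and the job ID. less can also start directly in follow mode with +F, which is the same as pressing F after opening the file. Press Ctrl+C to stop following and return to normal navigation. This is a minimal sketch. It assumes that the job name is the script name and that the file is in the current directory, as in the job_script.sh.o121144 example. job_output_file and follow_job_output are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: open a job's output file in less, already in follow mode.
# Assumption: the file is named <job name>.o<job ID> and lives in the
# current directory (e.g. job_script.sh.o121144). Both function names
# are hypothetical.

job_output_file() {
    local job_name=$1 job_id=$2
    printf '%s.o%s\n' "$job_name" "$job_id"
}

follow_job_output() {
    # "+F" makes less run its F command on startup, i.e. behave like
    # tail -f until Ctrl+C is pressed.
    less +F "$(job_output_file "$1" "$2")"
}
```

For example, `follow_job_output job_script.sh 121144`.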

How to read output epilog

The epilog section contains some useful information. However, it can sometimes be confusing.

======= EPILOG: Tue Jun 4 12:41:07 CEST 2019
== Limits:   
== Usage:    cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
== Duration: 00:00:00 (0 s)
== Server name: cpu-node13
  • Limits - on this line you can see job limits specified through qsub options
  • Usage - resource usage during computation
    • cpu=HH:MM:SS - the accumulated CPU time usage
    • mem=XY GB s - gigabytes of RAM used multiplied by the duration of the job in seconds, so don't be alarmed that XY is usually a very high number (unlike in this toy example)
    • io=XY GB - the amount of data transferred in input/output operations in GB
    • vmem=XY - current virtual memory consumption at the moment the job finished
    • maxvmem=XY - peak virtual memory consumption
  • Duration - total (wall-clock) execution time
  • Server name - name of the executing server
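The mem and cpu values accumulate over the whole run. Dividing them by the duration in seconds gives more intuitive numbers: mem / duration is the average RAM in GB, and cpu time / duration is roughly the average number of CPU cores kept busy. This is a minimal sketch. It assumes that the epilog lines are formatted exactly as in the example above and that Duration is wall-clock time. summarize_epilog is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: turn the accumulated epilog values into averages.
#   average RAM (GB)       = mem (GB s) / Duration (s)
#   average busy CPU cores = cpu time (s) / Duration (s)
# Assumptions: epilog lines look exactly like the example above and
# Duration is wall-clock time. summarize_epilog is a hypothetical name;
# it reads the epilog text on stdin.

summarize_epilog() {
    awk '
        /^== Usage:/ {
            # cpu=HH:MM:SS -> seconds
            if (match($0, /cpu=[0-9]+:[0-9]+:[0-9]+/)) {
                split(substr($0, RSTART + 4, RLENGTH - 4), t, ":")
                cpu = t[1] * 3600 + t[2] * 60 + t[3]
            }
            # " mem=<number> GB s" (the leading space skips vmem/maxvmem)
            if (match($0, / mem=[0-9.]+ GB s/)) {
                mem = substr($0, RSTART + 5, RLENGTH - 10) + 0
            }
        }
        /^== Duration:/ {
            # the number in parentheses is the duration in seconds
            if (match($0, /\([0-9]+ s\)/)) {
                dur = substr($0, RSTART + 1, RLENGTH - 4) + 0
            }
        }
        END {
            if (dur > 0)
                printf "avg_mem_gb=%.2f avg_cores=%.2f\n", mem / dur, cpu / dur
            else
                print "duration is 0 s, averages are undefined"
        }
    '
}
```

For example, `summarize_epilog < job_script.sh.o121144`. For the toy epilog above, the duration is 0 s, so the sketch only reports that the averages are undefined. If avg_cores stays well below the number of cores a job requested, the job is probably not using all of them.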

Output

Logs