Difference between revisions of "Submitting CPU Jobs"

From UFAL AIC

Revision as of 10:52, 5 June 2019

Resource specification

Monitoring and interaction

Job monitoring

We should be able to see what is going on when we run a job. The following examples show typical usage of the qstat command:

  • qstat - this way we inspect all our jobs (both waiting in the queue and scheduled, i.e. running).
  • qstat -u '*' | less - this shows the jobs of all users.
  • qstat -j 121144 - this shows detailed info about the job with this number (if it is still running).
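
For a quick overview of a long job list, the qstat output can be summarized by job state. The following is a minimal sketch, assuming the default qstat layout (two header lines, then one row per job with the state, such as r or qw, in the fifth column, and job names without spaces); count_job_states is a hypothetical helper name, not part of the cluster tooling.

```shell
# Summarize qstat output read from stdin as "<state> <count>" lines.
# Assumption: the first two lines are headers and the job state
# is the fifth whitespace-separated column of each job row.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

Typical usage would be qstat | count_job_states for our own jobs, or qstat -u '*' | count_job_states for the jobs of all users.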

Output monitoring

If we need to see output produced by our job (suppose the ID is 121144), we can inspect the job's output (in our case stored in job_script.sh.o121144) with:
less job_script.sh.o*
Hint: if the job is still running, press F in less to simulate tail -f.
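
The glob job_script.sh.o* matches the output of every run of the script, not only of job 121144. A minimal sketch for picking the file of a single job follows, assuming the naming pattern shown above (script name, then .o, then the job ID); job_output_files is a hypothetical helper name.

```shell
# Print the output files belonging to one job ID.
# Assumption: output files are named <script name>.o<job ID>,
# as in job_script.sh.o121144.
# Usage: job_output_files JOB_ID [DIRECTORY]
job_output_files() {
    job_id=$1
    dir=${2:-.}
    for f in "$dir"/*.o"$job_id"; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, less "$(job_output_files 121144)" opens only the output of job 121144, provided exactly one file matches.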

How to read output epilog

The epilog section contains some interesting pieces of information. However, it can be confusing at times.

======= EPILOG: Tue Jun 4 12:41:07 CEST 2019
== Limits:   
== Usage:    cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
== Duration: 00:00:00 (0 s)
== Server name: cpu-node13
  • Limits - on this line you can see job limits specified through qsub options
  • Usage - resource usage during computation
    • cpu=HH:MM:SS - processor time
    • mem=XY GB s - gigabytes of RAM used multiplied by the duration of the job in seconds, so do not be alarmed if XY is a very large number (unlike in this toy example)
    • io=XY GB - amount of data read and written by the job
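    • vmem=XY - virtual memory in use when the usage was last sampled
    • maxvmem=XY - peak virtual memory used by the job
  • Duration - running time of the job, with the number of seconds in parentheses
  • Server name - the node on which the job ran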
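
Because mem is reported in GB s, dividing it by the duration in seconds gives a rough average memory footprint in GB. The sketch below assumes the epilog format shown above (a Usage line containing mem=... GB s and a Duration line with the seconds in parentheses); avg_mem_gb is a hypothetical helper name.

```shell
# Read a job output file (including its epilog) from stdin and print
# the average memory use in GB, i.e. mem [GB s] / duration [s].
# Prints N/A when a value is missing or the duration is zero.
avg_mem_gb() {
    epilog=$(cat)
    # The leading space keeps "vmem=" and "maxvmem=" from matching.
    mem=$(printf '%s\n' "$epilog" | sed -n 's/.* mem=\([0-9.]*\) GB s.*/\1/p')
    dur=$(printf '%s\n' "$epilog" | sed -n 's/^== Duration:.*(\([0-9]*\) s).*/\1/p')
    if [ -z "$mem" ] || [ -z "$dur" ]; then
        echo "N/A"
        return 0
    fi
    awk -v m="$mem" -v d="$dur" \
        'BEGIN { if (d + 0 > 0) printf "%.3f\n", m / d; else print "N/A" }'
}
```

For instance, a job reporting mem=7200.00000 GB s over a duration of 3600 s used about 2 GB on average. The toy epilog above has a duration of 0 s, so avg_mem_gb < job_script.sh.o121144 would print N/A for it.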

Output

Logs