Submitting CPU Jobs
Contents
Resource specification
Monitoring and interaction
Job monitoring
We should be able to see what is going on when we run a job. The following examples show typical usage of the qstat command:
- qstat - inspects all our jobs (both waiting in the queue and scheduled, i.e. running).
- qstat -u '*' | less - shows the jobs of all users.
- qstat -j 121144 - shows detailed info about the job with this number (if it is still running).
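The qstat -j check can also be used from a script to wait until a job has left the queue. The following is a minimal sketch, assuming that qstat -j returns a non-zero exit status once the job is no longer known to the scheduler; the function name wait_for_job and the 30-second default interval are illustrative choices, not part of SGE.

```shell
# Poll the scheduler until the given job is no longer listed.
# Assumption: `qstat -j ID` exits with a non-zero status for finished/unknown jobs.
wait_for_job() {
    local job_id=$1
    local interval=${2:-30}   # seconds between polls (illustrative default)
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $job_id is no longer in the queue"
}
```

It could be called as wait_for_job 121144, optionally followed by a different polling interval in seconds.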
Output monitoring
If we need to see the output produced by our job (suppose its ID is 121144), we can inspect the job's output file (in our case job_script.sh.o121144) with:
less job_script.sh.o*
Hint: if the job is still running, press F in less to simulate tail -f.
How to read output epilog
The epilog section contains some interesting pieces of information, but it can be confusing at times.
======= EPILOG: Tue Jun 4 12:41:07 CEST 2019
== Limits:
== Usage: cpu=00:00:00, mem=0.00000 GB s, io=0.00000 GB, vmem=N/A, maxvmem=N/A
== Duration: 00:00:00 (0 s)
== Server name: cpu-node13
- Limits - on this line you can see the job limits specified through qsub options
- Usage - resource usage during computation
- cpu=HH:MM:SS - the accumulated CPU time usage
- mem=XY GB s - gigabytes of RAM used multiplied by the duration of the job in seconds, so don't be alarmed: XY is usually a very high number (unlike in this toy example)
- io=XY GB - the amount of data transferred in input/output operations in GB
- vmem=XY - actual virtual memory consumption when the job finished
- maxvmem=XY - peak virtual memory consumption
- Duration - total execution time
- Server name - name of the executing server
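Since mem is reported in GB s, dividing it by the duration in seconds gives a rough average RAM usage in GB. A small sketch of that arithmetic follows; the helper name avg_mem_gb and the numbers (7200 GB s over a 3600 s job) are made up for illustration and do not come from a real epilog.

```shell
# Average RAM (GB) = mem (GB*s) / duration (s).
# avg_mem_gb is a hypothetical helper; it prints N/A for a zero-length job.
avg_mem_gb() {
    awk -v mem="$1" -v dur="$2" \
        'BEGIN { if (dur > 0) printf "%.2f\n", mem / dur; else print "N/A" }'
}

avg_mem_gb 7200 3600   # prints 2.00
```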
Advanced usage
qsub -q cpu.q
This way your job is submitted to the CPU queue, which is the default. If you need a GPU, use gpu.q instead.
qsub -l ...
See man complex (run it on aic) for a list of possible resources you may require (in addition to mem_free etc. discussed above).
qsub -p -200
Define the priority of your job as a number between -1024 and 0. Only SGE admins may use a number higher than 0. Default is set to TODO. You should ask for a lower priority (-1024..-101) if you submit many jobs at once or if the jobs are not urgent. SGE uses the priority to decide when to start which pending job in the queue (it computes a real number called prior, which is reported in qstat and grows while the job is waiting in the queue). Note that once a job has started, you cannot unschedule it, so from that moment on its priority is irrelevant.
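When submitting a whole batch of non-urgent jobs, the lower priority can be applied in a loop. A minimal sketch, where the function name submit_low_priority and the script names in the usage line are hypothetical:

```shell
# Submit every given script with a lowered priority of -200.
submit_low_priority() {
    local script
    for script in "$@"; do
        qsub -p -200 "$script"
    done
}
```

For example, submit_low_priority part1.sh part2.sh part3.sh would submit three jobs, each with priority -200.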
qsub -o LOG.stdout -e LOG.stderr
Redirect std{out,err} to separate files with the given names, instead of the defaults $JOB_NAME.o$JOB_ID and $JOB_NAME.e$JOB_ID.
qsub -@ optionfile
Instead of specifying all the qsub options on the command line, you can store them in a file (you can use # comments in the file). See also the In-script options section.
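As an illustration, an option file might look like the one written below. This is only a sketch: the selection of options is arbitrary (all of them are described on this page), and script.sh in the final comment is a placeholder.

```shell
# Write a hypothetical option file; the chosen options are only examples.
cat > optionfile <<'EOF'
# run in the CPU queue under a custom job name
-q cpu.q
-N my-name
# separate log files for stdout and stderr
-o LOG.stdout
-e LOG.stderr
EOF

# Submit with (script.sh is a placeholder):
#   qsub -@ optionfile script.sh
```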
qsub -a 12312359
Execute your job no sooner than at the given time (in [YY]MMDDhhmm format). An alternative to sleep 3600 && qsub ... &.
qsub -b y
Treat script.sh (or whatever the name of the command you execute is) as a binary, i.e. don't search for in-script options within the file and don't transfer it to the qmaster and then to the execution node. This makes the execution a bit faster and may prevent some rare but hard-to-detect errors caused by SGE interpreting the script. The script must be available on the execution node via NFS, Lustre (which is our case), etc. With -b y (shortcut for -b yes), script.sh can be a script or a binary. With -b n (which is the default for qsub), script.sh must be a script (text file).
qsub -M popel@ufal.mff.cuni.cz,rosa@ufal.mff.cuni.cz -m beas
Specify the email addresses to be notified when the job has been started (b), ended (e), aborted or rescheduled (a), or suspended (s).
The default is now -m a and the default email address is forwarded to you (so there is no need to use -M). You can use -m n to override the defaults and send no emails.
qsub -hold_jid 121144,121145 (or qsub -hold_jid get_src.sh,get_tgt.sh)
The current job is not executed before all the specified jobs are completed.
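A typical use is a small pipeline in which a final job waits for two preparatory jobs. The sketch below refers to the jobs by name, relying on the default job name being the script name; get_src.sh and get_tgt.sh are taken from the example above, while train.sh and the function name submit_pipeline are hypothetical.

```shell
# Submit two preparatory jobs and a third one that waits for both.
# train.sh is a hypothetical follow-up script.
submit_pipeline() {
    qsub get_src.sh
    qsub get_tgt.sh
    # Held until the jobs named get_src.sh and get_tgt.sh have completed.
    qsub -hold_jid get_src.sh,get_tgt.sh train.sh
}
```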
qsub -now y
Start the job immediately or not at all, i.e. don't put it into the queue as pending. This is the default for qrsh, but you can change it with -now n (which is the default for qsub).
qsub -N my-name
By default, the name of a job (which you can see e.g. in qstat) is the name of the script.sh. This way you can override it.
qsub -S /bin/bash
The hashbang (#!/bin/bash) in your script.sh is ignored, but you can change the interpreter with -S. I think /bin/bash is now (2017/09) the default (but it used to be csh).
qsub -v PATH[=value]
Export a given environment variable from the current shell to the job.
qsub -V
Export all environment variables. (This is less necessary now that bash is the default interpreter and your ~/.bashrc seems to be always sourced.)
qsub -soft -l ... -hard -l ... -q ...
By default, all the resource requirements (specified with -l) and queue requirements (specified with -q) are hard, i.e. your job won't be scheduled unless they can be fulfilled. You can use -soft to mark all following requirements as nice-to-have. And with -hard you can switch back to hard requirements.
qsub -sync y
This causes qsub to wait for the job to complete before exiting (with the same exit code as the job). Useful in scripts.
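In a script, the exit code passed through by -sync y can be used to decide what to do next. A minimal sketch, where run_and_check is a hypothetical helper name and script.sh stands for your own job script:

```shell
# Run a job synchronously and report whether it succeeded.
# All arguments are passed on to qsub after -sync y.
run_and_check() {
    if qsub -sync y "$@"; then
        echo "job succeeded"
    else
        # $? here still holds the exit status of the qsub command above.
        echo "job failed with exit code $?" >&2
        return 1
    fi
}
```

It could be used as run_and_check script.sh && ./next_step.sh, with next_step.sh again being a placeholder.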
qalter
You can change some properties of already submitted jobs (both waiting in the queue and running). Changeable properties are listed in man qsub.