Submitting GPU Jobs

Start by reading the Submitting CPU Jobs page.

GPU jobs are submitted to the gpu partition.

To request one GPU card, use the #SBATCH -G 1 directive in your job script or the -G 1 option on the command line. The submitted job has CUDA_VISIBLE_DEVICES set to the allocated cards, so all CUDA applications should use only the allocated GPUs.
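As an illustration of the directive form, a minimal batch script might look like the sketch below. The function name show_allocated_gpus and the 10G memory value are only examples, not site requirements.

```shell
#!/bin/bash
#SBATCH -p gpu      # submit to the gpu partition
#SBATCH -G 1        # request one GPU card
#SBATCH --mem=10G   # RAM request; 10G is only an example value

# Print which GPUs were exposed to this job (hypothetical helper).
show_allocated_gpus() {
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

show_allocated_gpus
```

Saved under a name of your choice (say job.sh), the script would be submitted with sbatch job.sh. Outside a GPU job the variable is normally not set, so the script prints "unset".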

Rules

Always use GPUs via sbatch (or srun), never via ssh. You may ssh to any machine, e.g. to run nvidia-smi or htop, but not to start computing on a GPU.
Don't forget to specify your RAM requirements, e.g. with --mem=10G.
Always specify the number of GPU cards (e.g. -G 1). A complete interactive request then looks like this: srun -p gpu --mem=64G -G 2 --pty bash
For interactive jobs, you can use srun, but end your job as soon as you no longer need the GPU (so don't use srun for long training runs).
In general, don't keep a GPU reserved (as described above) for a long time without actually using it. For example, separate the steps that need a GPU from the steps that do not, and run them separately on our GPU and CPU clusters, respectively.
If you know the approximate runtime of your job, please specify it with -t. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
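To make the -t formats concrete, the sketch below converts a runtime estimate in minutes into the "days-hours:minutes:seconds" form. The helper slurm_time is hypothetical and not part of Slurm; it exists only to illustrate the format.

```shell
# Hypothetical helper: turn an estimate in minutes into D-HH:MM:SS for -t.
slurm_time() {
  total_min=$1
  days=$(( total_min / 1440 ))            # 1440 minutes per day
  hours=$(( (total_min % 1440) / 60 ))
  mins=$(( total_min % 60 ))
  printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}

slurm_time 90     # prints 0-01:30:00
slurm_time 1500   # prints 1-01:00:00
```

For a 90-minute job, -t 90, -t 1:30:00 and -t 0-01:30:00 all express the same limit, for example: srun -p gpu --mem=10G -G 1 -t 1:30:00 --pty bash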

CUDA and cuDNN

The installed CUDA versions are in

/lnet/aic/opt/cuda/

As of Oct 2025, the available versions are 10.1, 10.2, 11.2, 11.7, 11.8 and 12.4.

The cuDNN library is also available in the subdirectory cudnn/VERSION/lib64 of the respective CUDA directories.

Therefore, to use CUDA 11.2 with cuDNN 8.1.1, you should add the following to your .profile:

export PATH="/lnet/aic/opt/cuda/cuda-11.2/bin:$PATH"
export LD_LIBRARY_PATH="/lnet/aic/opt/cuda/cuda-11.2/lib64:/lnet/aic/opt/cuda/cuda-11.2/cudnn/8.1.1/lib64:/lnet/aic/opt/cuda/cuda-11.2/extras/CUPTI/lib64:$LD_LIBRARY_PATH"
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/lnet/aic/opt/cuda/cuda-11.2 # XLA configuration if you are using TensorFlow
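If you switch between versions, the three exports above can be wrapped in a small shell function. This is only a sketch: use_cuda is a hypothetical name, and it assumes that every version lives in a cuda-VERSION directory like the cuda-11.2 example, which is not confirmed here for the other versions.

```shell
# Hypothetical helper: use_cuda CUDA_VERSION CUDNN_VERSION
# Assumes the layout /lnet/aic/opt/cuda/cuda-VERSION shown for 11.2.
use_cuda() {
  cuda_dir="/lnet/aic/opt/cuda/cuda-$1"
  cudnn_dir="$cuda_dir/cudnn/$2/lib64"
  export PATH="$cuda_dir/bin:$PATH"
  # Append the old value only if it was non-empty, to avoid a trailing colon.
  export LD_LIBRARY_PATH="$cuda_dir/lib64:$cudnn_dir:$cuda_dir/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  # XLA configuration, relevant if you are using TensorFlow.
  export XLA_FLAGS="--xla_gpu_cuda_data_dir=$cuda_dir"
}

use_cuda 11.2 8.1.1
```

Before relying on it, check that the cuDNN version you pass actually exists under the cudnn/ subdirectory of that CUDA directory.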

CUDA modules

CUDA 11.2 and later can also be loaded as modules. Loading a module sets the relevant environment variables for you, so CUDA should work without further configuration. To enable this feature, add the following code to your .bashrc:

if [ -f /etc/profile.d/modules.sh ]; then
  source /etc/profile.d/modules.sh
fi

On a GPU node, you can control modules with the following commands:

List the available modules: module avail
Load the version you need (optionally with a specific cuDNN version): module load <modulename>
Unload a module: module unload <modulename>
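Inside a batch script, the same commands can be guarded so that the job fails early with a clear message when the module command is missing. The sketch below assumes the modules.sh path from the .bashrc snippet; load_cuda is a hypothetical helper name.

```shell
# Make the module command available in non-interactive shells as well.
if [ -f /etc/profile.d/modules.sh ]; then
  . /etc/profile.d/modules.sh
fi

# Hypothetical helper: load a CUDA module, or report why that is not possible.
load_cuda() {
  if ! type module >/dev/null 2>&1; then
    echo "module command not found; is this a GPU node?" >&2
    return 1
  fi
  module load "$1"
}
```

In a job script you might then call load_cuda cuda/11.8-cudnn8.9 before starting your program, and stop the job if the call fails.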

As of Oct 2025, the available modules are

cuda/11.2
cuda/11.2-cudnn8.1
cuda/11.7
cuda/11.7-cudnn8.5
cuda/11.8
cuda/11.8-cudnn8.5
cuda/11.8-cudnn8.6
cuda/11.8-cudnn8.9
cuda/12.4
cuda/12.4-cudnn9.14

List of installed GPUs

Available GPU table

Node name	GPU type	GPU RAM size (GB)	GPU count on node	SLURM features
gpu-node1	NVIDIA RTX A4000	16	8	gpuram16G, gpu_cc8.6
gpu-node1	NVIDIA GeForce RTX 3090	24	1	gpuram24G, gpu_cc8.6
gpu-node2	NVIDIA GeForce RTX 2080 Ti	11	6	gpuram11G, gpu_cc7.5
gpu-node3	NVIDIA L4	22	2	gpuram22G, gpu_cc8.9
gpu-node3	NVIDIA A16	15	16	gpuram15G, gpu_cc8.6
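The features in the last column can presumably be passed to Slurm's --constraint option to restrict a job to matching nodes; this page does not state how features map to individual cards on nodes with mixed GPU types, so treat the following as a sketch. The helper gpuram_constraint is hypothetical and its size list is copied from the table above, so it needs updating when the hardware changes.

```shell
# GPU RAM sizes (GB) copied from the table above.
GPU_RAM_SIZES="11 15 16 22 24"

# Hypothetical helper: build a constraint that matches every gpuramNNG
# feature with at least the requested amount of GPU RAM.
gpuram_constraint() {
  min_gb=$1
  constraint=""
  for size in $GPU_RAM_SIZES; do
    if [ "$size" -ge "$min_gb" ]; then
      # Join alternatives with "|", which --constraint treats as OR.
      constraint="${constraint:+$constraint|}gpuram${size}G"
    fi
  done
  echo "$constraint"
}

gpuram_constraint 16   # prints gpuram16G|gpuram22G|gpuram24G
```

A request for any card with at least 16 GB could then be written as: srun -p gpu -G 1 --mem=10G --constraint="$(gpuram_constraint 16)" --pty bash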
