Main Page
Latest revision as of 16:06, 19 March 2024
Welcome to AIC
AIC (Artificial Intelligence Cluster) is a computational grid with enough capacity for deep learning research on both CPUs and GPUs. It is built on top of the SLURM scheduling system. MFF students in Bc. and Mgr. degree programmes can use it to run their experiments and, in the process, learn how to use grid computing properly.
Access
AIC is dedicated to UFAL students, who get an account when an authorized lecturer requests one.
To change your password, use this link: https://aic.ufal.mff.cuni.cz/pw-manager
Each user in the group students is limited in the resources they can have allocated at any one time. By default, the limit is 4 CPUs and 1 GPU.
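As a rough illustration of the default limit, the sketch below checks a planned request against the figures quoted above (4 CPUs, 1 GPU). The function name fits_student_limit is hypothetical and is not part of AIC or SLURM. The real limit is enforced by the scheduler across all of a user's running allocations, so this only sanity-checks a single request.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a single request stays within the default
# limits for the "students" group (4 CPUs, 1 GPU, as stated in the text).
fits_student_limit() {
    local cpus="$1"
    local gpus="${2:-0}"    # GPUs default to 0 when not given
    [ "$cpus" -le 4 ] && [ "$gpus" -le 1 ]
}

if fits_student_limit 2 1; then
    echo "request fits the default limit"
else
    echo "request exceeds the default limit"
fi
```

How the scheduler treats a request above the limit depends on the cluster configuration, so this check is only a convenience before submitting.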
JupyterLab
AIC also provides a JupyterLab portal on top of your AIC account and HOME directory. It is available at https://aic.ufal.mff.cuni.cz/jlab. Pre-installed extensions: R, ipython, Rstudio (community), Slurm Queue Manager.
Connecting to the Cluster (directly)
Use SSH to connect to the cluster:
ssh LOGIN@aic.ufal.mff.cuni.cz
Basic HOWTO
The following HOWTO gives only a simplified overview of cluster usage. We strongly recommend reading the further documentation (CPU or GPU) before running any serious experiments. Such experiments tend to consume more resources, so to avoid unexpected failures please make sure your quota is not exceeded.
Rule 0: NEVER RUN JOBS DIRECTLY ON THE aic.ufal.mff.cuni.cz HEAD NODE. Use srun to get a shell on a compute node!
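A minimal sketch of what Rule 0 can look like in practice: the helper below only prints an srun command that asks for an interactive shell on a compute node, so it can be inspected before being run by hand on the head node. The helper name interactive_cmd and the CPU count are illustrative assumptions. The partition name cpu is the default mentioned in this HOWTO, and -p, -c and --pty are standard srun options.

```shell
#!/bin/bash
# Hypothetical helper: print an srun command that opens an interactive
# bash shell on a compute node instead of working on the head node.
interactive_cmd() {
    local partition="${1:-cpu}"   # partition (queue), "cpu" by default
    local cpus="${2:-1}"          # CPUs per task, assumed default of 1
    echo "srun -p ${partition} -c ${cpus} --pty bash"
}

# Print the command for a 2-CPU shell in the cpu partition.
interactive_cmd cpu 2
```

Typing exit in the resulting shell ends the session and releases the allocated resources.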
Suppose we want to run some computations described by a script called job_script.sh:
#!/bin/bash
#SBATCH -J helloWorld          # name of job
#SBATCH -p cpu                 # name of partition or queue (default is cpu)
#SBATCH -o helloWorld.out      # name of output file for this submission script
#SBATCH -e helloWorld.err      # name of error file for this submission script

# run my job (some executable)
sleep 5
echo "Hello I am running on cluster!"
We need to submit the job to the cluster, which is done by logging in to the submit host aic.ufal.mff.cuni.cz and issuing the command:
sbatch job_script.sh
This will enqueue our job to the default partition (or queue), which is cpu. The scheduler decides which machine in the specified queue has the resources needed to run the job. Typically we will see a message that tells us the ID of our job (3 in this example):
Submitted batch job 3
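The job ID is what later commands refer to, so it can be handy to capture it. The sketch below pulls the ID out of the confirmation message shown above. The function name parse_job_id is hypothetical, and the squeue and scancel lines are left as comments because they only make sense on the cluster.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's confirmation message.
# The message format "Submitted batch job <ID>" is the one shown above.
parse_job_id() {
    local message="$1"
    echo "${message##* }"   # keep everything after the last space
}

job_id=$(parse_job_id "Submitted batch job 3")
echo "$job_id"

# On the cluster, the ID could then be used like this (not run here):
#   squeue -j "$job_id"     # show the job's state in the queue
#   scancel "$job_id"       # cancel the job
```

As an alternative to parsing the message, sbatch has a --parsable option that prints the job ID in a machine-readable form (see man sbatch for the exact output).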
The options used in this example are specified inside the script using the #SBATCH directive. Any option can be specified either in the script or as a command line parameter (see man sbatch for details).
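To see which options a script already sets through #SBATCH directives, and could therefore also be given on the command line, something like the following sketch may help. The function name sbatch_options is hypothetical. It assumes each directive sits on its own line and that option values do not contain a # character, since everything after a # is treated as a trailing comment.

```shell
#!/bin/bash
# Hypothetical helper: print the options a job script sets via #SBATCH
# directives, one per line, with trailing comments removed.
sbatch_options() {
    sed -n 's/^#SBATCH[[:space:]]\{1,\}//p' "$1" | sed 's/[[:space:]]*#.*$//'
}

# Example: run it on job_script.sh if the file exists in this directory.
if [ -f job_script.sh ]; then
    sbatch_options job_script.sh
fi
```

For the script in this HOWTO, the printed options correspond to the command line sbatch -J helloWorld -p cpu -o helloWorld.out -e helloWorld.err job_script.sh.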
We can pass custom arguments to the job as environment variables by placing the --export option before the name of the script:
sbatch --export=ARG1='firstArg',ARG2='secondArg' job_script.sh
These can be accessed in the job script as $ARG1 and $ARG2.
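A minimal sketch of a job script that reads these variables is shown below. The job name argsDemo and the function describe_args are illustrative assumptions. The fallback text guards against the variables being unset, for example when the script is run without --export.

```shell
#!/bin/bash
#SBATCH -J argsDemo            # hypothetical job name
#SBATCH -p cpu                 # default partition from this HOWTO

# ARG1 and ARG2 arrive as ordinary environment variables.
# Use a placeholder when a variable was not exported to the job.
describe_args() {
    echo "ARG1=${ARG1:-<unset>} ARG2=${ARG2:-<unset>}"
}

describe_args
```

According to man sbatch, listing variables in --export without the ALL keyword limits which variables from the submitting shell reach the job. If the job also needs the rest of that environment, --export=ALL,ARG1='firstArg',ARG2='secondArg' is the form to check there.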