Difference between revisions of "Main Page"

From UFAL AIC

Revision as of 16:10, 16 November 2022

CZ.02.2.69/0.0/0.0/17_044/0008562
Support for the Development of the Study Environment at Charles University - VRR

Welcome to AIC

AIC (Artificial Intelligence Cluster) is a computational grid with enough capacity for deep learning research on both CPUs and GPUs. It is built on top of the Slurm scheduling system. MFF students in Bc. and Mgr. programmes can use it to run their experiments and, in the process, learn the proper ways of grid computing.

Access

AIC is dedicated to UFAL students, who will get an account when an authorized lecturer requests one for them.

Connecting to the Cluster

Use SSH to connect to the cluster:

 ssh LOGIN@aic.ufal.mff.cuni.cz

Basic HOWTO

The following HOWTO gives only a simplified overview of cluster usage. It is strongly recommended to read the further documentation (CPU or GPU) before running any serious experiments. Serious experiments tend to take more resources, so to avoid unexpected failures please make sure your quota is not exceeded.

Rule 0: NEVER RUN JOBS DIRECTLY ON THE aic.ufal.mff.cuni.cz HEADNODE. Use srun to get a shell on a computational node!
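
As a minimal sketch of Rule 0, the snippet below assembles an srun command that asks for an interactive shell. It assumes the partition is called cpu, as in the job script on this page; the variable names are only illustrative, and the exact resource options you need may differ.

```shell
# Assumption: "cpu" is the partition name used elsewhere on this page.
partition="cpu"

# --pty attaches a pseudo-terminal, so the remote bash behaves like an
# ordinary interactive session on the allocated computational node.
interactive_cmd="srun -p ${partition} --pty bash"

# Print the command instead of running it, so the sketch is safe anywhere.
echo "Run on the head node: ${interactive_cmd}"
```

Typing the printed command on the head node should open a shell on a computational node; leaving that shell with exit releases the allocation.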

Suppose we want to run some computations described by a script called job_script.sh:

#!/bin/bash
#SBATCH -J helloWorld       # name of job
#SBATCH -p cpu              # name of partition or queue (default is cpu)
#SBATCH -o helloWorld.out   # name of output file for this submission script
#SBATCH -e helloWorld.err   # name of error file for this submission script
# run my job (some executable)
sleep 5
echo "Hello I am running on cluster!"
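
Because bash treats the #SBATCH lines as ordinary comments, the script can be checked for syntax errors before it is submitted. The following is only a sketch: it recreates the script in a temporary directory (a hypothetical location chosen for illustration) and uses bash -n, which parses a script without executing any of its commands.

```shell
# Recreate the job script from this page in a temporary directory.
workdir="$(mktemp -d)"
cat > "${workdir}/job_script.sh" <<'EOF'
#!/bin/bash
#SBATCH -J helloWorld
#SBATCH -p cpu
#SBATCH -o helloWorld.out
#SBATCH -e helloWorld.err
sleep 5
echo "Hello I am running on cluster!"
EOF

# bash -n only parses the file; nothing in it is executed.
if bash -n "${workdir}/job_script.sh"; then
    syntax_status="ok"
else
    syntax_status="syntax error"
fi
echo "syntax check: ${syntax_status}"

# Count the #SBATCH directives so they can be reviewed at a glance.
directive_count="$(grep -c '^#SBATCH' "${workdir}/job_script.sh")"
echo "SBATCH directives found: ${directive_count}"
```

A syntax check of this kind does no computation, so it does not conflict with Rule 0.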

We need to submit the job to the cluster. To do so, log in to the submit host aic.ufal.mff.cuni.cz and issue the command:
sbatch job_script.sh

This will enqueue our job in the default partition (or queue), which is cpu. The scheduler decides which machine in the specified partition has the resources needed to run the job. Typically we will see a message which tells us the ID of our job (82 in this example):

Submitted batch job 82
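
The job ID is useful for follow-up commands such as squeue. Slurm's sbatch normally reports it in the form "Submitted batch job ID", so a small sketch for extracting it could look like this. The message is hard-coded here for illustration; on the cluster it would come from the sbatch call shown in the comment.

```shell
# On the cluster this would be: msg="$(sbatch job_script.sh)"
# Here the message is hard-coded so the sketch runs anywhere.
msg="Submitted batch job 82"

# Strip everything up to and including the last space, leaving the ID.
job_id="${msg##* }"
echo "job id: ${job_id}"

# Cluster-only follow-up (commented out): show the state of this job.
# squeue -j "${job_id}"
```

If only the ID is wanted, sbatch --parsable prints it without the surrounding text.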

The basic options used in this example are:

  • -J - the name of the job (helloWorld)
  • -p - the partition (queue) the job is submitted to (cpu is the default)
  • -o - the file that receives the script's stdout (helloWorld.out)
  • -e - the file that receives the script's stderr (helloWorld.err)

By default, relative file names are resolved against the directory from which the job was submitted. After the job finishes, helloWorld.out should contain the line printed by the script:

Hello I am running on cluster!