Here is a not-so-short guide to start using Jupyter Notebooks on the HPC (High Performance Computer).
An up-to-date list of clusters can be found here: https://www.ugent.be/hpc/en/infrastructure
Note: at the moment there are 2 clusters with GPUs: joltik and accelgor.
A complete guide to the HPC can be found here https://docs.hpc.ugent.be/Linux/linux-tutorial/
Go to https://account.vscentrum.be/ and upload your public RSA key (it should be in ~/.ssh). See SSH key if you have no SSH RSA key.
Wait until you get a confirmation e-mail. (more info @ https://www.ugent.be/hpc/en/access/policy/access)
telin$ ssh {vsc_account_number}@login.hpc.ugent.be
hpc$ exit
Replace {vsc_account_number} with the account number you have been assigned, e.g. vsc40053@login.hpc.ugent.be.
telin$ scp -r {name_of_directory} {vsc_account_number}@login.hpc.ugent.be:
Select joltik or accelgor with the module swap cluster command. The Tier-1 cluster (Hortense) has GPU nodes too, but you have to submit a proposal to apply for access. See https://www.vscentrum.be/compute.
telin$ ssh {vsc_account_number}@login.hpc.ugent.be
hpc$ module swap cluster/joltik
hpc$ pbsmon
3300 3301 3302 3303 3304
J J X J j
3305 3306 3307 3308 3309
R J R J _
_ free : 1 | X down : 1 |
j partial : 1 | x down_on_error : 0 |
J full : 5 | m maintenance : 0 |
| . offline : 0 |
| o other (R, *, ...) : 2 |
As you can see, some nodes are fully occupied (J = full), some still have GPUs free (j = partial), some are starting up or going into maintenance mode (X = down) and some are free (_ = free). If no node is free or partially free, you cannot work interactively right away: your job is queued until other jobs have drained and a node becomes available again.
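The check described above can be scripted. The following is a minimal sketch, assuming the legend lines of pbsmon look like the sample shown here; `count_available` is a hypothetical helper name, not part of the HPC tooling.

```shell
# Hypothetical helper: reads pbsmon output on stdin and prints how many
# nodes are free and how many are partially used, based on the legend lines.
# Assumes legend lines of the form "_ free : 1 | X down : 1 |".
count_available() {
    awk -F'|' '
        / free +:/    { split($1, a, ":"); free = a[2] + 0 }
        / partial +:/ { split($1, a, ":"); partial = a[2] + 0 }
        END { printf "free=%d partial=%d\n", free, partial }
    '
}
```

On the HPC you would run it as `pbsmon | count_available`. If both counts are 0, an interactive job will be queued instead of starting immediately.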
Make sure you select the cluster joltik or accelgor first.
hpc$ qsub -I -l nodes=1:gpus=1
This tells the HPC queueing system that you want to work interactively (-I) and that you want 1 node and 1 GPU. If the request succeeds, you will see a new prompt on the GPU node:
salloc: Granted job allocation 40002722
salloc: Waiting for resource configuration
salloc: Nodes node3302.joltik.os are ready for job
{vsc_account_number}@node3302:~$
{vsc_account_number}@node3302:~$ module avail
The ones we are interested in are CUDA and PyTorch (TensorFlow is also available):
CUDA/11.1.1-GCC-10.2.0
CUDA/11.3.1
CUDA/11.4.1 (D)
PyTorch/1.7.1-fosscuda-2020b
PyTorch/1.8.1-fosscuda-2020b
PyTorch/1.9.0-fosscuda-2020b
PyTorch/1.10.0-foss-2021a-CUDA-11.3.1
PyTorch/1.10.0-foss-2021a
PyTorch/1.10.0-fosscuda-2020b (D)
These are the packages which are available at the time of writing.
{vsc_account_number}@node3302:~$ module load PyTorch/1.10.0-foss-2021a-CUDA-11.3.1
You can check whether CUDA and PyTorch are loaded:
{vsc_account_number}@node3302:~$ nvidia-smi
{vsc_account_number}@node3302:~$ nvcc --version
{vsc_account_number}@node3302:~$ python --version
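The commands above confirm the driver, the CUDA compiler and the Python version, but not that PyTorch itself can see the GPU. A minimal sketch of such a check follows; `check_torch` is a hypothetical helper name, and it assumes `python` is the interpreter provided by the loaded PyTorch module.

```shell
# Hypothetical helper: reports the PyTorch version and whether PyTorch
# can see a CUDA device. Falls back to a hint if PyTorch cannot be imported.
check_torch() {
    python -c 'import torch; print("torch", torch.__version__, "cuda available:", torch.cuda.is_available())' 2>/dev/null \
        || echo "PyTorch could not be imported; did you run module load?"
}
```

Run `check_torch` on the GPU node after `module load`. On the login node, which has no GPU, the CUDA check is expected to report False.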
Install Jupyter; see the reference link https://jupyter.readthedocs.io/en/latest/install.html. You only have to do this once!
{vsc_account_number}@node3302:~$ pip3 install --upgrade --user pip
{vsc_account_number}@node3302:~$ pip3 install --user jupyter
This will create a private environment in your ~/.local directory. Add its bin directory to the PATH of your session and your profile if you haven’t done so:
{vsc_account_number}@node3302:~$ export PATH=~/.local/bin:$PATH
{vsc_account_number}@node3302:~$ echo -e '\nexport PATH=~/.local/bin:$PATH' >> ~/.bashrc
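Running the echo command more than once appends the same line to your profile each time. A minimal sketch of a variant that only appends when the line is missing is given below; `add_local_bin_to_profile` is a hypothetical helper name, and the profile path defaults to ~/.bashrc.

```shell
# Hypothetical helper: appends the PATH export to a profile file
# (default: ~/.bashrc) only if that exact line is not in it yet.
add_local_bin_to_profile() {
    profile="${1:-$HOME/.bashrc}"
    line='export PATH=~/.local/bin:$PATH'
    touch "$profile"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$profile" || printf '\n%s\n' "$line" >> "$profile"
}
```

Calling `add_local_bin_to_profile` without arguments updates ~/.bashrc; calling it again leaves the file unchanged.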
You can check with:
{vsc_account_number}@node3302:~$ jupyter --version
If you get an error, try the upgrade:
{vsc_account_number}@node3302:~$ jupyter --version
/apps/gent/CO7/cascadelake-volta-ib-PILOT/software/Python/3.7.2-GCCcore-8.2.0/bin/python3.7: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
{vsc_account_number}@node3302:~$ pip3 install --upgrade --user jupyter
{vsc_account_number}@node3302:~$ jupyter notebook --ip=`ifconfig ib0|awk '/inet / {print $2}'`
You will get a link to something like this: http://10.143.8.3:8888/tree?token=a46bed6fe6bfab325be1322067e977ebb90839df2eb314ea
Because this link points to the GPU node inside the HPC network, you can only reach it through an extra port forward. Open a new SSH connection from your local machine to the HPC login node with port forwarding enabled, using the same IP and port as in the Jupyter notebook link:
telin$ ssh -L 8888:10.143.8.3:8888 {vsc_account_number}@login.hpc.ugent.be
Then open the link in your local browser, replacing the 10.143.x.x address with 127.0.0.1 and keeping the same token: http://127.0.0.1:8888/tree?token=a46bed6fe6bfab325be1322067e977ebb90839df2eb314ea
Congrats: the SSH connection forwards your local port 8888 to the GPU node, so your local browser can now reach Jupyter through the 127.0.0.1 link!
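The rewriting of the link can be automated. The following is a minimal sketch, assuming the link has the http://IP:PORT/... form shown above; `jupyter_tunnel` is a hypothetical helper name. It only prints the ssh command and the local URL, it does not open the connection.

```shell
# Hypothetical helper: given the link printed by jupyter notebook and your
# VSC account, prints the matching ssh port-forward command and the URL
# to open in your local browser.
jupyter_tunnel() {
    url="$1"
    user="$2"
    hostport="${url#http://}"     # strip the scheme: IP:PORT/path?token=...
    rest="${hostport#*/}"         # path and token: tree?token=...
    hostport="${hostport%%/*}"    # IP:PORT
    host="${hostport%%:*}"        # IP of the GPU node
    port="${hostport##*:}"        # port of the notebook server
    echo "ssh -L ${port}:${host}:${port} ${user}@login.hpc.ugent.be"
    echo "http://127.0.0.1:${port}/${rest}"
}
```

For example, `jupyter_tunnel 'http://10.143.8.3:8888/tree?token=a46bed6fe6bfab325be1322067e977ebb90839df2eb314ea' vsc40053` prints the ssh command to run on your local machine, followed by the 127.0.0.1 link with the same token.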