Jupyter Notebooks

Here is a not-so-short guide to getting started with Jupyter Notebooks on the HPC (High Performance Computing cluster).

An up-to-date list of clusters can be found here: https://www.ugent.be/hpc/en/infrastructure

Note: at the moment there is only one cluster with GPU computing, joltik. You no longer need an invitation from hpc@ugent.be.

A complete guide to the HPC can be found here: https://hpcugent.github.io/vsc_user_docs/pdf/intro-HPC-linux-gent.pdf

Make sure you have an HPC account

Go to https://account.vscentrum.be/ and upload your RSA public key (it should be in ~/.ssh). See SSH key if you do not have an SSH RSA key yet.
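If you need to generate a key first, a minimal sketch with standard OpenSSH (the SSH key page has the full procedure):

telin$ ssh-keygen -t rsa -b 4096

By default this writes the private key to ~/.ssh/id_rsa and the public key to ~/.ssh/id_rsa.pub; upload the .pub file, never the private key.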

Wait until you get a confirmation e-mail (more info at https://hpc.ugent.be/userwiki/index.php/User:VscRequests).

Check if you can log in to the HPC

telin$ ssh {vsc_account_number}@login.hpc.ugent.be
hpc$ exit

Replace {vsc_account_number} with the account number you have been assigned, e.g. vsc40053@login.hpc.ugent.be.

Transfer your code to the HPC

telin$ scp -r {name_of_directory} {vsc_account_number}@login.hpc.ugent.be:
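The trailing colon means the directory lands in your home directory on the HPC. For example, with a hypothetical project directory my_project:

telin$ scp -r my_project {vsc_account_number}@login.hpc.ugent.be: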

Check if any of the GPUs are available on the cluster for working interactively

As of this writing there is one Tier 2 cluster with GPUs in the UGent HPC: joltik. Select it with the module swap cluster command. The Tier 1 infrastructure also has a GPU cluster (Hortense), but you have to submit a proposal to apply for access. See https://www.vscentrum.be/compute.

telin$ ssh {vsc_account_number}@login.hpc.ugent.be
hpc$ module swap cluster/joltik
hpc$ pbsmon
 3300 3301 3302 3303 3304
    J    J    X    J    j

 3305 3306 3307 3308 3309
    R    J    R    J    _

   _ free                 : 1   |   X down                 : 1   |
   j partial              : 1   |   x down_on_error        : 0   |
   J full                 : 5   |   m maintenance          : 0   |
                                |   . offline              : 0   |
                                |   o other (R, *, ...)    : 2   |

As you can see, some nodes are fully occupied (J = full), some still have GPUs free (j = partial), some are down or heading into maintenance (X = down), and some are free (_ = free). If no node is free or partial you will have to wait, and you cannot work interactively right away.

Start an interactive GPU session on the HPC

hpc$ qsub -I -l nodes=1:gpus=1

This tells the HPC queueing system that you want to work interactively (-I) and that you want 1 node with 1 GPU. If the allocation succeeds you will see a prompt on a GPU node:

salloc: Granted job allocation 40002722
salloc: Waiting for resource configuration
salloc: Nodes node3302.joltik.os are ready for job
{vsc_account_number}@node3302:~$
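If you also want to cap how long the session may run, you can request a walltime explicitly (a hedged example using the standard PBS walltime resource; check the HPC documentation for the site limits):

hpc$ qsub -I -l nodes=1:gpus=1 -l walltime=04:00:00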

List available modules

{vsc_account_number}@node3302:~$ module avail

The ones we are interested in are CUDA and PyTorch (TensorFlow is also available):

   CUDA/10.1.105-GCC-8.2.0-2.31.1
   CUDA/10.1.168
   CUDA/10.1.243                                         (D)
   PyTorch/1.2.0-fosscuda-2019.08-Python-3.7.2

These are the versions that were available at the time of writing.

Load the desired modules

{vsc_account_number}@node3302:~$ module load CUDA/10.1.243
{vsc_account_number}@node3302:~$ module load PyTorch/1.6.0-foss-2019b-Python-3.7.4
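You can verify which modules are loaded with the standard module list command:

{vsc_account_number}@node3302:~$ module list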

You can check if CUDA is working:

{vsc_account_number}@node3302:~$ nvidia-smi
{vsc_account_number}@node3302:~$ /apps/gent/CO7/cascadelake-volta-ib-PILOT/software/CUDA/10.1.243/extras/demo_suite/deviceQuery
{vsc_account_number}@node3302:~$ echo $PATH
/apps/gent/CO7/cascadelake-volta-ib-PILOT/software/CUDA/10.1.243:/apps/gent/CO7/cascadelake-volta-ib-PILOT/software/CUDA/10.1.243/nvvm/bin:/apps/gent/CO7/cascadelake-volta-ib-PILOT/software/CUDA/10.1.243/bin:/user/gent/400/vsc40053/.local/bin:/usr/libexec/slurm/wrapper:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin
{vsc_account_number}@node3302:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
{vsc_account_number}@node3302:~$ python --version
Python 3.7.4
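You can also verify that PyTorch itself can reach the GPU (a quick one-liner sketch; it assumes the PyTorch module loaded above and should print True followed by the device name when a GPU is visible):

{vsc_account_number}@node3302:~$ python -c 'import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))'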

Install Jupyter Notebook

See the reference link https://jupyter.readthedocs.io/en/latest/install.html. You only have to do this once!

{vsc_account_number}@node3302:~$ pip3 install --user jupyter

This will create a private environment in your ~/.local directory. Add this PATH to your current session and to your profile:

{vsc_account_number}@node3302:~$ export PATH=~/.local/bin:$PATH
{vsc_account_number}@node3302:~$ echo -e '\nexport PATH=~/.local/bin:$PATH' >> ~/.bashrc
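A quick sanity check that the shell now resolves the user install first (assuming the pip install above succeeded, this should print a path under ~/.local/bin):

{vsc_account_number}@node3302:~$ which jupyter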

You can check the installed versions with:

{vsc_account_number}@node3302:~$ jupyter --version
jupyter core     : 4.6.1
jupyter-notebook : 6.0.2
qtconsole        : 4.5.5
ipython          : 7.9.0
ipykernel        : 5.1.3
jupyter client   : 5.3.4
jupyter lab      : not installed
nbconvert        : 5.6.1
ipywidgets       : 7.5.1
nbformat         : 4.4.0
traitlets        : 4.3.3

If you get an error, try the upgrade:

{vsc_account_number}@node3302:~$ jupyter --version
/apps/gent/CO7/cascadelake-volta-ib-PILOT/software/Python/3.7.2-GCCcore-8.2.0/bin/python3.7: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
{vsc_account_number}@node3302:~$ pip3 install --upgrade  --user jupyter

Run Jupyter Notebook

{vsc_account_number}@node3302:~$ jupyter notebook --ip=`ifconfig ib0|awk '/inet / {print $2}'`

The --ip option binds the notebook server to the node's InfiniBand (ib0) address so it is reachable from the login node. You will get a link that looks like this: http://10.143.8.3:8888/tree?token=a46bed6fe6bfab325be1322067e977ebb90839df2eb314ea

Because this link points to the internal address of the GPU node, it can only be reached through an extra port forward. To connect your local browser to the notebook, open a new SSH connection to the HPC login node that forwards a local port, using the same IP and port as in the Jupyter link:

telin$ ssh -L 8888:10.143.8.3:8888 {vsc_account_number}@login.hpc.ugent.be

and then you can open this URL in your local browser: http://127.0.0.1:8888/tree?token=a46bed6fe6bfab325be1322067e977ebb90839df2eb314ea

Congratulations: the SSH tunnel forwards local port 8888 to the notebook on the GPU node, so you can now point your local browser at the 127.0.0.1 link!
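Tip: if you prefer a tunnel without opening an interactive shell on the login node, the standard OpenSSH -N flag does exactly that (the tunnel stays in the foreground until you interrupt it):

telin$ ssh -N -L 8888:10.143.8.3:8888 {vsc_account_number}@login.hpc.ugent.be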