Running interactive jobs

Interactive jobs can be run on ALICE using the salloc and srun commands. All job queues are available for both batch and interactive use; however, we discourage running long jobs (more than a few days) interactively, as any interruption to the initiating process (usually running on a login node) will cause the job to fail.

Serial or shared memory use

A single node can be accessed for interactive use with the srun command. The following example requests a job allocation and, once the resources are available, starts an interactive shell (bash) on the allocated node:

[cm610@alice-login01 ~]$ srun --partition=devel --account=MY_HPC_PROJECT \
                              --cpus-per-task=4 --time=2:0:0 --mem=4g \
                              --pty /bin/bash

[cm610@alice-node003 ~]$ 

Note: if you are not a member of an HPC project, delete --account=MY_HPC_PROJECT from the above command; otherwise, replace MY_HPC_PROJECT with the name of the HPC project you are using.

The srun command will block until resources are available; once they are, it returns a shell prompt and you can run commands. Note in the example above that srun was run on alice-login01, but the shell prompt returned when the job begins is on alice-node003, where any commands will be run.
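Once inside the interactive shell you can sanity-check what has been allocated using the SLURM_* environment variables that the scheduler sets for the job (the exact set of variables depends on what was requested, so treat this as an illustrative check):

# Show the job ID and the resources assigned to this interactive job
echo "Job ID:        $SLURM_JOB_ID"
echo "Node list:     $SLURM_JOB_NODELIST"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"

# The full job record, including the time limit and memory request
scontrol show job $SLURM_JOB_ID | grep -Ei 'TimeLimit|mem'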

Once your job is complete, simply exit the shell using the exit command and your job allocation will end.
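For example, continuing the session shown above, exiting the shell returns you to the login node and releases the allocation:

[cm610@alice-node003 ~]$ exit
[cm610@alice-login01 ~]$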

When used in this way srun takes the same parameters as sbatch (or salloc).

--partition=devel

Specifies that we wish to run in the devel partition; please read the partition section for more notes on the available cluster partitions. Also note that in most cases you do not need to specify the partition: the scheduler will route your job to a suitable partition.
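For example, one way to see which partitions exist, together with their time limits and node counts, is sinfo (the output format shown here is just an illustration):

# Partition name, time limit and number of nodes
sinfo -o "%P %l %D"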

--account=MY_HPC_PROJECT

The HPC project you are submitting the job against. If you do not specify a project, your job will go to the default project, which only has access to the devel, short and medium queues and receives lower priority.
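If you are unsure which projects you can submit against, one way to check is to query the accounting database (this assumes sacctmgr is readable by ordinary users on ALICE):

# List the accounts (projects) your user is associated with
sacctmgr show associations user=$USER format=Account%30,Partition%20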

--cpus-per-task=4

Request that 4 CPU cores be assigned to this job.

--time=2:0:0

Request 2 hours of runtime for the job.

--mem=4g

Request 4 gigabytes of memory per node.

--pty /bin/bash

Ask the scheduler to run a bash shell when the resources are allocated - note that you can run any command you wish here. The --pty is required for bash as it requires the allocation of a pseudo-terminal to run. For most programs (anything that is not a shell) you should not need this - see the example below.
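For example, a single (non-shell) command can be run directly without --pty; the resource requests here are purely illustrative:

srun --partition=devel --account=MY_HPC_PROJECT \
     --cpus-per-task=1 --time=0:10:0 --mem=1g \
     hostname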

GUI applications

If you are able to run GUI applications on a login node (for example, you have logged in using NoMachine or SSH with X forwarding enabled), you can run GUI applications on compute nodes as interactive jobs. Enabling this requires the use of the --x11 parameter when calling srun.

For example:

srun --partition=devel --account=MY_HPC_PROJECT --x11 \
     --cpus-per-task=4 --time=2:0:0 --mem=4g \
     --pty /bin/bash

This will start a shell on the allocated resources with X forwarding enabled; your GUI code can then be started.
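For example, from the forwarded shell you can check that a display has been set up and start a simple X client to test the connection (this assumes xclock is installed on the compute nodes):

# DISPLAY should be set if X forwarding is working
echo $DISPLAY

# Start a simple X application as a test
xclock &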

It is also possible to run the GUI program directly, for instance:

srun --partition=devel --account=MY_HPC_PROJECT --x11 \
     --cpus-per-task=4 --time=2:0:0 --mem=4g \
     xterm

This will start an X terminal (xterm) on the remote node once resources are allocated. The job will end when the program started by srun exits.

Starting MPI jobs interactively

Starting an MPI job in an interactive session is more complicated than the above examples, as you cannot start a sub-task (using srun) from within srun. This means that the job allocation and the start of your code need to be separated out.

To allocate resources for an interactive job you can use salloc; this will allocate the resources for your job and then exit, leaving the job running. Once the job is running, use srun to start sub-tasks (such as MPI processes).

Allocating tasks

salloc --partition=devel --account=MY_HPC_PROJECT \
       --nodes=2 --ntasks-per-node=16 --time=2:0:0 --mem-per-cpu=2g

This will block until your allocation is ready, then return the job id and exit. If you run srun (or mpirun) from the shell after salloc has returned, tasks will be started in this allocation (and on the nodes allocated for your job).
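Before launching tasks you can confirm that the allocation is active. With the default salloc behaviour the shell you are returned to has SLURM environment variables describing the job set, so a quick check is:

# Job ID and nodes assigned to the allocation
echo $SLURM_JOB_ID
echo $SLURM_JOB_NODELIST

# The job should also appear in the queue in state R (running)
squeue -u $USER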

To run an (Open MPI) code in the allocation:

srun --mpi=pmix ./my_mpi_code

Or, to run a oneAPI Intel MPI code:

export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so
srun --mpi=pmi2 ./my_oneapi_mpi_code

You can also run non-MPI tasks in the allocation using srun, for example to return the hostname of the first node allocated to your job:

srun hostname

Once you are done with your allocation, the exit command will end the job.
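If you have lost the shell that salloc gave you, the allocation can also be ended from a login node with scancel, replacing JOBID with the job ID reported by salloc:

scancel JOBID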

Attaching to a running batch job

It is sometimes necessary to log on to a node running a batch job so you can check the status of the job processes via top or nvidia-smi (for GPU jobs). You cannot directly log in to compute nodes from a login node; however, you can attach a shell to a running job, which will give you interactive access to the nodes running that job. For example, to connect to a job with ID JOBID:

srun --overlap --jobid JOBID --pty /bin/bash

This will give you a terminal running on the node running JOBID. Please note that it is not possible to run X Windows programs via this method. The interactive terminal will be terminated when the job to which it is attached ends.
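Once attached, you can inspect the job's processes with the usual tools, for example:

# Show your own processes on the node
top -u $USER

# For GPU jobs, check GPU utilisation
nvidia-smi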