Batch and interactive jobs
Computations can be run interactively or as a batch job.
Interactive jobs
Computations can be run interactively either directly on the login nodes or by submitting an interactive job request to the scheduler. Interactive working is suited to cases such as:
- Short computations that are not memory- or CPU-intensive (for example, plotting a simple graph from a small data file).
- Tasks that require user interaction and cannot be left to run unattended for long periods.
- Tasks that require interaction via a GUI.
The following per-user limits apply on the login nodes:
- Maximum memory per user: 24 GB
- Maximum CPU core allocation per user: 16 cores
- Maximum CPU time per process: 6 CPU-hours
Resources on the login nodes are shared among many users (typically 80 or more during busy periods), which can also affect performance. Tasks that require interactive use can instead be run on a compute node via an interactive job.
Simple interactive job example
The following example requests an allocation of 4 cores and 16 GB of memory on a compute node for 1 hour, then starts a shell on the allocated node to do some processing.
$ salloc --partition=devel --nodes=1 --ntasks-per-node=1 \
--cpus-per-task=4 --time=1:0:0 --mem=16g
salloc: Granted job allocation 243
*********************************************
You are in the following research projects:
admin (ID 0000)
*********************************************
Default SCRATCHDIR is /scratch/admin/cm610
Default TMPDIR is /local/cm610
*********************************************
[cm610@alice-login01 ~]$ srun --pty bash -i
[cm610@alice-node001 ~]$ hostname # note we are now running on a compute node
alice-node001
[cm610@alice-node001 ~]$ date # run a few things
Tue Aug 1 07:16:56 PM BST 2023
[cm610@alice-node001 ~]$ exit # done what we wanted to so exit the node
exit
[cm610@alice-login01 ~]$ exit # and exit our job allocation
exit
salloc: Relinquishing job allocation 243
More detail can be found in the Interactive Jobs section.
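As a variation on the two-step salloc/srun example above, standard Slurm also lets you request the allocation and start the shell in a single command. The sketch below assumes the same partition and resource limits as the earlier example:

```shell
# One-step interactive job: srun requests the allocation and starts
# an interactive shell on the allocated compute node in one command.
# Options are assumed to match the salloc example above.
srun --partition=devel --nodes=1 --ntasks-per-node=1 \
     --cpus-per-task=4 --time=1:0:0 --mem=16g --pty bash -i
```

Exiting the shell ends the job and releases the allocation, so there is no separate allocation to relinquish afterwards.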
Batch Jobs
Most work on a high-performance computing system like ALICE is run as batch jobs. These take more effort to set up: you must write a job script containing the commands that run your computation, together with directives telling the scheduler what resources the job needs (CPU cores, memory, GPUs, time, etc.).
Creating and submitting a batch job script
Job scripts are regular shell scripts with additional directives, read by the scheduler, that specify the resources (CPUs, nodes, GPUs, time, memory, etc.) required to run the software. A simple example script is provided below:
#!/bin/bash
#
# Example SLURM job script for ALICE3
# SLURM directives:
#SBATCH --job-name=test_1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=00:10:00
#SBATCH --export=NONE
# commands to run
module purge
module load gcc/12.3.0
# just list loaded modules, host we are running on and date
module list
hostname
date
sleep 60
Note the lines starting with #SBATCH - these are the directives which give the scheduler information about the job you are running.
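In standard Slurm, each #SBATCH directive has an equivalent command-line option to sbatch, and options given on the command line take precedence over the matching directives in the script. This makes it easy to reuse one script with different settings. A sketch, using the example script above:

```shell
# Override the job name and time limit from the script at submission
# time; all other #SBATCH directives in the script still apply.
sbatch --job-name=test_2 --time=00:20:00 test_job.slm
```

This is convenient for quick experiments, but for reproducibility it is usually better to keep the final settings in the script itself.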
Once saved to a file this job script can be submitted to the scheduler:
$ sbatch test_job.slm
Submitted batch job 244
You can view information on your running jobs using the command:
$ squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
244 short test_1 cm610 R 0:05 1 alice-node003
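If you need to stop a job before it finishes - for example, because it was submitted with the wrong settings - the standard Slurm scancel command takes the job ID shown in the squeue output:

```shell
# Cancel a running or pending job by its job ID (here, the example
# job 244 from the squeue output above).
scancel 244
```

scancel produces no output on success; running squeue --me again will confirm the job has gone.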
Once the job is complete we can view the job output:
$ cat slurm-244.out
Currently Loaded Modules:
1) gcc/12.3.0-yxgv2bl
alice-node003
Tue Aug 1 07:30:03 PM BST 2023
This shows the expected output from the commands in the script: the loaded modules, the node on which the job ran, and the date when it ran.
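Once a job has finished it no longer appears in squeue, but standard Slurm records it in the accounting database, which can be queried with sacct. A sketch, assuming accounting is enabled on the cluster and using the example job ID from above:

```shell
# Show the state, elapsed time and peak memory use of a completed job.
sacct -j 244 --format=JobID,JobName,State,Elapsed,MaxRSS
```

Comparing Elapsed and MaxRSS against the time and memory you requested is a good way to tune the resource requests in future job scripts.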
More information on batch jobs can be found in the Simple Batch Jobs section.