Batch and interactive jobs

Computations can be run interactively or as a batch job.

Interactive jobs

Computations can be run interactively either directly on the login nodes or by submitting an interactive job request to the scheduler. Interactive jobs are a good fit when:

  • The computation is short and not memory or CPU intensive (for example, plotting a simple graph from a small data file).
  • The task requires user interaction and cannot be left to run for long periods by itself.
  • The task requires interaction via a GUI.

There are per-user limits applied to the login nodes:

  • maximum memory per user: 24GB
  • maximum CPU core allocation per user: 16 cores
  • maximum CPU time per process: 6 CPU hours

Resources on the login nodes are shared among many users (typically 80+ during busy periods), which can also affect performance. Tasks that require interactive use can instead be run on a compute node using an interactive job.
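If you are unsure whether a task is light enough for a login node, standard Linux tools give a quick picture of the current load and of your own usage (nothing here is ALICE-specific):

$ uptime                                    # load averages for the node
$ free -h                                   # memory in use and available
$ ps -u $USER -o pid,pcpu,pmem,etime,comm   # your own processes and their CPU/memory use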

Simple interactive job example

The following example requests an allocation on a compute node of 4 cores and 16GB memory for 1 hour and then starts a shell on the allocated node to do some processing.

$ salloc --partition=devel --nodes=1 --ntasks-per-node=1 \
                                --cpus-per-task=4 --time=1:0:0 --mem=16g 
salloc: Granted job allocation 243
 *********************************************
  You are in the following research projects:

         admin (ID 0000)

 *********************************************

 Default SCRATCHDIR is /scratch/admin/cm610
 Default TMPDIR is /local/cm610

 *********************************************

[cm610@alice-login01 ~]$ srun --pty bash -i
[cm610@alice-node001 ~]$ hostname        # note we are now running on a compute node
alice-node001
[cm610@alice-node001 ~]$ date            # run a few things
Tue Aug  1 07:16:56 PM BST 2023
[cm610@alice-node001 ~]$ exit            # done what we wanted to so exit the node       
exit
[cm610@alice-login01 ~]$ exit            # and exit our job allocation
exit
salloc: Relinquishing job allocation 243
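As a sketch of a more compact alternative (behaviour may vary with local configuration), the same allocation options can be passed directly to srun, which requests the resources and starts a shell on the compute node in a single step; exiting the shell then releases the allocation:

$ srun --partition=devel --nodes=1 --ntasks-per-node=1 \
       --cpus-per-task=4 --mem=16g --time=1:0:0 --pty bash -i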

More detail can be found in the Interactive Jobs section.

Batch Jobs

Most work on a high performance computing system like ALICE is run as batch jobs. These take more effort to set up, since you must write a job script containing the commands that run your computation together with directives telling the scheduler what resources the job needs (CPU cores, memory, GPUs, time, etc.).

Creating and submitting a batch job script

Job scripts are regular shell scripts with additional directives, read by the scheduler, which specify the resources (CPUs, nodes, GPUs, time, memory, etc.) required to run the software. A simple example script is provided below:

#!/bin/bash
#
# Example SLURM job script for ALICE3
# SLURM directives:
#SBATCH --job-name=test_1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2G
#SBATCH --time=00:10:00
#SBATCH --export=NONE

# commands to run
module purge
module load gcc/12.3.0

# just list loaded modules, host we are running on and date
module list
hostname
date
sleep 60

Note the lines starting with #SBATCH - these are the directives which give the scheduler information about the job you are running.
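The example script uses only a handful of the available directives. A few others that are often useful are sketched below; the option names are standard Slurm, but the values (partition name, project account and email address) are placeholders that would need changing for real jobs on ALICE:

# request a specific partition/queue and charge a particular project (placeholder values)
#SBATCH --partition=devel
#SBATCH --account=my_project
# send an email when the job ends or fails (placeholder address)
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.com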

Once saved to a file this job script can be submitted to the scheduler:

$ sbatch test_job.slm 
Submitted batch job 244
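Directives in the script can also be overridden at submission time by passing the equivalent option to sbatch on the command line, which is standard Slurm behaviour (the resources you may request still depend on the partition). For example, to submit the same script with a longer time limit:

$ sbatch --time=00:20:00 test_job.slm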

You can view information on your running jobs using the command:

$ squeue --me
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               244     short   test_1    cm610  R       0:05      1 alice-node003
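If a job needs to be stopped, or you want details of one that has already finished, the standard Slurm commands scancel and sacct can be used (sacct relies on job accounting being enabled, which is typical but site-dependent):

$ scancel 244                      # cancel job 244
$ sacct -j 244 --format=JobID,JobName,State,Elapsed,MaxRSS
                                   # state, run time and peak memory of a finished job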

Once the job is complete, we can view the job output:

$ cat slurm-244.out 

Currently Loaded Modules:
  1) gcc/12.3.0-yxgv2bl



alice-node003
Tue Aug  1 07:30:03 PM BST 2023

This shows the expected output from the commands in the script: the loaded modules, the node on which the job ran, and the date and time at which it ran.
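By default Slurm writes the job's standard output and error to a file called slurm-<jobid>.out in the directory the job was submitted from. If you prefer named output files, the standard Slurm filename patterns %x (job name) and %j (job ID) can be used with the --output and --error directives, for example:

# write standard output to <jobname>-<jobid>.out, e.g. test_1-244.out
#SBATCH --output=%x-%j.out
# optionally send standard error to a separate file
#SBATCH --error=%x-%j.err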

More information on batch jobs can be found in the Simple Batch Jobs section.