SLURM
The batch scheduler which runs on ALICE is SLURM (Simple Linux Utility for Resource Management).
Useful commands
sacct - show accounting information for running or completed jobs.
salloc - allocate resources for a job in real time; usually used to allocate resources which can then be used to run tasks using srun.
sbatch - submit a job script for batch execution.
scancel - cancel a queued or running job.
sinfo - show information on the state of nodes and partitions managed by SLURM.
squeue - show information on the state of jobs.
srun - run a task under SLURM control from within a job allocation.
Getting job statistics - sacct
The sacct command is used to get statistics on your running or completed jobs.
Useful options:
-S starttime - by default, sacct shows jobs that have run since the start of the current day. Use the -S option to show jobs over a longer period; for example, to show jobs run within the last 7 days, within the last 4 weeks, or since 1st August 2023:
sacct -S now-7days
sacct -S now-4weeks
sacct -S 2023-08-01
-j JOB_ID - report information on a particular job.
sacct has many more options for selecting jobs and formatting the output; more information is available in the SLURM online documentation.
For example, 'sacct' with the following options may be useful to see what resources were requested and used by a job:
sacct -j JOB_ID -o User,JobID,Jobname,state,time,elapsed,start,end,ReqMem,MaxRss,MaxVMSize,nnodes,ncpus
Much of the information provided by sacct is also available, in an easily readable form, in the job completion email which you can request that the scheduler send when your job finishes.
Allocate resources for interactive use - salloc
salloc is used to submit a request for an interactive job.
For example, to request 4 CPU cores and 8 GB of memory on one node in the devel partition for 1 hour:
salloc --partition=devel --nodes=1 --ntasks-per-node=1 \
--cpus-per-task=4 --time=1:0:0 --mem=8g
After you enter the command, salloc will block, then return your job ID once the allocation is ready. Once your allocation is ready you can run tasks in it using srun (or mpirun for MPI tasks).
See the section on interactive jobs for more information.
Note also that salloc takes many of the same command line parameters as sbatch and srun.
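Putting this together, a typical interactive session might look like the following sketch. The hostname command stands in for whatever task you want to run, and exit ends the session and releases the allocation:

```shell
# Request the allocation (salloc blocks until resources are granted)
salloc --partition=devel --nodes=1 --ntasks-per-node=1 \
       --cpus-per-task=4 --time=1:0:0 --mem=8g

# Once the allocation is ready, run a task in it with srun
srun hostname

# Release the allocation when finished
exit
```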
Submit a job for batch execution - sbatch
The sbatch command is used to submit job scripts for batch execution.
sbatch JOB_SCRIPT
sbatch accepts the same command line options as salloc and, to a lesser extent, srun; however, it is more usual to put these options into the JOB_SCRIPT as scheduler directives (lines beginning #SBATCH).
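As a sketch, a minimal job script requesting the same resources as the salloc example above might look like this (the job name and ./my_program are placeholders to replace with your own):

```shell
#!/bin/bash
#SBATCH --job-name=my_job        # job name shown in squeue (placeholder)
#SBATCH --partition=devel        # partition to submit to
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks-per-node=1      # tasks per node
#SBATCH --cpus-per-task=4        # CPU cores per task
#SBATCH --mem=8g                 # memory per node
#SBATCH --time=1:0:0             # wall time limit (1 hour)

# Commands to run go below the directives; ./my_program is a placeholder.
./my_program
```

Submit it with sbatch JOB_SCRIPT; command line options given to sbatch override the corresponding #SBATCH directives.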
Much more information on writing and submitting job scripts is provided in the Simple Batch Job, MPI and Hybrid Job, and Accessing GPUs sections.
Cancel a job - scancel
To cancel a pending or running job, use scancel with the job id:
scancel JOB_ID
Note that there are many other ways of specifying the job you wish to cancel (though you can only cancel your own jobs); see the SLURM documentation.
Show information on running jobs - squeue
To show all jobs in the scheduler enter:
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
453375 devel PostgreS sj239 R 4:23 1 alice-node001
453394 devel bash dpm9 R 0:04 1 alice-node001
409837 gpu GPUido1 ab1207 PD 0:00 1 (PartitionTimeLimit)
416644 gpu mapping ma873 R 1-18:31:57 1 alice-gpu01
446186 gpu bash hz204 R 1-18:03:35 1 alice-gpu02
...
By default, this shows:
| Field | Description |
|---|---|
| JOBID | Job ID of this job |
| PARTITION | Partition (queue) that the job is assigned to |
| NAME | Job name |
| USER | User that owns the job |
| ST | Job state (R = running, PD = pending) |
| TIME | Time the job has been running |
| NODELIST(REASON) | List of nodes the job is running on, or the reason the job is not yet running |
If your job is running the NODELIST(REASON) field will show the list of nodes assigned to run it. Otherwise, the reason your job is not running will be shown:
Reasons why a job is not running
(Dependency)
- The job is waiting for a dependency to be satisfied
(Job's QOS not permitted to use this partition)
(Job's QOS not permitted to use this partition (parallel allows hpc not normal))
(Job's QOS not permitted to use this partition (gpu allows hpc not normal))
(Job's QOS not permitted to use this partition (long allows hpc not normal))
(Job's QOS not permitted to use this partition (lmem allows hpc not normal))
- Your job has been directed to a specialist queue (parallel, gpu, long or lmem), but you are not a member of an HPC project, so you cannot run jobs in those partitions. More information is available in the section on using HPC projects
(Nodes required for job are DOWN DRAINED or reserved for jobs in higher priority partitions)
- Nodes are currently unavailable for the job, but will be released in the future
(Priority)
- The job is waiting for resources
(PartitionNodeLimit)
- The job requires more nodes than are permitted in this partition and will never run. Cancel the job and resubmit it to a different queue; you may need to submit to the parallel queue.
(QOSMaxJobsPerUserLimit)
- You have reached the maximum number of jobs that you can run simultaneously. This job will be scheduled once some of your other jobs are completed.
If you see a reason not listed above, please open a support request via rcs.support@le.ac.uk.
It is often useful to only show your own jobs:
squeue --me
To show jobs just in a particular partition - for example to see jobs running and queued in the gpu partition:
squeue -p gpu
To see all jobs currently running:
squeue --state=RUNNING
or:
squeue --state=R
To see all jobs currently queued, but not running:
squeue --state=PENDING
or:
squeue --state=PD
There are job states other than 'RUNNING' and 'PENDING'; these can be listed by looking at the manual entry for 'squeue': man squeue
To see more information about jobs add the -l flag, for example:
squeue --me -l
For information about a specific job or jobs:
squeue -j 12345,12444
To see when a job is scheduled to start (this is an estimate, not a guarantee):
squeue -j JOB_ID --start
Run a job step - srun
More information on srun is provided in the MPI jobs section, where it is used to launch MPI tasks, and in the interactive jobs section, where it is used to allocate and launch simple interactive tasks, or to run MPI tasks after resources have been allocated for interactive use.
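As a brief sketch, assuming resources have already been allocated by a job script or by salloc, srun might be used as follows (./my_mpi_program is a placeholder for your own executable):

```shell
# Run a job step using the resources already defined by the allocation
srun ./my_mpi_program

# Or explicitly set the number of tasks for this job step
srun --ntasks=4 ./my_mpi_program
```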