SLURM
The batch scheduler which runs on ALICE is SLURM (Simple Linux Utility for Resource Management).
Useful commands
sacct - show accounting information for running or completed jobs.
salloc - allocate resources for a job in real time; usually used to allocate resources which can then be used to run tasks using srun.
sbatch - submit a job script for batch execution.
scancel - cancel a queued or running job.
sinfo - show information on the state of nodes and partitions managed by SLURM.
squeue - show information on the state of jobs.
srun - run a task under SLURM control from within a job allocation.
Getting job statistics - sacct
The sacct command is used to get statistics on your running or completed jobs.
Useful options:
-S starttime - by default, sacct shows jobs that have run since the start of the current day. Use the -S option to show jobs over a longer period; for example, to show jobs run within the last 7 days, within the last 4 weeks, or since 1st August 2023:
sacct -S now-7days
sacct -S now-4weeks
sacct -S 2023-08-01
-j JOB_ID - report information on a particular job.
sacct has many more options for selecting jobs and formatting the output; more information is available in the SLURM online documentation.
For example, 'sacct' with the following options may be useful to see what resources were requested and used by a job:
sacct -j JOB_ID -o User,JobID,Jobname,state,time,elapsed,start,end,ReqMem,MaxRss,MaxVMSize,nnodes,ncpus
Much of the information provided by sacct is also available, in an easily readable form, in the job completion email which you can request that the scheduler send when your job finishes.
Allocate resources for interactive use - salloc
salloc is used to submit a request for an interactive job.
For example, to request 4 CPU cores and 8 GB of memory on one node in the devel partition for 1 hour:
salloc --partition=devel --nodes=1 --ntasks-per-node=1 \
--cpus-per-task=4 --time=1:0:0 --mem=8g
After you enter the command, salloc will block, then return your job ID once the allocation is ready. Once your allocation is ready you can run tasks in it using srun (or mpirun for MPI tasks).
See the section on interactive jobs for more information.
Note also that salloc takes many of the same command line parameters as sbatch and srun.
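Putting this together, a typical interactive session might look like the following sketch. The hostname command stands in for whatever task you want to run, and exit ends the session and releases the allocation:

```shell
# Request the allocation (salloc blocks until resources are granted)
salloc --partition=devel --nodes=1 --ntasks-per-node=1 \
       --cpus-per-task=4 --time=1:0:0 --mem=8g

# Once the allocation is ready, run a task in it with srun
srun hostname

# Release the allocation when finished
exit
```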
Submit a job for batch execution - sbatch
The sbatch command is used to submit job scripts for batch execution.
sbatch JOB_SCRIPT
sbatch accepts the same command line options as salloc and, to a lesser extent, srun; however, it is more usual to put these options into the JOB_SCRIPT as scheduler directives (lines beginning #SBATCH).
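As a sketch, a minimal job script requesting the same resources as the salloc example above might look like this (the job name and ./my_program are placeholders to replace with your own):

```shell
#!/bin/bash
#SBATCH --job-name=my_job        # job name shown in squeue (placeholder)
#SBATCH --partition=devel        # partition to submit to
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks-per-node=1      # tasks per node
#SBATCH --cpus-per-task=4        # CPU cores per task
#SBATCH --mem=8g                 # memory per node
#SBATCH --time=1:0:0             # wall time limit (1 hour)

# Commands to run go below the directives; ./my_program is a placeholder.
./my_program
```

Submit it with sbatch JOB_SCRIPT; command line options given to sbatch override the corresponding #SBATCH directives.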
Much more information on writing and submitting job scripts is provided in the Simple Batch Job, MPI and Hybrid Job, and Accessing GPUs sections.
Cancel a job - scancel
To cancel a pending or running job, use scancel with the job id:
scancel JOB_ID
Note that there are many other ways of specifying the job you wish to cancel (though you can only cancel your own jobs); see the SLURM documentation.
Show information on running jobs - squeue
To show all jobs in the scheduler enter:
squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
453375 devel PostgreS sj239 R 4:23 1 alice-node001
453394 devel bash dpm9 R 0:04 1 alice-node001
409837 gpu GPUido1 ab1207 PD 0:00 1 (PartitionTimeLimit)
416644 gpu mapping ma873 R 1-18:31:57 1 alice-gpu01
446186 gpu bash hz204 R 1-18:03:35 1 alice-gpu02
...
By default, this shows:
| Field | Description |
|---|---|
| JOBID | Job ID of this job |
| PARTITION | Partition (queue) that the job is assigned to |
| NAME | Job name |
| USER | User that owns the job |
| ST | Job state (R = running, PD = pending) |
| TIME | Time the job has been running |
| NODELIST(REASON) | List of nodes the job is running on, or the reason the job is not yet running |
If your job is running the NODELIST(REASON) field will show the list of nodes assigned to run it. Otherwise, the reason your job is not running will be shown:
Reasons why a job is not running
(Dependency)
- The job is waiting for a dependency to be satisfied
(Job's QOS not permitted to use this partition)
(Job's QOS not permitted to use this partition (parallel allows hpc not normal))
(Job's QOS not permitted to use this partition (gpu allows hpc not normal))
(Job's QOS not permitted to use this partition (long allows hpc not normal))
(Job's QOS not permitted to use this partition (lmem allows hpc not normal))
- Your job has been directed to a specialist queue (parallel, gpu, long or lmem), but you are not a member of an HPC project, so you cannot run jobs in those partitions. More information is available in the section on using HPC projects
(Nodes required for job are DOWN DRAINED or reserved for jobs in higher priority partitions)
- Nodes are currently unavailable for the job, but will be released in the future
(Priority)
- The job is waiting for resources
(PartitionNodeLimit)
- The job requires more nodes than are permitted in this partition and will never run. Cancel the job and resubmit it to a different queue; you may need to submit to the parallel queue.
(QOSMaxJobsPerUserLimit)
- You have reached the maximum number of jobs that you can run simultaneously. This job will be scheduled once some of your other jobs are completed.
If you see a reason not listed above, please open a support request via rcs.support@le.ac.uk.
It is often useful to only show your own jobs:
squeue --me
To show jobs just in a particular partition - for example to see jobs running and queued in the gpu partition:
squeue -p gpu
To see all jobs currently running:
squeue --state=RUNNING
or:
squeue --state=R
To see all jobs currently queued, but not running:
squeue --state=PENDING
or:
squeue --state=PD
There are job states other than 'RUNNING' and 'PENDING'; these can be listed by looking at the manual entry for 'squeue': man squeue
To see more information about jobs add the -l flag, for example:
squeue --me -l
For information about a specific job or jobs:
squeue -j 12345,12444
To see when a job is scheduled to start (this is an estimate, not a guarantee):
squeue -j JOB_ID --start
Run a job step - srun
More information on srun is provided in the MPI jobs section, where it is used to launch MPI tasks, and in the interactive jobs section, where it is used to allocate and launch simple interactive tasks, or to run MPI tasks after resources have been allocated for interactive use.
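As a brief sketch, assuming resources have already been allocated by a job script or by salloc, srun might be used as follows (./my_mpi_program is a placeholder for your own executable):

```shell
# Run a job step using the resources already defined by the allocation
srun ./my_mpi_program

# Or explicitly set the number of tasks for this job step
srun --ntasks=4 ./my_mpi_program
```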