Partitions
The ALICE cluster is divided into a number of logical partitions which support different types of jobs. These are the equivalent of separate scheduling queues in other schedulers.
In most cases, if a partition is not specified, jobs will be routed to the correct partition by the scheduler, based on the resources they request. Some jobs (parallel and devel jobs) cannot be easily identified from their resource requests alone, so must explicitly specify the partition they require.
Also note that some partitions are only available to members of HPC projects. If you require access to these partitions, you must join or create an HPC project.
The partitions available on ALICE are listed below.
standard jobs - short, medium, long
These partitions support small (one node or less) serial or shared memory jobs, i.e. jobs that require 1-64 cores and up to 240GB of memory.
The maximum job runtime is 24 hours (short), 7 days (medium) or 21 days (long). There is usually no need to explicitly request these partitions; the scheduler will automatically route suitable jobs to the correct partition.
partition | cores | nodes | memory | max runtime | availability |
---|---|---|---|---|---|
short | 1-64 | 1 | up to 240GB | 24 hours | all users |
medium | 1-64 | 1 | up to 240GB | 7 days | all users |
long | 1-64 | 1 | up to 240GB | 21 days | HPC users only |
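As an illustration, a job script along the following lines (the job name and executable are placeholders) requests resources that fit a standard node; with a 48 hour walltime it would be routed to the medium partition automatically:
#!/bin/bash
#SBATCH --job-name=serial-example    # placeholder job name
#SBATCH --cpus-per-task=4            # 1-64 cores fits a standard node
#SBATCH --mem=16G                    # up to 240GB fits a standard node
#SBATCH --time=48:00:00              # over 24 hours, so routed to medium
# no --partition line needed; the scheduler routes the job itself

./my_serial_program                  # placeholder executable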
development jobs - the devel partition
The devel partition is intended for development and testing of new code and other cases which require rapid scheduling of small jobs - 2 nodes are reserved for this partition, though jobs can run anywhere that resources are available.
Development jobs are limited to a maximum walltime of 2 hours and a maximum of 4 nodes (256 cores). They can run on high memory nodes (if the memory request is greater than 240GB per node), however turnaround time on high memory nodes is likely to be much longer than for standard nodes.
The partition is available to all users, however a user may only have a single job running in this partition at any time, and a maximum of 4 jobs can be submitted to this partition at a time.
partition | cores | nodes | memory | max runtime | availability |
---|---|---|---|---|---|
devel | 1-64 per node | 1-4 | up to 2TB per node (<240GB per node recommended) | 2 hours | all users |
The devel partition must be explicitly requested in your job script using:
#SBATCH --partition=devel
or by passing --partition=devel
as a command line parameter to sbatch, salloc or srun.
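For example, a short interactive test session on the devel partition could be requested with something like the following (the resource values are purely illustrative):
srun --partition=devel --ntasks=1 --cpus-per-task=4 --mem=8G --time=00:30:00 --pty bash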
multi-node jobs - the parallel partition
The parallel partition is intended for jobs which require 1 or more entire compute nodes. Jobs in this partition should in most cases allocate and use entire nodes. Around half the standard compute nodes in the cluster are reserved for the use of parallel jobs of this type.
This partition is available to HPC project users only. To access it, an HPC project which you are a member of must be specified when requesting resources for a job.
More detailed information on using this partition is provided on the MPI and hybrid Jobs page.
partition | cores | nodes | memory | max runtime | availability |
---|---|---|---|---|---|
parallel | 64 per node | 1-16 | 256GB per node | 7 days | HPC users only |
The parallel partition must be explicitly requested in your job script; you should also request all memory available on the node (using --mem=0):
#SBATCH --partition=parallel
#SBATCH --mem=0
These options can also be passed to sbatch, salloc or srun on the command line as --partition=parallel --mem=0
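As a sketch, a whole-node MPI job script might look like the following; the project name and executable are placeholders, and --account is the usual Slurm way of specifying a project (the exact mechanism used on ALICE may differ):
#!/bin/bash
#SBATCH --partition=parallel
#SBATCH --account=my-hpc-project     # placeholder; specify an HPC project you belong to
#SBATCH --nodes=4                    # whole nodes only
#SBATCH --ntasks-per-node=64         # use all 64 cores on each node
#SBATCH --mem=0                      # request all memory available on each node
#SBATCH --time=2-00:00:00            # 2 days, within the 7 day limit

srun ./my_mpi_program                # placeholder MPI executable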
Large memory jobs - the lmem partition
The lmem partition is provided for jobs that require more than 256 GB per node. The lmem nodes provide 2 TB of memory each. In most cases, you do not need to explicitly request this queue. Jobs which request more memory per node than the standard compute nodes provide will automatically be routed to this partition.
Jobs which require few CPUs but a lot of memory may be better suited to running as lmem jobs even if the total memory they require is less than 256GB. If your job requires more than 16GB of memory per CPU, please consider submitting it directly to the lmem partition. This is particularly important if you are submitting large numbers of these jobs to the cluster, as otherwise they can block access to CPU resources that could be used by others.
The lmem partition is only available to members of registered HPC projects. To access it, an HPC project which you are a member of must be specified when requesting resources for a job.
partition | cores | nodes | memory | max runtime | availability |
---|---|---|---|---|---|
lmem | up to 64 | 1 | up to 2 TB | 7 days | HPC users only |
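For instance, a job needing 4 cores but 200GB of memory (50GB per CPU) is a better fit for lmem than for the standard partitions, and could be submitted with directives along these lines (the project name is a placeholder, and --account is the usual Slurm way of specifying a project):
#SBATCH --partition=lmem
#SBATCH --account=my-hpc-project     # placeholder; specify an HPC project you belong to
#SBATCH --cpus-per-task=4
#SBATCH --mem=200G                   # well over 16GB per CPU, so lmem is the better fit
#SBATCH --time=24:00:00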
GPU jobs - the GPU partition
The gpu partition provides access to the GPU nodes. 4 GPU nodes are available, each providing 64 Intel Ice Lake CPU cores, two NVIDIA A100-80 GPUs and 512GB of memory.
Any job which requests GPU resources will be automatically routed to the gpu queue. Up to 4 GPUs may be in use by a single user at any one time.
The gpu queue is restricted to members of HPC projects. To access it, an HPC project which you are a member of must be specified when requesting resources for a job.
partition | cpu cores | gpus | nodes | memory | max runtime | availability |
---|---|---|---|---|---|---|
gpu | up to 64 per node | up to 2 per node | 1-2 | up to 512GB per node | 7 days | HPC users only |
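As a sketch, a single-GPU batch job could be requested with a script like the one below; the project name and executable are placeholders, and --gres is the standard Slurm syntax for requesting GPUs (ALICE may also accept other forms):
#!/bin/bash
#SBATCH --gres=gpu:1                 # one GPU; the job is routed to the gpu partition automatically
#SBATCH --account=my-hpc-project     # placeholder; specify an HPC project you belong to
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=12:00:00

./my_gpu_program                     # placeholder executable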
More information on running GPU jobs can be found in the Accessing GPUs section.
GPU devel jobs
For short test and development jobs a gpu-devel partition is provided. Users are limited to a single gpu-devel job, on a single node.
The gpu-devel partition must be explicitly requested using:
#SBATCH --partition=gpu-devel
or by passing --partition=gpu-devel
as a command line parameter to sbatch, salloc or srun.
The gpu-devel queue is only available to members of an HPC project.
partition | cpu cores | gpus | nodes | memory | max runtime | availability |
---|---|---|---|---|---|---|
gpu-devel | up to 64 per node | up to 2 per node | 1 | up to 512GB per node | 2 hours | HPC users only |
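For example, an interactive gpu-devel session with a single GPU could be started with something like this (the resource values are illustrative, and --account is the usual Slurm way of specifying a project):
srun --partition=gpu-devel --gres=gpu:1 --cpus-per-task=8 --mem=32G --time=01:00:00 --account=my-hpc-project --pty bash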